RFC (round 1), JEP draft: Low-level Object layout introspection methods

Mon Aug 17 15:13:47 UTC 2020

Hi Aleksey,

I think the way this addressOf API (and others) is used is really a key
factor. You have a question that you want answered, and addressOf can
help you figure out the answer. But knowing what the question is seems
crucial for deciding what form the API should take to answer it. And I
don't think I understand well enough how these low-level APIs are
*really* intended to be used. What are the actual high-level questions
we want answered? I read some use cases in the JEP description, but I
don't see why either addresses or offsets have to be exposed to answer
the actual high-level questions.

This problem seems strikingly similar to that of measuring time. Let's
say you would like to measure how long it took to run your
microbenchmark, and you need an API to do that. The most obvious
solution is to expose an API that tells you how much time has passed
since some reference point. This allows you to take a start time and an
end time, and compute the difference. Excellent.
Except that now, as a provider of this API, you have to go through a
world of trouble to deal with things like the monotonicity of time
counters at different levels of the stack, because the implicit
expectation is that surely time never goes backward. Except when it
does, because one socket is hotter than another, threads migrate between
sockets, and what not, and now we have to hide that from the unknowing
user. To ensure monotonicity of the time stamps, you have to put
mechanisms into the whole tech stack (hardware, hypervisor, VM, OS, JVM,
etc.) and deal with many problems.
An alternative API that gets the same job done for this use case would
be a timer that you can start and stop, which then returns the duration.
It would hide the details about time stamps and encapsulate how to
reason about them, with a check that returns zero if the presumed
duration between the internal time stamps is negative. Voila: monotonic
time measurements.
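To make the timer idea concrete, here is a minimal sketch of such a start/stop API. The class name Stopwatch and its methods are hypothetical, made up for illustration; the only point is that raw time stamps stay inside the class and the one "negative duration means zero" rule lives in a single place.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical start/stop timer: callers never see raw time stamps, only a
 * duration that is clamped to be non-negative.
 */
final class Stopwatch {
    private final LongSupplier rawClock; // raw time source, possibly non-monotonic
    private long startStamp;
    private boolean running;

    Stopwatch(LongSupplier rawClock) {
        this.rawClock = rawClock;
    }

    /** Uses System.nanoTime() as the raw time source. */
    Stopwatch() {
        this(System::nanoTime);
    }

    void start() {
        startStamp = rawClock.getAsLong();
        running = true;
    }

    /** Stops the timer and returns the elapsed time in nanoseconds, never negative. */
    long stopNanos() {
        if (!running) {
            throw new IllegalStateException("stopwatch was not started");
        }
        running = false;
        long elapsed = rawClock.getAsLong() - startStamp;
        // If the raw stamps appear to go backwards (e.g. a thread migrated to a
        // socket whose counter lags), report zero instead of a negative duration.
        return Math.max(0L, elapsed);
    }
}
```

Injecting the raw clock as a LongSupplier is only there so the clamping can be exercised with a fake clock that runs backwards; users of the default constructor never touch a time stamp.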

In a very similar way, if what you wish to do is to measure the distance
in bytes between two objects in order to get some sense of locality (I
think you see where I am going with this), then the obvious API to do
that is to expose the current address of an object. Then you can
manually compute the distance by subtracting the address of one object
from the address of the other. Except that user code has no explicit
control over when safepoints are scheduled between the two points of
measurement. Therefore, computing the difference can lead to similarly
surprising results, including but not limited to:
* The computed distance between o1 and o2 is zero bytes, but it is not 
the same object. Impossible! Except when it isn't.
* The computed distance between o1 and o2 is non-zero bytes, but it is 
the same object. Impossible! Except when it isn't.
* The computed distance between o1 and o2 does not reflect the actual 
distance between o1 and o2 at any snapshot of time. Impossible! Except 
when it isn't.
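The first of these "impossible" results can be reproduced with a toy model. Everything below is hypothetical and is not how HotSpot tracks objects: an object's "address" is just a long in an IdentityHashMap, and relocate() stands in for a GC that runs at a safepoint the user code cannot control.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of a moving heap; addresses are plain longs in a map. */
final class ToyHeap {
    private final Map<Object, Long> addresses = new IdentityHashMap<>();

    void place(Object o, long address) { addresses.put(o, address); }

    /** Hypothetical addressOf: valid only at the instant it is read. */
    long addressOf(Object o) { return addresses.get(o); }

    /** Simulated GC move. */
    void relocate(Object o, long newAddress) { addresses.put(o, newAddress); }
}

final class RacyDistance {
    /** User-side distance computation with a "safepoint" between the two reads. */
    static long distanceWithSafepointBetweenReads() {
        ToyHeap heap = new ToyHeap();
        Object o1 = new Object();
        Object o2 = new Object();
        heap.place(o1, 1_000L);
        heap.place(o2, 2_000L);

        long a1 = heap.addressOf(o1);  // 1000
        // Safepoint: the GC evacuates o1, then compacts o2 into o1's old cell.
        heap.relocate(o1, 3_000L);
        heap.relocate(o2, 1_000L);
        long a2 = heap.addressOf(o2);  // 1000

        // Distinct objects, yet zero bytes apart. At no single instant was the
        // real signed distance zero (+1000 before the moves, -2000 after).
        return a2 - a1;
    }

    public static void main(String[] args) {
        System.out.println(distanceWithSafepointBetweenReads()); // prints 0
    }
}
```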

Again, we have exposed something slippery and subject to a lot of
implementation weirdness, like a time stamp, and left it to the user of
the API to know how to interpret relative differences, hoping they won't
be surprised by potential implementation artifacts (like relocation,
slippery safepoints, pointer tagging, pointer compression schemes,
etc.).

Perhaps another way of answering the same question without the
addressOf API is to have an estimatedDistanceInBytes(Object o1, Object
o2) or even an estimatedDistanceInBytes(Object o1, Field f1, Object o2,
Field f2) API. This API could run in a mode where there are no
safepoints, and ensure that the above-mentioned "impossible" situations
really do remain impossible, hence answering the high-level question
more effectively. Importantly, it would also never expose any addresses
or offsets, while still allowing performance people to compute various
locality heuristics.
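A rough sketch of the shape of such an API, again on a toy heap. The class LayoutOracle is hypothetical, and the monitor lock is only a stand-in for "no safepoint between the two reads": relocate() cannot interleave with a distance query, so both locations come from one consistent snapshot and no address ever leaves the oracle.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of the JVM answering the high-level distance question itself. */
final class LayoutOracle {
    private final Map<Object, Long> addresses = new IdentityHashMap<>();

    synchronized void place(Object o, long address) { addresses.put(o, address); }

    /** Simulated GC move; the monitor keeps it from interleaving with a query. */
    synchronized void relocate(Object o, long newAddress) { addresses.put(o, newAddress); }

    /**
     * Hypothetical estimatedDistanceInBytes(o1, o2): both locations are read in
     * one snapshot, and an object is always 0 bytes away from itself. Only a
     * non-negative distance is returned, never an address.
     */
    synchronized long estimatedDistanceInBytes(Object o1, Object o2) {
        if (o1 == o2) {
            return 0L;
        }
        return Math.abs(addresses.get(o2) - addresses.get(o1));
    }
}
```

Returning the magnitude rather than a signed difference is a further choice of this sketch; it reveals even less about relative placement while still supporting locality heuristics.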

As for the other question, what is the cache line alignment of this
object, a similar API closer to the high-level question could be built,
like alignmentInBytes(Object o1, int alignment), where alignment is a
power of 2 up to a "reasonable" size that allows you to answer all the
questions you had about the object layout, without exposing its
address. That said, we might want to think a bit more about this one. We
don't want to hardcode the assumption that the alignment of an object is
measured from wherever its oop happens to point. There are no oops in
the user model. If you, for example, consider an alternative JVM
implementation like Jikes RVM, then the object pointer is at an offset
into the object payload (in fact where an array payload would start).
This was done to reduce the instructions needed on RISC processors to
perform array element addressing. This reinforces that a JVM
implementation might choose to have its object pointers point at
different offsets, before or after the start of the payload of its
memory cell. In the previous Shenandoah design (before the LRB), the
object pointer would, for example, point one word into the payload of
the cell. And that would still be before the payload of the object: a
rather arbitrary point. So we would need some kind of consensus about
which offset into the payload we expose the alignment of. Where the
fields start?
Once again, the nature of the question becomes very relevant. Is the
root question really "what is the cache line alignment of my object", or
is it "what is the cache line alignment of the field foo", which might be
better represented with alignmentInBytes(Object o1, Field field, int
alignment)? That would let a less sensitive piece of information, more
heuristic in nature, leave the JVM. But then why are you really asking
what the alignment of the field is? Is there an even higher-level
question we are trying to answer, such as "are fields foo and bar on the
same cache line in object o?" At this level, we are getting to a point
where we could expose just a boolean that is heuristic in nature. This
would hide many conceivable implementation choices from the user of the
API.
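A small numeric sketch of why "the alignment of an object" depends on an arbitrary reference point, while the field-level boolean does not. All numbers are hypothetical (a cell starting at 4160, a 16-byte header, 64-byte cache lines), and the helper names merely echo the ones floated above; they are not an existing API.

```java
/** Toy layout model; every constant here is an assumption for illustration. */
final class AlignmentModel {
    static final long CACHE_LINE = 64; // assumed cache line size in bytes

    /** Offset of address within an alignment-sized block (alignment must be a power of 2). */
    static long alignmentInBytes(long address, long alignment) {
        if (alignment <= 0 || Long.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of 2");
        }
        return address & (alignment - 1);
    }

    /** Heuristic boolean: do two byte positions fall on the same cache line? */
    static boolean sameCacheLine(long addrA, long addrB) {
        return Math.floorDiv(addrA, CACHE_LINE) == Math.floorDiv(addrB, CACHE_LINE);
    }

    public static void main(String[] args) {
        long cellStart = 4160;                    // hypothetical cell start (65 * 64)
        long headerBytes = 16;                    // hypothetical header size
        long fieldsStart = cellStart + headerBytes;

        // Same object, two pointer conventions, two different "alignments":
        System.out.println(alignmentInBytes(cellStart, CACHE_LINE));   // 0  (pointer at cell start)
        System.out.println(alignmentInBytes(fieldsStart, CACHE_LINE)); // 16 (pointer after header)

        // The field-level question has one answer whatever the pointer convention:
        long foo = fieldsStart;       // 4176, line 65
        long bar = fieldsStart + 40;  // 4216, line 65
        long baz = fieldsStart + 56;  // 4232, line 66
        System.out.println(sameCacheLine(foo, bar)); // true
        System.out.println(sameCacheLine(foo, baz)); // false
    }
}
```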

I, like Brian, am not a big fan of exposing the very concept that fields
have offsets, or that objects have addresses for that matter. For me,
the exposed user model is that objects don't have addresses; they are
logical concepts composed of a mapping to "memory cells" that may or may
not be 1:1. As you know better than most people, there are GC designs
that do not have a to-space invariant, like Shenandoah a few years ago,
and seemingly the new Alibaba Platinum GC. There are also GC designs
where a field is not simply an offset from the object start, like Schism
and Jamaica VM in the literature, which split objects into multiple
fixed-size memory cells to combat fragmentation without the need for
compaction. I think that we want to maintain as much flexibility as we
can for future GC algorithms to thrive on the Java platform, and not
make the same mistake that languages like Go did by exposing the
addresses of objects, and hence forever closing the door to moving
garbage collection on the Go platform. I hope you understand where I am
coming from as a GC maintainer.
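To illustrate why a single "field offset" stops being meaningful in such designs, here is a toy model of an object split into fixed-size chunks. It is only loosely in the spirit of the designs named above, not their actual layout; ChunkedObject and SLOTS_PER_CHUNK are hypothetical. Each inner array plays the role of a separate memory cell, and a field is located by (chunk, slot), with chunks free to live anywhere.

```java
/** Toy object whose fields are spread over fixed-size chunks (hypothetical model). */
final class ChunkedObject {
    static final int SLOTS_PER_CHUNK = 4; // hypothetical chunk capacity, in fields

    // Each inner array stands for an independent memory cell; cells need not
    // be adjacent, so "distance from object start" is undefined for field 5.
    private final long[][] chunks;

    ChunkedObject(int fieldCount) {
        int chunkCount = (fieldCount + SLOTS_PER_CHUNK - 1) / SLOTS_PER_CHUNK;
        chunks = new long[chunkCount][SLOTS_PER_CHUNK];
    }

    /** Which chunk holds a given field. */
    static int chunkOf(int fieldIndex) {
        return fieldIndex / SLOTS_PER_CHUNK;
    }

    /** Slot of a field within its chunk. */
    static int slotOf(int fieldIndex) {
        return fieldIndex % SLOTS_PER_CHUNK;
    }

    int chunkCount() {
        return chunks.length;
    }

    long get(int fieldIndex) {
        return chunks[chunkOf(fieldIndex)][slotOf(fieldIndex)];
    }

    void set(int fieldIndex, long value) {
        chunks[chunkOf(fieldIndex)][slotOf(fieldIndex)] = value;
    }
}
```

In this model, a fieldOffsetOf-style answer for field 5 would have to be "chunk 1, slot 1", which a flat long offset cannot express.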

TL;DR: Exposing low-level details about object layout that require the
user to know implementation details, such as the very concept that an
object is associated with an address or that fields are associated with
offsets, ought to be the last possible resort. And I have a feeling that
if we look more closely at the high-level questions you really want the
proposed low-level API to answer, we might be able to design
higher-level APIs, closer to the original questions. Such APIs could be
more effective at answering those questions, without strange
implementation anomalies caused by users measuring between different
relative points of data in an uncontrolled fashion, instead of letting
the JVM know through the API what you are really asking. At the same
time, they would hide more implementation details (addresses, offsets)
that we don't want to expose.

I have heard two high-level questions that addressOf is proposed to answer:
* What is the byte distance between o1 and o2?
* What is the alignment of o1?

As mentioned, zooming out even one more step: are we asking these
questions as a means to an end, or because we have an even higher-level
question? Perhaps if we get to the root of what we would like to find
out, we might never have to expose either offsets of fields or addresses
of objects. After all, why would you need them, if not to answer an even
higher-level question about the layout of an object, e.g. "are foo of
obj1 and bar of obj2 on the same cache line?" or "what is the byte
distance between foo of obj1 and bar of obj2?" When I read your JEP
description, it seems like those are indeed the kinds of high-level
questions we really want answered. And for that, I think a much more
high-level API would be more appropriate.

What do you think?

Thanks,
/Erik

On 2020-08-17 14:33, Aleksey Shipilev wrote:
> On 8/16/20 12:41 PM, Peter Levart wrote:
>> On 8/11/20 12:22 PM, Aleksey Shipilev wrote:
>>> ...but dislike:
>>>       public static long addressOf(Object obj);
>>>       public static long fieldOffsetOf(Field field);
>> What exactly is the purpose of "addressOf" method in terms of
>> information API? Is it used to estimate relative placement of several
>> objects in the heap to see how they are scattered around which affects
>> the CPU cache performance when accessing them?
> Yes, it says so in "Motivation" section in JEP. Additionally, checking the object address against
> the cache line size.
>
>> If this is the case, then maybe the method could return a "mangled"
>> address: the address + some secret random value calculated once for the
>> whole VM.
> Now that is an interesting suggestion!
>
> Implemented here:
>    https://hg.openjdk.java.net/jdk/sandbox/rev/248807bfa78e
>
> There is little-to-none loss of performance, because the offset can be trivially used in intrinsics.
> JEP text is updated to mention this technique. I believe this makes the address exposure story less
> problematic, although the result is still conceptually a useful proxy for a memory location.
>
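Peter's mangling suggestion, quoted above, keeps differences between addresses exact, because the secret cancels out. One caveat for the cache-line check also mentioned in the quote is that a fully random secret would scramble address & (lineSize - 1). The sketch below therefore restricts the secret to a multiple of the alignment it wants to preserve. It is a hypothetical illustration (AddressMangler is a made-up name), not the code in the linked changeset, and this thread does not say whether that changeset does the same.

```java
import java.security.SecureRandom;

/** Toy model: every exposed value is the real address plus one per-VM secret. */
final class AddressMangler {
    private final long secret;

    AddressMangler(long secret) {
        this.secret = secret;
    }

    /**
     * Picks a random secret that is a multiple of alignmentToKeep (a power of 2),
     * so that address & (alignmentToKeep - 1) survives mangling.
     */
    static AddressMangler withRandomSecret(long alignmentToKeep) {
        if (alignmentToKeep <= 0 || Long.bitCount(alignmentToKeep) != 1) {
            throw new IllegalArgumentException("alignment must be a power of 2");
        }
        long raw = new SecureRandom().nextLong();
        return new AddressMangler(raw & -alignmentToKeep); // clear the low bits
    }

    long mangle(long realAddress) {
        // Two's-complement wraparound keeps differences exact:
        // (a + k) - (b + k) == a - b for all longs a, b, k.
        return realAddress + secret;
    }
}
```

With a secret such as 8, which is not a multiple of 64, distances would still be exact but the low six bits would no longer match the real address, so the cache-line check would silently give wrong answers.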