RFC (round 1), JEP draft: Low-level Object layout introspection methods

Tue Aug 11 10:22:35 UTC 2020

Hi Brian,

Thanks for taking a look!

I think we need to get one thing very straight: these are best-effort informational APIs.

They are not required to work 100% reliably. They are expected to provide reasonable answers where
possible, until something else gets in the way, including disabled feature flags, corner cases where
current and future optimizations turn the answers ambiguous, future work where layouts change on the
fly, etc.

I tried to be clear about this in the JEP text. Does that intent register when you read it?

On 8/7/20 9:53 PM, Brian Goetz wrote:
> People still use Unsafe to access off-heap memory, but we're working on a plan there too -- the
> Panama foreign memory access API. This is less mature than VarHandles, but the goal is similar; by
> the time we're done, there should be no excuse to use Unsafe for access to memory at all, ever,
> because there will be better, safer, supported APIs that do the same thing with comparable
> performance.  (Similarly, Panama has a notion of Layout, and it is possible that, over time, it
> might be possible to get a Layout for a Java object, and access it through the safer Panama APIs.) 

OK, so let's look a bit forward. Assume Panama does the layout for Java objects. Does that mean the
current APIs, namely MemoryLayout.{bitOffset|byteOffset}:

https://hg.openjdk.java.net/jdk/jdk/file/60612063f75a/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemoryLayout.java#l332

...apply to Java fields as well? This looks similar to what the proposed fieldOffsetOf does. It
seems to run counter to the idea that "fields do not have offsets". Does that mean Panama should
never do this for heap objects?

> The very model that Unsafe assumes for on-heap data (and therefore encourages users to assume) --
> that a field offset is a fixed thing that can be relied on -- undermines the VMs ability to
> optimize.  As just one example, the VM could dynamically redo layout based on profiling data
> (putting frequently-accessed-together fields so they are in the same cache line) and rewrite
> existing instances during GC.  In a world where even one user can have Unsafe, then _no one_ can
> ever have this optimization.  This seems a bad trade! 

Yes, I agree with this part.

> Unfortunately the proposal to include field offset and address information in the API grabs the
> ball and runs not only to the opposite goal line, but into the next stadium -- by elevating this
> model to "something you rely on at your own risk" to "something we promise Java will be
> constrained by until the end of time."  No thank you! And it seems especially bad to pour fresh
> concrete on "no one can have this ever" when we're five years into jackhammering away the old
> dam.

But I disagree with this part. I don't think the existence of best-effort informational APIs
undermines the future work. If anything comes up that goes against those APIs, then the APIs are
told to get off our collective lawn, and start replying "don't know". But until that happens (and
even if it does), it seems odd to deprive people of useful best-effort answers today.
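
To illustrate what replying "don't know" could look like from the caller's side, here is a minimal
sketch. Everything in it is hypothetical: the class and method names, the use of OptionalLong, the
negative-value convention, and the number 12 are illustration only and are not taken from the JEP
or the prototype.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical helper: none of these names come from the JEP or the prototype.
final class BestEffortDemo {

    // Runs a layout query only if the (assumed) feature switch is on, and maps
    // negative answers to "don't know". The negative sentinel is an assumption.
    public static OptionalLong bestEffort(boolean featureEnabled, LongSupplier query) {
        if (!featureEnabled) {
            return OptionalLong.empty();
        }
        long answer = query.getAsLong();
        return answer < 0 ? OptionalLong.empty() : OptionalLong.of(answer);
    }

    // Callers must be prepared for the absence of an answer.
    public static String describe(OptionalLong offset) {
        return offset.isPresent() ? "offset " + offset.getAsLong() : "don't know";
    }

    public static void main(String[] args) {
        // 12 is a made-up value standing in for whatever the VM would report.
        System.out.println(describe(bestEffort(true, () -> 12L)));
        System.out.println(describe(bestEffort(false, () -> 12L)));
    }
}
```

The point of the sketch is only that a caller which handles the empty case keeps working when the
VM stops answering, so the API does not pin the layout down.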

> In fact, the next methods I would like to _remove_ from Unsafe are the get-field-offset and
> get-object-address methods.   You might reply you are enabling this goal, but that would be
> mistaking the letter for the spirit: I want people to stop assuming that fields HAVE an offset, or
> that Java objects HAVE an address.  And, what else would someone do with the address and offset,
> other than provide it to Unsafe (or to the native equivalent)?  

Before I address the core of this argument below, let me mention that the JEP text already answers
that question: JOL would introspect the internal and external object layout; Java code would
test that padding really works; etc. It would be a mistake to paint Unsafe as the only use.

> So, while there are some things in this JEP that seem quite cool and which I enthusiastically
> support, there are some fundamental assumptions -- that field offsets and object addresses should be
> accessible to Java code -- that I disagree with.

I agree that from the pure Java standpoint, language objects/fields are more or less the platonic
ideals of themselves. Nothing is really known about them, except the limited set of their expected
behaviors.

But we cannot ignore that implementation details also matter. I would argue that object/field
sizes are such an implementation detail too. Yet we seem to agree that object/field sizes are good
to know. So it follows that being an implementation detail is not the problem in itself. We are
able to put fundamental/idealistic assumptions aside when we really want to.

A funny example of such a concession to implementation details is not even JVMTI/Instrumentation,
but the core libraries themselves:
  https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#SIZE
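
For reference, a small sketch of what those core library constants report. The class and helper
names are made up for illustration. Note that these constants describe the width of a primitive
value, not the heap footprint of a field or of a boxed object.

```java
// Hypothetical demo class; only the java.lang constants are real API.
final class PrimitiveSizes {

    // Converts a width in bits to a width in bytes.
    public static int bytesFor(int bits) {
        return bits / Byte.SIZE;
    }

    public static void main(String[] args) {
        // SIZE is the width in bits, BYTES (since Java 8) the width in bytes.
        System.out.println("int:  " + Integer.SIZE + " bits, " + Integer.BYTES + " bytes");
        System.out.println("long: " + Long.SIZE + " bits, " + Long.BYTES + " bytes");
        System.out.println("char: " + Character.SIZE + " bits, " + bytesFor(Character.SIZE) + " bytes");
    }
}
```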

I guess the real question is where to put something that goes beyond "pure Java" and exposes
implementation details, so that it avoids the impression of stating "pure Java" facts. I put the
prototype methods in Runtime, because that looked like one of the least "pure Java" facilities. I'd
be happy to learn of some other place where such an API could be put :)

> Setting aside philosophy and stewardship for a moment, I think what's really going on is that this
> JEP is trying to serve two different audiences: those that are happy to have estimates and/or use
> the information for making approximate decisions, and those who are interested in low-level hackery
> for accessing the Java heap, rather than through `getfield` and `putfield`.  I think if you were to
> split this into two JEPs, you would find enthusiastic support for the first, and strong resistance
> to the second. 

With my JEP author hat on: the JEP caters to low-level introspection use cases, such as JOL and
JAMM. It does not intend to cater to "low-level hackery for accessing the Java heap", and it
specifically tries to make the argument that the new APIs do not allow this low-level hackery
beyond what is possible today and in the foreseeable future.

I can see that the JEP can be split into "non-controversial" (sizeOf, deepSizeOf) and
"controversial" (addressOf, fieldOffsetOf) parts. For process overhead reasons, I'd push forward
with just one JEP at the moment, though.

> To the extent that there is overlap between these groups (e.g., fieldOffsetOf is useful to offline
> tools that may be used to estimate layouts, as well as online users who would be tempted to believe
> that this is really a field offset), some creativity will be required to keep the first from
> becoming an attractive nuisance to the second.

I am open to suggestions :)

One of the reasons the prototype has RuntimeAddressOf and RuntimeOffsetsOf JVM flags (as much as I
hate having another JVM option) is to explore whether they can be disabled by default, so that
online users would have to explicitly opt in to use these facilities:

https://builds.shipilev.net/patch-openjdk-jdk-jep-8249196/src/hotspot/share/runtime/globals.hpp.sdiff.html

I do believe the Java API is useful, to enable Java-only libraries like JOL. Otherwise, the argument
could be made to make fieldOffsetOf-like data available by turning PrintFieldLayout into a product
VM flag, and then letting JOL parse the output of a forked JVM. That's tedious, but doable.
addressOf cannot be done like that, because it requires passing the object reference somehow.

> So, my feedback is "glass half full" -- I'm happy to see the JEP that provides estimates and such. 
> My recommendation is to focus on that half for now.

Let me ask a specific question: which APIs do you like, and which do you dislike?

I gather you like:
    public static long sizeOf(Object obj);
    public static long deepSizeOf(Object obj);
    public static long deepSizeOf(Object obj, ToLongFunction<Object> includeCheck);

...but dislike:
    public static long addressOf(Object obj);
    public static long fieldOffsetOf(Field field);

...and it is not clear what your position is on:
    public static long fieldSizeOf(Field field);

-- 
Thanks,
-Aleksey