RFC (round 1), JEP draft: Low-level Object layout introspection methods

Fri Aug 7 19:53:28 UTC 2020

> I would like to solicit early feedback on "Low-level Object layout 
> introspection methods" JEP:
> https://openjdk.java.net/jeps/8249196

I enjoyed reading this JEP draft; clearly a lot of thought has gone into 
the technical details.  I find much to like, and some to dislike, in 
this proposal.

First, the good news.  I am sympathetic to the needs that certain 
libraries have to assess memory usage of objects, such as for cache 
eviction.  While to some level this is an impossible problem (how do we 
know whether an object reference buried in a graph is aliased or not?), 
providing users with better ways of measuring cost than simply counting 
cache entries is a noble cause, and the work you've done on 
`estimateSize` definitely is a step forward.
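
To make the cache-eviction point concrete, here is a minimal sketch of 
a cache bounded by estimated bytes rather than by entry count.  The 
class name `SizeBoundedCache` and the `sizer` callback are hypothetical; 
`sizer` is only a stand-in for whatever size estimate is available and 
is not the signature of the JEP's `estimateSize`.  It assumes non-null 
values and, per the aliasing caveat above, charges each value in full 
even if part of its graph is shared.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

public class SizeBoundedCache<K, V> {
    // accessOrder=true makes iteration run from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    // Hypothetical size estimator supplied by the caller (assumption, not a JDK API).
    private final ToLongFunction<V> sizer;
    private final long maxBytes;
    private long usedBytes;

    public SizeBoundedCache(long maxBytes, ToLongFunction<V> sizer) {
        this.maxBytes = maxBytes;
        this.sizer = sizer;
    }

    public V get(K key) {
        return entries.get(key); // also marks the entry as recently used
    }

    public void put(K key, V value) {
        V old = entries.put(key, value);
        if (old != null) {
            usedBytes -= sizer.applyAsLong(old);
        }
        usedBytes += sizer.applyAsLong(value);
        // Evict least-recently-used entries until the estimate fits the budget.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            usedBytes -= sizer.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    public long usedBytes() { return usedBytes; }

    public int size() { return entries.size(); }
}
```

The budget is only as good as the estimate, which is why a better 
estimator is worth having even if it can never be exact.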

I am also sympathetic to the desire to better understand layout, for 
purposes of optimizing footprint ("cool, I can stash this field in the 
alignment shadow") and cache efficiency ("how do I arrange these fields 
so they are more likely to be on the same cache line?").  While such 
feedback can never be relied on 100%, I agree it can be useful to have 
this information so that one can make more accurate estimates of cost.

Now, the bad news: I have deep objections to several of the sub-features 
in this JEP (unfortunately some of them even overlap with the second 
point above.)  Specifically, I have deep misgivings about exposing field 
offsets, and deeper misgivings about exposing object addresses.

For context, remember that we are in the middle of a deliberate, 
decade-long transition away from Unsafe.  We knew that we couldn't just 
turn off Unsafe immediately, but we also knew we had to wean people from 
Unsafe -- and not only from the specific class, but from some of the 
concepts.  And of course we can't do so without providing good-enough 
replacements for at least some of the use cases, so it will necessarily 
take time.  (Reasonable people can reasonably disagree on the meaning of 
"good enough" and which use cases, but that's a separate discussion.)

The down payment on this plan was VarHandles; a key goal here was to 
obviate the need for using Unsafe to access data in the Java heap. 
Naturally it took some time to flesh out the feature set, shake out the 
performance, and adapt the JDK code to use it, but the goal is clear -- 
there should be no excuse for using Unsafe to access the Java heap at 
all.  And we're well on the way.
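
As a minimal sketch of what this looks like in practice, the following 
uses a VarHandle for atomic access to an instance field.  The `Counter` 
class and its method names are made up for illustration; the point is 
that the field is identified by name and type, so no offset or address 
ever appears in user code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class Counter {
    // The field is named, not located: the VM stays free to place it anywhere.
    private volatile int count;

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(Counter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Atomically adds one and returns the new value. */
    public int increment() {
        return (int) COUNT.getAndAdd(this, 1) + 1;
    }

    /** Atomically sets the value to next only if it currently equals expected. */
    public boolean compareAndSet(int expected, int next) {
        return COUNT.compareAndSet(this, expected, next);
    }

    /** Reads the current value with volatile semantics. */
    public int get() {
        return (int) COUNT.getVolatile(this);
    }
}
```

The same operations done through Unsafe would need a field offset 
obtained up front and passed to every call, which is exactly the 
assumption at issue here.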

People still use Unsafe to access off-heap memory, but we're working on 
a plan there too -- the Panama foreign memory access API. This is less 
mature than VarHandles, but the goal is similar; by the time we're done, 
there should be no excuse to use Unsafe for access to memory at all, 
ever, because there will be better, safer, supported APIs that do the 
same thing with comparable performance. (Similarly, Panama has a notion 
of Layout, and over time it might become possible to get a Layout for a 
Java object and access it through the safer Panama APIs.)

The very model that Unsafe assumes for on-heap data (and therefore 
encourages users to assume) -- that a field offset is a fixed thing that 
can be relied on -- undermines the VM's ability to optimize.  As just one 
example, the VM could dynamically redo layout based on profiling data 
(putting frequently-accessed-together fields so they are in the same 
cache line) and rewrite existing instances during GC.  In a world where 
even one user can have Unsafe, then _no one_ can ever have this 
optimization.  This seems a bad trade!  Unfortunately, the proposal to 
include field offset and address information in the API grabs the ball 
and runs not only to the opposite goal line, but into the next stadium 
-- by elevating this model from "something you rely on at your own 
risk" to "something we promise Java will be constrained by until the 
end of time."  No thank you!  And it seems especially bad to pour fresh 
concrete on "no one can have this ever" when we're five years into 
jackhammering away the old dam.

In fact, the next methods I would like to _remove_ from Unsafe are the 
get-field-offset and get-object-address methods.   You might reply you 
are enabling this goal, but that would be mistaking the letter for the 
spirit: I want people to stop assuming that fields HAVE an offset, or 
that Java objects HAVE an address.  And, what else would someone do with 
the address and offset, other than provide it to Unsafe (or to the 
native equivalent)?  This is propping up the wrong model, and runs 
counter to the long-term direction we've been pursuing for years and 
are already halfway along.

So, while there are some things in this JEP that seem quite cool and 
which I enthusiastically support, there are some fundamental assumptions 
-- that field offsets and object addresses should be accessible to Java 
code -- that I disagree with.

Setting aside philosophy and stewardship for a moment, I think what's 
really going on is that this JEP is trying to serve two different 
audiences: those that are happy to have estimates and/or use the 
information for making approximate decisions, and those who are 
interested in low-level hackery for accessing the Java heap by means 
other than `getfield` and `putfield`.  I think if you were to split this 
into two JEPs, you would find enthusiastic support for the first, and 
strong resistance to the second.

To the extent that there is overlap between these groups (e.g., 
fieldOffsetOf is useful to offline tools that may be used to estimate 
layouts, as well as online users who would be tempted to believe that 
this is really a field offset), some creativity will be required to keep 
the first from becoming an attractive nuisance to the second.

So, my feedback is "glass half full" -- I'm happy to see the JEP that 
provides estimates and such.  My recommendation is to focus on that half 
for now.

Cheers,
-Brian