for the record: header bit dynamics could possibly change after Leyden training runs

Wed Aug 28 22:28:49 UTC 2024

There’s a possible new consideration to take into account in
planning future reductions to header size, which I want to
state for the record.  (And NOT for immediate action!
Lilliput has lots to do for the present release.)

In https://bugs.openjdk.org/browse/JDK-8198331 there is
a discussion of tactics for moving id-hash and synch-control
bits out of the header, into side tables.  That RFE proposed
trying to get rid of the mark word.  Lilliput subsumes that RFE.
Lilliput gives more work to the mark word, filling it with klass,
id-hash, and synch-control, and also GC forwarding.

So far so good; I think we are getting to a good place.

But we can still see a basic fact about both id-hash and
synch, and that is that most classes don’t use them, most
of the time.  (There are exceptions: serialization can use
id-hash indiscriminately.)

If we only had a way to determine which classes actually
need to be provisioned with id-hash support, then other classes
could avoid surrendering any header bits at all to the needs
of id-hash.  For occasional outlier uses of id-hash, if they
are rare enough, a slow path and/or a side data structure
would be adequate.  The same point is true for synch support.

There may be a new way coming to determine which classes actually
do use the VM’s runtime support for id-hash and/or synch.
That is a Leyden training run.  A training run executes somebody’s
application on a representative workload, and then dumps all kinds
of information about what happened during the run.  It would be
pretty easy to record (during a training run) whether or not any
given class EVER used id-hash, and/or EVER used synchronization,
on its instances.  That information, fed forward via a Leyden
AOT cache, could cause the VM to make a different decision about
object layout, for classes that do, or do not, appear to need
those runtime resources.  A class that needs a resource, such
as storage for id-hash, or some pointer or counter to help with
synch, could allocate that resource in an extension field in
the object layout.  You will probably recognize that this is
the same basic idea as the “big klass pointer” idea which
could allow headers to have intentionally substandard klass
ID fields, as long as there is a way to put the overflow
bits elsewhere.
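
To make the data flow above concrete, here is a minimal sketch in
Java of what "record during training, consult during layout" might
look like.  Everything in it is hypothetical: the class
TrainingProfileSketch, the Resource enum, and both method names are
made up for illustration and do not correspond to anything in
HotSpot or in the Leyden AOT cache format.  It only models the
bookkeeping, not real object layout.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model only; none of these names exist in HotSpot or Leyden. */
final class TrainingProfileSketch {
    /** Runtime resources a class's instances might need (assumed set). */
    enum Resource { ID_HASH, SYNC }

    // Filled in during the training run: class name -> resources EVER used.
    private final Map<String, Set<Resource>> used = new ConcurrentHashMap<>();

    /** Conceptually called from the VM's id-hash or monitor-enter paths. */
    void recordUse(String className, Resource resource) {
        used.computeIfAbsent(className, k -> ConcurrentHashMap.newKeySet())
            .add(resource);
    }

    /**
     * Conceptually called at layout time in a later run: which extension
     * fields should instances of this class be provisioned with?
     * Classes never seen using a resource get no extension field for it.
     */
    Set<Resource> extensionFieldsFor(String className) {
        EnumSet<Resource> result = EnumSet.noneOf(Resource.class);
        Set<Resource> seen = used.get(className);
        if (seen != null) {
            result.addAll(seen);
        }
        return result;
    }
}
```

A class that shows up with an empty set would surrender no header
bits or extra fields; a later outlier use of id-hash or synch on
such a class would have to take some slow path.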

There are lots of pitfalls here, notably complexity of
implementation and maintenance costs. (And this may be
poor timing on my part for bringing this up at all.  Sorry,
but not sorry: This timing is driven by Leyden more than
Lilliput.)  Another pitfall is the growing number of instructions
needed to read a header which has more and more
variant layouts; this was why I was careful to observe
(for the klass ID overflow tricks) that there might be
flow-free idioms, which at least don’t burden the branch
predictor with a new bunch of odd jobs.  If I’m right,
though, that a fast-path/slow-path scheme might work
(for id-hash and/or synch), then the problem boils down
to making the slow path detection as fast as possible,
and keeping the fast path simple.  This might be doable
by (a) using a sentinel bit in the header and (b) putting
the injected field (holding the runtime resource) at a
fixed offset, at least in 99% of the cases.
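
One possible reading of (a) and (b), as a toy model in Java: an
object is a long[] whose word 0 is the header, a single header bit
says whether the class was provisioned with an id-hash field, and
that field (when present) always sits at word 1.  Unprovisioned
objects fall back to a side table.  The bit position, the field
offset, the use of a long as a stand-in for an object address, and
all the names here are assumptions of mine for illustration, not
HotSpot or Lilliput code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of sentinel-bit dispatch; all names and constants are hypothetical. */
final class SentinelHeaderSketch {
    static final long HAS_HASH_FIELD = 1L; // assumed sentinel bit in header word
    static final int HASH_FIELD_INDEX = 1; // assumed fixed offset of injected field

    // Slow path storage: side table keyed by a stand-in for object identity.
    // (A real VM could not key on a raw address that the GC may change.)
    private final Map<Long, Integer> sideTable = new HashMap<>();
    private int nextHash = 1; // trivial hash generator, never yields 0

    int identityHash(long address, long[] objectWords) {
        if ((objectWords[0] & HAS_HASH_FIELD) != 0) {
            // Fast path: one bit test, then a load at a fixed offset.
            int h = (int) objectWords[HASH_FIELD_INDEX];
            if (h == 0) {              // 0 means "not yet assigned"
                h = nextHash++;
                objectWords[HASH_FIELD_INDEX] = h;
            }
            return h;
        }
        // Slow path: class was not provisioned; consult the side table.
        return sideTable.computeIfAbsent(address, a -> nextHash++);
    }
}
```

The point of the sketch is only that the dispatch costs one test on
the header word, and that the provisioned case never touches the
side table.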

I don’t want to pursue this in detail, but I would like to put
it out here in the spirit of brainstorming the sorts of new
tradeoffs that we can make about object layout, in the presence
of relatively trustworthy speculations about object dynamics,
as obtained from Leyden training runs.  It does seem to be
a fundamentally new move for making tradeoffs about those
heavily contended header bits.  Not now, but certainly in
a foreseeable future.

On leyden-dev we may sometimes discuss other possible “tricks”
in this vein, of profile-based format adjustments.  Here is a
link to a companion note I just wrote, to go with this one:

https://mail.openjdk.org/pipermail/leyden-dev/2024-August/000907.html