Call for Discussion: New Project: Lilliput

Tue Mar 9 15:26:33 UTC 2021

----- Original Message -----
> From: "Roman Kennke" <rkennke at redhat.com>
> To: "discuss" <discuss at openjdk.java.net>
> Sent: Tuesday, 9 March 2021 15:39:43
> Subject: Call for Discussion: New Project: Lilliput

> We would like to propose a new project called Lilliput, with the goal of
> exploring ways to shrink the object header.
> 
> Goals:
> 1. Reduce the object header to 64 bits. It may be possible to shrink it
> down to 32 bits as a secondary goal.
> 2. Make the header layout more flexible, i.e. allow some build-time
> (possibly even run-time) configuration of how we use the bits.
> 
> Motivation:
> In 64-bit HotSpot, Java objects have an object header of 128 bits: a
> 64-bit multi-purpose header (‘mark’ or ‘lock’) word and a 64-bit class
> pointer. With typical average object sizes of 5-6 words, this is quite
> significant: 2 of those words are always taken by the header. If it were
> possible to reduce the size of the header, we could significantly reduce
> memory pressure, which directly translates to one or more of the
> following (depending on what you care about or what your workload does):
> 
> - Reduced heap usage
> - Higher object allocation rate
> - Reduced GC activity
> - Tighter packing of objects -> better cache locality
> 
> In other words, we could reduce the overall CPU and/or memory usage of
> all Java workloads, whether it’s a large in-memory database or a small
> containerized application.
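To put the motivation in numbers, here is a back-of-the-envelope calculation using only the figures quoted above: 8-byte words, average objects of 5-6 words, a 2-word header today, and a 1-word (64-bit) or half-word (32-bit) header as the goals. It assumes the rest of the object layout stays unchanged and is plain arithmetic, not a measurement of any workload.

```python
WORD_BYTES = 8  # one word in 64-bit HotSpot


def header_share(avg_object_words: float, header_words: float) -> float:
    """Fraction of an average object's footprint taken up by its header."""
    return header_words / avg_object_words


def saving(avg_object_words: float, old_header_words: float,
           new_header_words: float) -> float:
    """Fraction of object memory saved when only the header shrinks."""
    return (old_header_words - new_header_words) / avg_object_words


for avg in (5, 6):
    print(f"{avg}-word objects ({avg * WORD_BYTES} bytes): "
          f"header is {header_share(avg, 2):.0%} today; "
          f"64-bit header saves {saving(avg, 2, 1):.0%}, "
          f"32-bit header saves {saving(avg, 2, 0.5):.0%}")
```

For 5-word objects this gives 40% / 20% / 30%; for 6-word objects roughly 33% / 17% / 25%.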
> 
> 
> The object header is used (and overloaded) for the following purposes:
> 
> - Locking: the lower 3 bits are used to indicate the locking state of an
> object and the higher bits *may* be used to encode a pointer to a
> stack-allocated monitor or inflated lock object
> - GC: 4 bits are used for tracking the age of each object (in
> generational collectors). The whole header *may* be used to store
> forwarding information, depending on the collector
> - Identity hash-code: Up to 32 bits are used to store the identity hash-code
> - Type information: 64 bits are used to point to the Klass that
> describes the object type
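As an illustration of how these purposes share one 64-bit word, here is a small bit-packing sketch of a mark word in its unlocked state. The field widths follow the list above (3 lock-state bits, 4 age bits, a 31-bit identity hash); the shift positions and all names (`pack_mark`, `unpack_mark`, ...) are assumptions for illustration, not the authoritative HotSpot definitions.

```python
# Field widths from the list above; the shift positions are assumed.
LOCK_BITS, LOCK_SHIFT = 3, 0   # locking state (lowest 3 bits)
AGE_BITS, AGE_SHIFT = 4, 3     # GC age for generational collectors
HASH_BITS, HASH_SHIFT = 31, 8  # identity hash (bit 7 left as a gap)


def field_mask(bits: int) -> int:
    return (1 << bits) - 1


def pack_mark(lock: int, age: int, ihash: int) -> int:
    """Pack lock state, age and identity hash into one 64-bit word."""
    for value, bits in ((lock, LOCK_BITS), (age, AGE_BITS), (ihash, HASH_BITS)):
        if not 0 <= value <= field_mask(bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (lock << LOCK_SHIFT) | (age << AGE_SHIFT) | (ihash << HASH_SHIFT)


def unpack_mark(mark: int) -> tuple[int, int, int]:
    """Extract (lock, age, ihash) from a packed mark word."""
    lock = (mark >> LOCK_SHIFT) & field_mask(LOCK_BITS)
    age = (mark >> AGE_SHIFT) & field_mask(AGE_BITS)
    ihash = (mark >> HASH_SHIFT) & field_mask(HASH_BITS)
    return lock, age, ihash


mark = pack_mark(lock=0b001, age=5, ihash=0x1234567)
assert mark < 1 << 64
assert unpack_mark(mark) == (0b001, 5, 0x1234567)
print(f"{LOCK_BITS + AGE_BITS + HASH_BITS} of 64 bits used")  # 38 of 64 bits used
```

As the list says, the same word is overloaded: when the higher bits hold a lock or forwarding pointer, these fields are not available in place, which is part of what makes shrinking the header hard.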
> 
> 
> We have a wide variety of techniques to explore for allocating and
> down-sizing header fields:
> 
> - Pointers can be compressed. For example, if we expect a maximum of,
> say, 8192 classes, we could, with some careful alignment of Klass
> objects, compress the class pointer down to 13 bits: 2^13 = 8192
> addressable Klasses. Similar considerations apply to stack pointers and
> monitors.
> - Instead of using pointers, we could use class IDs that index a lookup
> table
> - We could backfill fields that are known at compile-time (e.g.
> alignment gaps or hidden fields)
> - We could use backfill fields appended to an object after the GC has
> moved it (e.g. for the hashcode)
> - We could use side-tables
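The first two techniques in this list can be sketched as follows. The pointer-compression part assumes every Klass lives in a reserved region starting at a fixed base and occupies its own fixed-size aligned slot (the base and the 4 KiB slot size are made-up values, and each Klass is assumed to fit in one slot), so a class pointer compresses to its slot index, which fits in 13 bits for 8192 slots. The class-ID variant hands out dense IDs and looks them up in a table; it has to signal overflow once the ID space is exhausted, which is the overflow constraint listed below. All names are hypothetical.

```python
KLASS_BASE = 0x8_0000_0000  # assumed start of the reserved Klass region
KLASS_ALIGN_SHIFT = 12      # assumed: each Klass sits in its own 4 KiB slot
CLASS_BITS = 13             # 2**13 = 8192 addressable Klasses


def compress_klass(addr: int) -> int:
    """Turn a full Klass address into a 13-bit slot index."""
    offset = addr - KLASS_BASE
    if offset < 0 or offset & ((1 << KLASS_ALIGN_SHIFT) - 1):
        raise ValueError("address outside the Klass region or misaligned")
    narrow = offset >> KLASS_ALIGN_SHIFT
    if narrow >= 1 << CLASS_BITS:
        raise OverflowError("more than 8192 Klass slots needed")
    return narrow


def decompress_klass(narrow: int) -> int:
    """Recover the full Klass address from its slot index."""
    return KLASS_BASE + (narrow << KLASS_ALIGN_SHIFT)


class ClassIdTable:
    """Dense class IDs that index a lookup table instead of raw pointers."""

    def __init__(self, id_bits: int = CLASS_BITS):
        self.capacity = 1 << id_bits
        self.klasses: list[object] = []
        self.ids: dict[object, int] = {}

    def id_for(self, klass: object) -> int:
        if klass in self.ids:
            return self.ids[klass]
        if len(self.klasses) >= self.capacity:
            # A real design needs a fallback path here, not just an error.
            raise OverflowError("class ID space exhausted")
        self.ids[klass] = len(self.klasses)
        self.klasses.append(klass)
        return self.ids[klass]

    def klass_for(self, class_id: int) -> object:
        return self.klasses[class_id]


addr = KLASS_BASE + (42 << KLASS_ALIGN_SHIFT)
assert decompress_klass(compress_klass(addr)) == addr
table = ClassIdTable()
string_id = table.id_for("java.lang.String")
assert table.klass_for(string_id) == "java.lang.String"
```

With these assumed values the whole Klass region spans 8192 × 4 KiB = 32 MiB; a smaller slot means less wasted space but a lower cap on Klass size.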
> 
> 
> We also have a bewildering number of constraints. To name a few:
> - Performance
> - If we limit, e.g., the number of classes/monitors/etc. that we can
> encode, we need a way to deal with overflow
> - Requires changes in assembly across all supported platforms (also
> consider 32 bits)
> - Interaction with other projects like Panama, Loom, maybe Leyden, etc
> 
> And a couple of opportunities for further work (possibly outside of this
> project):
> - If we leave arraylength in its own 64-bit field, perhaps we should
> consider 64-bit addressable arrays?
> - Improvements to hashcode. Maybe salt it to avoid repetition of nursery
> objects, maybe expand it to 64 or even 128 bits.
> 
> 
> I would propose myself as the project lead for Lilliput. :-)
> For initial committers I think we need all the expertise in runtime and GC
> that we can get. Off the top of my head I’m thinking of John Rose, Dave
> Dice, Andrew Dinn, Andrew Haley, Erik Österlund, Aleksey Shipilev,
> Coleen Phillimore, Stefan Karlsson, Per Liden. Please suggest anybody
> who you think should be involved in this too. (Or yourself if you want
> to be in, or if you have no interest in it.)
> 
> 
> My initial work plan is to:
> 
> - Brainstorm, collect ideas and propose techniques in the Wiki
> - Come up with a proof of concept as quickly as possible
>   - Use ZGC: no header usage
>   - Use existing class-pointer compression
>   - Shrink hashcode
> - Work from there, decide-as-we-go with insights from previous steps
> 
> 
> Please let me know what you think!

Hi Roman,
You can also reduce the size of the header for certain kinds of classes. For example, for a record, you know that the fields are truly final, so you can avoid generating (and storing) an identity hashCode and instead use the fields to calculate it, the same way Valhalla does for primitive classes.
For a primitive class, when instances are on the heap, again, you can avoid the identity hashCode (and also the lock bits, but that's less interesting).
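A minimal sketch of the record idea: because all fields are truly final, the "identity" hash can be recomputed from the field values whenever it is requested, so no header bits need to hold it. The combining function is the familiar `31 * h + x` scheme that `java.util.Arrays.hashCode(int[])` uses, chosen here only as an example; it is not claimed to be what Valhalla does. `Point` stands in for a hypothetical `record Point(int x, int y)`.

```python
from dataclasses import dataclass


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed Java int."""
    value &= 0xFFFF_FFFF
    return value - (1 << 32) if value >= 1 << 31 else value


@dataclass(frozen=True)  # frozen: fields cannot change, like record components
class Point:
    x: int
    y: int

    def identity_hash(self) -> int:
        # Recomputed on demand from the final fields: nothing is stored per object.
        h = 1
        for field in (self.x, self.y):
            h = (31 * h + field) & 0xFFFF_FFFF
        return to_int32(h)


print(Point(1, 2).identity_hash())  # 994
```

Two distinct instances with equal fields get the same hash, which is allowed because hash codes need not be unique; the trade-off is recomputing the hash on every request instead of reading it from the header.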

Backfilling the hashCode, by forcing the GC to move an instance so that it gets a fat header (a header with a hashCode slot), also seems like a good idea.
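One way to read the backfilling idea, sketched under assumptions: the first time an identity hash is requested it is derived from the object's current address, and only a "hashed" state is recorded (a couple of header bits). If the GC later moves the object, it copies it into a slightly larger layout with one extra slot that keeps the original hash, so the hash stays stable without the header permanently reserving hash bits. The state names, the toy `Obj` model and the address-mixing function are all hypothetical.

```python
UNHASHED, HASHED, HASHED_AND_MOVED = range(3)


class Obj:
    """Toy heap object: an address, a size, and a small hash state."""

    def __init__(self, address: int, size_words: int):
        self.address = address
        self.size_words = size_words
        self.hash_state = UNHASHED   # would live in two header bits
        self.backfilled_hash = None  # extra slot, only present after a move


def address_hash(address: int) -> int:
    # Hypothetical mixing of the address into 31 bits.
    return ((address >> 3) * 0x9E3779B1) & 0x7FFF_FFFF


def identity_hash(obj: Obj) -> int:
    if obj.hash_state == HASHED_AND_MOVED:
        return obj.backfilled_hash
    obj.hash_state = HASHED  # hash is the address until the object moves
    return address_hash(obj.address)


def gc_move(obj: Obj, new_address: int) -> None:
    if obj.hash_state == HASHED:
        # Grow the copy by one word and backfill the old address-based hash.
        obj.backfilled_hash = address_hash(obj.address)
        obj.size_words += 1
        obj.hash_state = HASHED_AND_MOVED
    obj.address = new_address
```

Only objects whose hash was actually requested before a move pay the extra word; unhashed objects move without growing.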

And a question: for the class pointer, isn't using compressed class pointers enough to get them down to 32 bits instead of 64 bits?
To me, making the hashCode optional and having a 32-bit class pointer seems like a good way to free 64 bits per header.
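The 64-bit saving can be checked with simple arithmetic using the field widths from Roman's list. One assumption not stated in either mail: the mark word itself would be squeezed into 32 bits, so any lock pointers that today use its high bits would also need compressing or moving out, for example into the spare bits computed below.

```python
MARK_BITS = KLASS_BITS = 64   # today's 64-bit HotSpot header
LOCK_BITS, AGE_BITS = 3, 4    # widths from the list above

current = MARK_BITS + KLASS_BITS  # 128 bits

# Proposal: no stored identity hash, and a 32-bit (compressed) class pointer.
NARROW_MARK_BITS = 32   # assumed: mark word squeezed into 32 bits
NARROW_KLASS_BITS = 32  # compressed class pointer
required = LOCK_BITS + AGE_BITS
assert required <= NARROW_MARK_BITS, "lock + age state must still fit"
spare = NARROW_MARK_BITS - required  # 25 bits left, e.g. for a compressed lock pointer

proposed = NARROW_MARK_BITS + NARROW_KLASS_BITS
print(f"{current} -> {proposed} bits per header, freeing {current - proposed}")
# prints: 128 -> 64 bits per header, freeing 64
```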

> 
> Thanks,
> Roman

Rémi