Call for Discussion: New Project: Lilliput

Fri Mar 12 17:23:49 UTC 2021

Ah, btw: according to the bylaws, we need one Group Lead (in this case I 
expect the HotSpot group, that is, Vladimir Kozlov) to declare sponsorship 
of this project.

Thanks,
Roman

> We would like to propose a new project called Lilliput, with the goal of 
> exploring ways to shrink the object header.
> 
> Goal:
> 1. Reduce the object header to 64 bits. It may be possible to shrink it 
> down to 32 bits as a secondary goal.
> 2. Make the header layout more flexible, i.e. allow some build-time 
> (possibly even run-time) configuration of how we use the bits.
> 
> Motivation:
> In 64-bit HotSpot, Java objects have an object header of 128 bits: a 
> 64-bit multi-purpose header (‘mark’ or ‘lock’) word and a 64-bit class 
> pointer. With typical average object sizes of 5-6 words, this is quite 
> significant: 2 of those words are always taken by the header. If it were 
> possible to reduce the size of the header, we could significantly reduce 
> memory pressure, which directly translates to one or more of the 
> following (depending on what you care about or what your workload does):
> 
> - Reduced heap usage
> - Higher object allocation rate
> - Reduced GC activity
> - Tighter packing of objects -> better cache locality
> 
> In other words, we could reduce the overall CPU and/or memory usage of 
> all Java workloads, whether it’s a large in-memory database or a small 
> containerized application.
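The overhead claim above can be checked with a little arithmetic. The following is a minimal sketch, assuming 64-bit words and ignoring alignment padding; the class and method names are hypothetical and exist only for illustration.

```java
public class HeaderOverhead {
    // Fraction of an object's words that the header occupies.
    static double headerShare(int headerWords, int objectWords) {
        return (double) headerWords / objectWords;
    }

    // Fraction of the object's footprint saved when the header shrinks
    // and the payload stays the same (alignment padding is ignored).
    static double savings(int oldHeaderWords, int newHeaderWords, int objectWords) {
        return (double) (oldHeaderWords - newHeaderWords) / objectWords;
    }

    public static void main(String[] args) {
        // Average object sizes of 5 and 6 words, as quoted in the proposal.
        for (int words = 5; words <= 6; words++) {
            System.out.printf(
                "%d-word object: header share %.0f%%, saving with a 1-word header %.0f%%%n",
                words,
                100 * headerShare(2, words),
                100 * savings(2, 1, words));
        }
    }
}
```

Under these assumptions the two-word header accounts for roughly 33-40% of an average object, and a one-word header would save roughly 17-20% of its footprint.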
> 
> 
> The object header is used (and overloaded) for the following purposes:
> 
> - Locking: the lower 3 bits are used to indicate the locking state of an 
> object and the higher bits *may* be used to encode a pointer to a 
> stack-allocated monitor or inflated lock object
> - GC: 4 bits are used for tracking the age of each object (in 
> generational collectors). The whole header *may* be used to store 
> forwarding information, depending on the collector
> - Identity hash-code: Up to 32 bits are used to store the identity 
> hash-code
> - Type information: 64 bits are used to point to the Klass that 
> describes the object type
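To make the overloading of the mark word concrete, here is a minimal sketch of packing the fields listed above into one 64-bit word. The field widths (3 lock bits, 4 age bits, up to 32 hash bits) come from the list; the bit positions and all names are assumptions chosen for illustration, not a statement of the actual HotSpot layout.

```java
public final class MarkWordSketch {
    // Widths taken from the proposal text.
    static final int LOCK_BITS = 3;
    static final int AGE_BITS = 4;
    static final int HASH_BITS = 32;

    // Hypothetical positions: fields are simply stacked from bit 0 upwards.
    static final int LOCK_SHIFT = 0;
    static final int AGE_SHIFT = LOCK_SHIFT + LOCK_BITS;   // 3
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS;    // 7

    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    // Pack the three fields into a single 64-bit word.
    static long encode(int lockState, int age, int hash) {
        return ((lockState & mask(LOCK_BITS)) << LOCK_SHIFT)
             | ((age & mask(AGE_BITS)) << AGE_SHIFT)
             | ((hash & mask(HASH_BITS)) << HASH_SHIFT);
    }

    static int lockState(long mark) {
        return (int) ((mark >>> LOCK_SHIFT) & mask(LOCK_BITS));
    }

    static int age(long mark) {
        return (int) ((mark >>> AGE_SHIFT) & mask(AGE_BITS));
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & mask(HASH_BITS));
    }

    public static void main(String[] args) {
        long mark = encode(1, 5, 0xCAFEBABE);
        System.out.printf("mark=0x%016x lock=%d age=%d hash=0x%08x%n",
                mark, lockState(mark), age(mark), hash(mark));
    }
}
```

In this sketch the three fields use 39 bits, which leaves 25 bits of a 64-bit word for anything else, such as a compressed class pointer.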
> 
> 
> We have a wide variety of techniques to explore for allocating and 
> down-sizing header fields:
> 
> - Pointers can be compressed, e.g. if we expect a maximum of, say, 8192 
> classes, we could, with some careful alignment of Klass objects, 
> compress the class pointer down to 13 bits: 2^13=8192 addressable 
> Klasses. Similar considerations apply to stack pointers and monitors.
> - Instead of using pointers, we could use class IDs that index a lookup 
> table
> - We could backfill fields which are known at compile-time (e.g. 
> alignment gap or hidden fields)
> - We could use backfill fields appended to an object after the GC moved 
> it (e.g. for hashcode)
> - We could use side-tables
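The pointer-compression idea from the first bullet above can be sketched as follows. This is a rough illustration, assuming all Klass structures live in one contiguous region and start at equally spaced, aligned slots; the slot size, the base address, and all names are hypothetical. The overflow check stands in for whatever fallback a real design would need once the limit is exceeded.

```java
public final class KlassCompressionSketch {
    static final int KLASS_BITS = 13;                 // 2^13 = 8192 addressable Klasses
    static final int MAX_KLASSES = 1 << KLASS_BITS;
    static final int ALIGN_SHIFT = 10;                // hypothetical: one 1 KiB slot per Klass

    // Turn a full Klass address into a 13-bit value relative to the region base.
    static int compress(long base, long klassAddr) {
        long offset = klassAddr - base;
        if (offset < 0 || (offset & ((1L << ALIGN_SHIFT) - 1)) != 0) {
            throw new IllegalArgumentException("Klass address is not slot-aligned");
        }
        long id = offset >>> ALIGN_SHIFT;
        if (id >= MAX_KLASSES) {
            // A real design needs an overflow strategy here; this sketch just fails.
            throw new IllegalStateException("more than " + MAX_KLASSES + " Klasses");
        }
        return (int) id;
    }

    // Recover the full address from the compressed value.
    static long decompress(long base, int id) {
        return base + ((long) id << ALIGN_SHIFT);
    }

    public static void main(String[] args) {
        long base = 0x8_0000_0000L;                   // hypothetical region base
        long klass = base + 5L * 1024;
        int id = compress(base, klass);
        System.out.println(id + " -> 0x" + Long.toHexString(decompress(base, id)));
    }
}
```

Because the compressed value is a slot number relative to the base, the same scheme can also be read as a class ID indexing a table, which is the second bullet's variant.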
> 
> 
> We also have a bewildering number of constraints. To name a few:
> - Performance
> - If we limit, e.g., the number of classes/monitors/etc. that we can 
> encode, we need a way to deal with overflow
> - Requires changes in assembly across all supported platforms (also 
> consider 32 bits)
> - Interaction with other projects like Panama, Loom, maybe Leyden, etc
> 
> And a couple of opportunities for further work (possibly outside of this 
> project):
> - If we leave arraylength in its own 64-bit field, perhaps we should 
> consider 64-bit addressable arrays?
> - Improvements to hashcode. Maybe salt it to avoid repetition of nursery 
> objects, maybe expand it to 64 or even 128 bits.
> 
> 
> I would propose myself as the project lead for Lilliput. :-)
> For initial committers, I think we need all the expertise in runtime and 
> GC that we can get. Off the top of my head, I’m thinking of John Rose, 
> Dave Dice, Andrew Dinn, Andrew Haley, Erik Österlund, Aleksey Shipilev, 
> Coleen Phillimore, Stefan Karlsson, Per Liden. Please suggest anybody 
> else who you think should be involved. (Nominate yourself if you want 
> to be in, or tell me if you are listed and have no interest.)
> 
> 
> My initial work plan is to:
> 
> - Brainstorm, collect ideas and propose techniques in the Wiki
> - Come up with a proof of concept as quickly as possible
>    - Use ZGC: no header usage
>    - Use existing class-pointer compression
>    - Shrink hashcode
> - Work from there, decide-as-we-go with insights from previous steps
> 
> 
> Please let me know what you think!
> 
> Thanks,
> Roman