Call for Discussion: New Project: Lilliput

Mon Mar 15 17:46:50 UTC 2021

Hi John, as you suggested I collected (that is, started to collect) my 
notes here:

http://cr.openjdk.java.net/~rkennke/lilliput.md

If nobody yells at me to stop it, I would go ahead and send the official 
project proposal for voting.

As soon as the project gets off the ground, I would transfer my notes to 
the project wiki.

Please let me know if anybody else should be in the initial committers
list. It's easier to add them now than later, and it's useful for
project wiki access (and code, of course). My list so far is:

John Rose, Dave Dice, Andrew Dinn, Andrew Haley, Erik Österlund, Aleksey 
Shipilev, Coleen Phillimore, Stefan Karlsson, Per Liden, Thomas Stuefe, 
Gil Tene, David Holmes, Kim Barrett

Cheerio,
Roman

> Many people, including myself, have puzzled for years
> over the problem of header size.  I am glad to see a frontal
> assault being made on it now!  There is a bewilderingly
> complex shadowy thicket of potential partial solutions,
> including side tables, optional backfill zones (good term,
> Remi), data compression, bitfields, tagged pointers, and
> compute vs. store trade-offs, for klass, GC mark, hash,
> and lock.  Welcome to the thicket!
> 
> I think you have already gathered a list of potential
> techniques.  It would be very good for you to publish
> your notes somewhere, to help give more shape to
> the conversation.
> 
> I personally use my page on cr.ojn for such notes,
> and that’s what I recommend now.
> 
> While it would be better to use a wiki somewhere
> for collaborative edits, there aren’t any good options
> under ojn for this (due to inconvenient rules for
> authoring the wiki).  So cr.ojn is a good start, and
> when the project boots up you can have a notes
> file (or files) in the repo itself.
> 
> — John
> 
> P.S. I want to give a shout-out to Dave and Alex’s
> paper https://arxiv.org/pdf/2102.04188.pdf about
> rethinking the lock word.  This work would
> probably give us lots of “slack” for crunching
> down the fixed monitor overhead.
> 
> On Mar 9, 2021, at 6:39 AM, Roman Kennke <rkennke at redhat.com> wrote:
>>
>> We would like to propose a new project called Lilliput, with the goal of exploring ways to shrink the object header.
>>
>> Goal:
>> 1. Reduce the object header to 64 bits. It may be possible to shrink it down to 32 bits as a secondary goal.
>> 2. Make the header layout more flexible, i.e. allow some build-time (possibly even run-time) configuration of how we use the bits.
>>
>> Motivation:
>> In 64-bit HotSpot, Java objects have an object header of 128 bits: a 64-bit multi-purpose header (‘mark’ or ‘lock’) word and a 64-bit class pointer. With typical average object sizes of 5-6 words, this is quite significant: 2 of those words are always taken by the header. If it were possible to reduce the size of the header, we could significantly reduce memory pressure, which directly translates to one or more of the following (depending on what you care about or what your workload does):
>>
>> - Reduced heap usage
>> - Higher object allocation rate
>> - Reduced GC activity
>> - Tighter packing of objects -> better cache locality
>>
>> In other words, we could reduce the overall CPU and/or memory usage of all Java workloads, whether it’s a large in-memory database or a small containerized application.
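To put the numbers above in perspective, here is a back-of-the-envelope calculation. It assumes that the 5-6 word average object size includes the 2-word header, and it ignores 8-byte object alignment, which in practice can eat part of the gain from a 32-bit header. The helper names are made up for this sketch.

```python
# Back-of-the-envelope header overhead (assumption: 8-byte object
# alignment is ignored, so these are upper bounds on the savings).
WORD_BYTES = 8  # 64-bit HotSpot


def header_share(object_words: float, header_words: float = 2) -> float:
    """Fraction of an object's size taken up by its header."""
    return header_words / object_words


def savings(object_words: float, old_header_words: float = 2,
            new_header_words: float = 1) -> float:
    """Fraction of an object's bytes saved by shrinking only the header.

    The payload (object_words - old_header_words) stays the same, so the
    saving is the header size difference relative to the old object size.
    """
    return (old_header_words - new_header_words) / object_words


if __name__ == "__main__":
    for words in (5, 6):
        print(f"{words}-word object ({words * WORD_BYTES} bytes): "
              f"header is {header_share(words):.0%} of it; "
              f"a 64-bit header saves {savings(words):.1%}, "
              f"a 32-bit header saves {savings(words, new_header_words=0.5):.1%}")
```

For 5- and 6-word objects this works out to roughly 17-20% fewer bytes per object with a 64-bit header, and 25-30% with a 32-bit header.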
>>
>>
>> The object header is used (and overloaded) for the following purposes:
>>
>> - Locking: the lower 3 bits are used to indicate the locking state of an object and the higher bits *may* be used to encode a pointer to a stack-allocated monitor or inflated lock object
>> - GC: 4 bits are used for tracking the age of each object (in generational collectors). The whole header *may* be used to store forwarding information, depending on the collector
>> - Identity hash-code: Up to 32 bits are used to store the identity hash-code
>> - Type information: 64 bits are used to point to the Klass that describes the object type
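For reference, the lock, age and hash bits described above can be decoded as in the following sketch. It assumes the 64-bit markWord layout of JDK 16-era HotSpot: lock:2 and biased_lock:1 in the low 3 bits, age:4 above them, one unused gap bit, and then a 31-bit hash. The Python helper names are hypothetical and are not HotSpot APIs.

```python
# Assumed 64-bit HotSpot markWord layout (JDK 16 era), from the low bits up:
#   lock:2 | biased_lock:1 | age:4 | unused_gap:1 | hash:31 | unused:25
LOCK_BITS, BIASED_LOCK_BITS, AGE_BITS, UNUSED_GAP_BITS, HASH_BITS = 2, 1, 4, 1, 31

LOCK_SHIFT = 0
BIASED_LOCK_SHIFT = LOCK_SHIFT + LOCK_BITS            # bit 2
AGE_SHIFT = BIASED_LOCK_SHIFT + BIASED_LOCK_BITS      # bits 3-6
HASH_SHIFT = AGE_SHIFT + AGE_BITS + UNUSED_GAP_BITS   # bits 8-38

# Meaning of the two lowest bits. Biased objects also use 0b01,
# with the biased_lock bit set.
LOCK_STATES = {
    0b00: "stack-locked",      # upper bits point to a lock record on the stack
    0b01: "unlocked",          # age and hash live in the header
    0b10: "inflated monitor",  # upper bits point to an inflated monitor
    0b11: "marked",            # used by the GC, e.g. for forwarding
}


def field(word: int, shift: int, bits: int) -> int:
    """Extract an unsigned bit field from a 64-bit header word."""
    return (word >> shift) & ((1 << bits) - 1)


def make_unlocked(hash_value: int, age: int) -> int:
    """Build a header word for an unlocked, unbiased object."""
    if not (0 <= hash_value < (1 << HASH_BITS) and 0 <= age < (1 << AGE_BITS)):
        raise ValueError("hash or age out of range")
    return (hash_value << HASH_SHIFT) | (age << AGE_SHIFT) | 0b01


def describe(word: int) -> dict:
    """Decode the fields; age and hash are only meaningful when unlocked."""
    return {
        "lock": LOCK_STATES[field(word, LOCK_SHIFT, LOCK_BITS)],
        "biased": bool(field(word, BIASED_LOCK_SHIFT, BIASED_LOCK_BITS)),
        "age": field(word, AGE_SHIFT, AGE_BITS),
        "hash": field(word, HASH_SHIFT, HASH_BITS),
    }
```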
>>
>>
>> We have a wide variety of techniques to explore for allocating and down-sizing header fields:
>>
>> - Pointers can be compressed, e.g. if we expect a maximum of, say, 8192 classes, we could, with some careful alignment of Klass objects, compress the class pointer down to 13 bits: 2^13=8192 addressable Klasses. Similar considerations apply to stack pointers and monitors.
>> - Instead of using pointers, we could use class IDs that index a lookup table
>> - We could backfill fields which are known at compile-time (e.g. alignment gaps or hidden fields)
>> - We could use backfill fields appended to an object after the GC moved it (e.g. for hashcode)
>> - We could use side-tables
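The 13-bit class pointer idea from the first bullet could look roughly like this. The sketch assumes, hypothetically, that all Klass structures live in one reserved region starting at ENCODING_BASE and that each starts on a 1 KiB boundary. The encodable region is then 8192 × 1 KiB = 8 MiB, so a Klass larger than its slot would reduce the number of classes that fit. The base, the alignment and the function names are illustrative only.

```python
# Hypothetical 13-bit compressed class pointer: base + (narrow << shift).
CLASS_POINTER_BITS = 13
KLASS_ALIGNMENT_SHIFT = 10       # assumption: every Klass starts on a 1 KiB boundary
ENCODING_BASE = 0x8_0000_0000    # assumption: start of the reserved class space
MAX_CLASSES = 1 << CLASS_POINTER_BITS  # 8192 addressable Klasses


def encode_klass(addr: int) -> int:
    """Compress a full Klass address into a 13-bit value."""
    offset = addr - ENCODING_BASE
    if offset < 0 or offset & ((1 << KLASS_ALIGNMENT_SHIFT) - 1):
        raise ValueError("Klass is outside the class space or misaligned")
    narrow = offset >> KLASS_ALIGNMENT_SHIFT
    if narrow >= MAX_CLASSES:
        # This is the overflow case: the class space is exhausted.
        raise OverflowError("class space exhausted")
    return narrow


def decode_klass(narrow: int) -> int:
    """Expand a 13-bit value back into the full Klass address."""
    return ENCODING_BASE + (narrow << KLASS_ALIGNMENT_SHIFT)
```

Decoding is a single shift and add, which keeps it cheap in compiled code. The cost is the constraint on where and how large Klass structures can be.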
>>
>>
>> We also have a bewildering number of constraints. To name a few:
>> - Performance
>> - If we limit e.g. number of classes/monitors/etc that we can encode, we need a way to deal with overflow
>> - Requires changes in assembly across all supported platforms (also consider 32 bits)
>> - Interaction with other projects like Panama, Loom, maybe Leyden, etc
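The class-ID alternative and the overflow constraint can be illustrated together. In this sketch, a side table hands out small IDs and, once it is full, returns a reserved overflow ID. That ID signals that the full class pointer must be stored somewhere else, for example in an extra header word. The fallback policy and the class name are assumptions for illustration, not a settled design.

```python
class ClassIdTable:
    """Hypothetical side table mapping small class IDs to class metadata."""

    def __init__(self, id_bits: int) -> None:
        # The highest ID is reserved as the overflow marker.
        self.capacity = (1 << id_bits) - 1
        self.overflow_id = self.capacity
        self._klasses: list = []
        self._ids: dict = {}

    def register(self, klass) -> int:
        """Return the class ID for klass, allocating one if needed."""
        if klass in self._ids:
            return self._ids[klass]
        if len(self._klasses) >= self.capacity:
            # Table full: the caller must keep the full class pointer elsewhere.
            return self.overflow_id
        cid = len(self._klasses)
        self._klasses.append(klass)
        self._ids[klass] = cid
        return cid

    def lookup(self, cid: int):
        """Map a class ID back to its class metadata."""
        if cid == self.overflow_id:
            raise LookupError("overflow ID: class is stored outside the header")
        return self._klasses[cid]
```

The lookup costs one extra memory load compared to a direct (compressed) pointer. The benefit is that the ID width no longer depends on where or how Klass structures are allocated.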
>>
>> And a couple of opportunities for further work (possibly outside of this project):
>> - If we leave arraylength in its own 64-bit field, perhaps we should consider 64-bit addressable arrays?
>> - Improvements to hashcode. Maybe salt it to avoid repetition of nursery objects, maybe expand it to 64 or even 128 bits.
>>
>>
>> I would propose myself as the project lead for Lilliput. :-)
>> For initial committers I think we need all the runtime and GC expertise we can get. Off the top of my head I’m thinking of John Rose, Dave Dice, Andrew Dinn, Andrew Haley, Erik Österlund, Aleksey Shipilev, Coleen Phillimore, Stefan Karlsson, Per Liden. Please suggest anybody who you think should be involved in this too. (Or suggest yourself if you want to be in, or tell me if you have no interest in it.)
>>
>>
>> My initial work plan is to:
>>
>> - Brainstorm, collect ideas and propose techniques in the Wiki
>> - Come up with a proof of concept as quickly as possible
>>   - Use ZGC: no header usage
>>   - Use existing class-pointer compression
>>   - Shrink hashcode
>> - Work from there, decide-as-we-go with insights from previous steps
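As an illustration of the proof-of-concept direction, a header that keeps the existing 32-bit compressed class pointer and shrinks the hash could pack everything into a single 64-bit word. With ZGC, the header does not have to hold forwarding information, which makes this plausible. The 25-bit hash width is an arbitrary choice made here so that the fields add up to exactly 64 bits, and the field order is hypothetical.

```python
# Hypothetical 64-bit header layout, from the low bits up.
FIELDS = [("lock", 2), ("biased_lock", 1), ("age", 4),
          ("hash", 25), ("narrow_klass", 32)]


def _layout() -> dict:
    """Compute (shift, width) for each field and check that they fill 64 bits."""
    shift, layout = 0, {}
    for name, bits in FIELDS:
        layout[name] = (shift, bits)
        shift += bits
    assert shift == 64, "fields must fill exactly one 64-bit word"
    return layout


LAYOUT = _layout()


def pack(**values: int) -> int:
    """Pack named fields into one header word; missing fields default to 0."""
    unknown = set(values) - set(LAYOUT)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, (shift, bits) in LAYOUT.items():
        value = values.get(name, 0)
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word |= value << shift
    return word


def unpack(word: int) -> dict:
    """Split a header word back into its named fields."""
    return {name: (word >> shift) & ((1 << bits) - 1)
            for name, (shift, bits) in LAYOUT.items()}
```

Keeping the layout in one table makes it easy to try other widths, in the spirit of goal 2 (configurable header layout).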
>>
>>
>> Please let me know what you think!
>>
>> Thanks,
>> Roman
>>
>