Call for Discussion: New Project: Lilliput

Fri Mar 12 23:15:04 UTC 2021

One of the first items on the agenda for a project Lilliput should probably
be to decide which end of the object the header belongs on [1]. We may also
want to adopt terms from that work and namespace in describing the two main
technical alternatives: big-headerness and little-headerness? [2] ;-)

I would be happy to participate and bring some real-world experience to
bear, as the Zing JVM has been using single-word, 64-bit headers for quite a
few years, and we've built up quite a bit of practical experience with
them (and probably learned a lot about what not to do).

Specifically, in Zing, currently:

- We use a kid (klass id) instead of a klass pointer (e.g. a 23-bit kid).
     - There is a straightforward klass table for looking up klass pointers
       from kids.

- We use pre-headers to hold identity hashes.
     - Until an object's first relocation, its identity hash is computed
       rather than stored.
     - The identity hash computation is based on the object's offset
       within the region it sits in.
     - Upon relocation of an object that has an established identity hash,
       pre-header space is allocated and the previously computed hash is
       stored there.
     - The associated states (whether the object has an identity hash, and
       whether it has a pre-header) are tracked in bits in the header.

- We use a locking scheme that does not involve header word displacement,
  and instead deals with lock state in the bottom 32 bits of the 64 bit
  header (owner tid for thin locks, monitor id for inflated locks, state
  bits).
     - One key reason for changing from a displaced header locking scheme
       was the wish to be able to (continue to) extract the kid directly from
       the header, regardless of lock state. This was a key enabler for
       single-word headers.
     - Background: In current HotSpot, the klass pointer can be directly
       accessed in the header regardless of lock state, because it sits in
       a word that is separate from the displaced header word. But collapsing
       the klass ptr (or kid) and lock state into a single word makes it
       "hard" to keep the object klass identity at a fixed place when using
       displaced header locking (and 64 bit stack pointers). Conditionally
       accessing kid (or klass pointer) at different locations depending on
       locking state brought in a tangle of other issues, including (but not
       limited to) performance problems.

- Pre-headers, once added as a capability, can have other interesting uses.

- GC interaction, while fairly straightforward, needs robust rules and
  invariants, and brings additional considerations:
     - E.g. potentially-inflated object size requirements for relocation,
       coupled with the potential for concurrent (with the collector)
       first-call-to-identity-hash situations, can mess with things that
       seem obvious without it.
     - E.g. "how much empty space do you need available to hold all the
       objects in this region" becomes much more "interesting".
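
To make the scheme above a little more concrete, here is a rough C++ sketch of a single-word header in that spirit. Everything in it is an assumption for illustration, not Zing's actual layout or code: the bit positions, the names (Obj, KlassTable, identity_hash, relocate, space_needed_words), the offset-mixing hash function and the one-word pre-header size are all hypothetical. It shows why a kid at a fixed position can be read regardless of lock state, how a hash derived from the region offset has to be copied into a pre-header on first relocation, and why the evacuation space estimate depends on hash state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-word (64-bit) header; bit positions are illustrative only.
//   bits  0..31 : lock state (e.g. owner tid / monitor id plus state bits)
//   bit   32    : identity hash has been established
//   bit   33    : object carries a pre-header holding that hash
//   bits 34..56 : 23-bit klass id (kid)
constexpr uint64_t kLockMask     = 0xFFFFFFFFull;
constexpr uint64_t kHashedBit    = 1ull << 32;
constexpr uint64_t kPreheaderBit = 1ull << 33;
constexpr unsigned kKidShift     = 34;
constexpr uint64_t kKidMask      = (1ull << 23) - 1;

inline uint64_t make_header(uint32_t kid) { return (uint64_t{kid} & kKidMask) << kKidShift; }

// The kid sits at a fixed position, so it can be read whatever the lock state.
inline uint32_t kid_of(uint64_t header) {
  return static_cast<uint32_t>((header >> kKidShift) & kKidMask);
}
inline uint32_t lock_word(uint64_t header) { return static_cast<uint32_t>(header & kLockMask); }

struct Klass { const char* name; };

// Klass table: index by kid to get the klass pointer.
struct KlassTable {
  std::vector<const Klass*> entries;
  const Klass* lookup(uint64_t header) const { return entries.at(kid_of(header)); }
};

struct Obj {
  uint64_t header;
  size_t   offset;      // offset within the region the object currently sits in
  size_t   size_words;  // footprint in words, including any pre-header
  uint32_t preheader;   // meaningful only when kPreheaderBit is set
};

// Hypothetical hash derived from the offset within the region (Fibonacci hashing).
inline uint32_t hash_from_offset(size_t offset) {
  return static_cast<uint32_t>((static_cast<uint64_t>(offset) * 0x9E3779B97F4A7C15ull) >> 32);
}

// Before the first move the hash is recomputed on demand; afterwards it is read
// from the pre-header. The first call only has to set a header bit.
inline uint32_t identity_hash(Obj& o) {
  if (o.header & kPreheaderBit) return o.preheader;
  o.header |= kHashedBit;
  return hash_from_offset(o.offset);
}

inline bool needs_preheader(const Obj& o) {
  return (o.header & kHashedBit) && !(o.header & kPreheaderBit);
}

// Footprint after relocation: one extra word (an assumption) for a new pre-header.
inline size_t relocated_size_words(const Obj& o) {
  return o.size_words + (needs_preheader(o) ? 1 : 0);
}

inline void relocate(Obj& o, size_t new_offset) {
  if (needs_preheader(o)) {
    o.preheader = hash_from_offset(o.offset);  // preserve the hash computed at the old offset
    o.header |= kPreheaderBit;
    o.size_words += 1;
  }
  o.offset = new_offset;
}

// Space needed to evacuate a region. If identity_hash() can run concurrently
// with the collector, this total can grow after it has been computed.
inline size_t space_needed_words(const std::vector<Obj>& live) {
  size_t total = 0;
  for (const Obj& o : live) total += relocated_size_words(o);
  return total;
}
```

The last comment is the crux of the GC point above: a first call to identity_hash() racing with evacuation changes an object's relocated size, so any "space needed" figure has to either tolerate that growth or be protected by an invariant that prevents it.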

— Gil.

[1] https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu
[2] https://www.rfc-editor.org/ien/ien137.txt (the actual origin of the
    terms "Big Endian" and "Little Endian" as used in computer science).

> On Mar 9, 2021, at 4:39 AM, Roman Kennke <rkennke at redhat.com> wrote:
> 
> We would like to propose a new project called Lilliput, with the goal of exploring ways to shrink the object header.
> 
> Goal:
> 1. Reduce the object header to 64 bits. It may be possible to shrink it down to 32 bits as a secondary goal.
> 2. Make the header layout more flexible, i.e. allow some build-time (possibly even run-time) configuration of how we use the bits.
> 
> Motivation:
> In 64-bit HotSpot, Java objects have an object header of 128 bits: a 64-bit multi-purpose header (‘mark’ or ‘lock’) word and a 64-bit class pointer. With typical average object sizes of 5-6 words, this is quite significant: 2 of those words are always taken by the header. If it were possible to reduce the size of the header, we could significantly reduce memory pressure, which directly translates to one or more of (depending on what you care about or what your workload does):
> 
> - Reduced heap usage
> - Higher object allocation rate
> - Reduced GC activity
> - Tighter packing of objects -> better cache locality
> 
> In other words, we could reduce the overall CPU and/or memory usage of all Java workloads, whether it’s a large in-memory database or a small containerized application.
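
As a rough check on the claim above, taking the stated figures at face value (a 2-word header today, a 1-word header as the primary goal, average objects of 5 to 6 words) and ignoring any change in alignment padding:

```latex
\text{saving} = \frac{h_{\text{old}} - h_{\text{new}}}{s} = \frac{2 - 1}{s},
\qquad s \in [5, 6] \;\Rightarrow\; \text{saving} \in \left[\tfrac{1}{6}, \tfrac{1}{5}\right] \approx 17\% \text{ to } 20\%
```

A 32-bit header (the secondary goal) would give (2 - 0.5)/s, i.e. roughly 25% to 30%, but only where the freed half-word can actually be used by fields rather than lost to 8-byte object alignment.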
> 
> 
> The object header is used (and overloaded) for the following purposes:
> 
> - Locking: the lower 3 bits are used to indicate the locking state of an object and the higher bits *may* be used to encode a pointer to a stack-allocated monitor or inflated lock object
> - GC: 4 bits are used for tracking the age of each object (in generational collectors). The whole header *may* be used to store forwarding information, depending on the collector
> - Identity hash-code: Up to 32 bits are used to store the identity hash-code
> - Type information: 64 bits are used to point to the Klass that describes the object type
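
For reference, a small C++ sketch of how the lock, age and hash fields listed above are packed into today's 64-bit mark word. The layout is modelled on the layout comment in HotSpot's markWord.hpp from around JDK 16 (biased locking still present), where the hash field is 31 bits wide; the function names are hypothetical and this is an illustration, not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Mark word of an unlocked, unbiased object, modelled on markWord.hpp (JDK 16 era):
//   unused:25 | hash:31 | unused_gap:1 | age:4 | biased_lock:1 | lock:2
constexpr unsigned kLockBits = 2, kBiasedLockBits = 1, kAgeBits = 4, kGapBits = 1, kHashBits = 31;
constexpr unsigned kAgeShift  = kLockBits + kBiasedLockBits;      // 3
constexpr unsigned kHashShift = kAgeShift + kAgeBits + kGapBits;  // 8
constexpr uint64_t kAgeMask   = (uint64_t{1} << kAgeBits) - 1;
constexpr uint64_t kHashMask  = (uint64_t{1} << kHashBits) - 1;

// Low two bits: 00 stack-locked, 01 unlocked, 10 inflated monitor, 11 GC-marked.
enum class LockState : uint8_t { kStackLocked = 0, kUnlocked = 1, kInflated = 2, kMarked = 3 };

inline LockState lock_state(uint64_t mark) { return static_cast<LockState>(mark & 0x3); }
inline bool is_biased(uint64_t mark) { return (mark & 0x7) == 0x5; }  // biased bit set + 01

// Only meaningful for an unlocked (neutral) mark word.
inline unsigned age_of(uint64_t mark) { return static_cast<unsigned>((mark >> kAgeShift) & kAgeMask); }
inline uint32_t hash_of(uint64_t mark) { return static_cast<uint32_t>((mark >> kHashShift) & kHashMask); }

// When stack-locked or inflated, the bits above the lock bits hold a pointer
// (to a stack lock record or a monitor) and the original header is displaced.
inline uintptr_t lock_pointer(uint64_t mark) { return static_cast<uintptr_t>(mark & ~uint64_t{0x3}); }

inline uint64_t make_unlocked(uint32_t hash, unsigned age) {
  return ((uint64_t{hash} & kHashMask) << kHashShift) |
         ((uint64_t{age} & kAgeMask) << kAgeShift) | 0x1;
}
```

Note how hash and age are only readable while the object is unlocked; once the word holds a lock pointer they live in the displaced header, which is the tangle described in the reply above.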
> 
> 
> We have a wide variety of techniques to explore for allocating and down-sizing header fields:
> 
> - Pointers can be compressed, e.g. if we expect a maximum of, say, 8192 classes, we could, with some careful alignment of Klass objects, compress the class pointer down to 13 bits: 2^13=8192 addressable Klasses. Similar considerations apply to stack pointers and monitors.
> - Instead of using pointers, we could use class IDs that index a lookup table
> - We could backfill fields which are known at compile-time (e.g. alignment gap or hidden fields)
> - We could use backfill fields appended to an object after the GC moved it (e.g. for hashcode)
> - We could use side-tables
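
The 13-bit pointer compression in the first bullet, and the class-ID alternative in the second, can be sketched in C++ as follows. The reserved-range layout, the example base address, the 1 KiB alignment and all names (NarrowKlassCodec, ClassIdTable, kNarrowKlassBits) are assumptions for illustration. With 13 bits and 2^10-byte alignment the reserved range spans 8192 × 1 KiB = 8 MiB, and 8192 classes fit only if every Klass fits in one aligned slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr unsigned kNarrowKlassBits = 13;  // 2^13 = 8192 encodable values

// Pointer compression: Klass structures are assumed to live in one reserved
// range starting at `base`, each aligned to 2^shift bytes.
struct NarrowKlassCodec {
  uintptr_t base;
  unsigned  shift;  // log2 of the Klass alignment

  uint32_t encode(uintptr_t klass) const {
    const uintptr_t align_mask = (uintptr_t{1} << shift) - 1;
    if (klass < base || ((klass - base) & align_mask) != 0 ||
        ((klass - base) >> shift) >= (uintptr_t{1} << kNarrowKlassBits)) {
      throw std::out_of_range("Klass address not encodable in 13 bits");
    }
    return static_cast<uint32_t>((klass - base) >> shift);
  }
  uintptr_t decode(uint32_t narrow) const { return base + (uintptr_t{narrow} << shift); }
};

// Class IDs: an index into a lookup table instead of a (compressed) pointer.
// Running out of IDs is the overflow case that needs a fallback.
struct ClassIdTable {
  std::vector<uintptr_t> klasses;  // klasses[id] == Klass address

  uint32_t register_klass(uintptr_t klass) {
    if (klasses.size() >= (size_t{1} << kNarrowKlassBits)) {
      throw std::overflow_error("class id space exhausted");
    }
    klasses.push_back(klass);
    return static_cast<uint32_t>(klasses.size() - 1);
  }
  uintptr_t lookup(uint32_t id) const { return klasses.at(id); }
};
```

The trade-off in this sketch: the codec decodes with a shift and an add but constrains where Klass structures may be placed, while the ID table places no constraint on Klass placement at the cost of an extra memory load per lookup. Both share the overflow problem listed under the constraints below.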
> 
> 
> We also have a bewildering number of constraints. To name a few:
> - Performance
> - If we limit e.g. number of classes/monitors/etc that we can encode, we need a way to deal with overflow
> - Requires changes in assembly across all supported platforms (also consider 32 bits)
> - Interaction with other projects like Panama, Loom, maybe Leyden, etc
> 
> And a couple of opportunities for further work (possibly outside of this project):
> - If we leave arraylength in its own 64-bit field, perhaps we should consider 64-bit addressable arrays?
> - Improvements to hashcode. Maybe salt it to avoid repetition of nursery objects, maybe expand it to 64 or even 128 bits.
> 
> 
> I would propose myself as the project lead for Lilliput. :-)
> For initial committers I think we need all the expertise in runtime and GC that we can get. Off the top of my head I’m thinking of John Rose, Dave Dice, Andrew Dinn, Andrew Haley, Erik Österlund, Aleksey Shipilev, Coleen Phillimore, Stefan Karlsson, Per Liden. Please suggest anybody who you think should be involved in this too. (Or suggest yourself if you want to be in, or let me know if you have no interest in it.)
> 
> 
> My initial work plan is to:
> 
> - Brainstorm, collect ideas and propose techniques in the Wiki
> - Come up with a proof of concept as quickly as possible
>  - Use ZGC: no header usage
>  - Use existing class-pointer compression
>  - Shrink hashcode
> - Work from there, decide-as-we-go with insights from previous steps
> 
> 
> Please let me know what you think!
> 
> Thanks,
> Roman