Flattening to date

Mon Apr 25 14:52:17 UTC 2022

Let me give a brief overview of where things are with respect to flattening, since some of this influences the user-model discussion Kevin has initiated. This is a very rough sketch, and not written for a general audience, so if you’re tempted to post this to Twitter because it seems cool and curiosity-satisfying, while I can’t stop you, you’re probably anti-helping.

Layout is always at the discretion of the JVM; that’s how we like it. There will be no directives for “forcing” any kind of layout, including flattening.  The JVM always has the option of indirecting with a pointer.  Currently it always does this for object references, and never does this for primitives.  For Bucket 1 classes, we will almost certainly continue to lay out an LBucket1 as a pointer.  (Remember that layout of an object with an LFoo field often happens before Foo is loaded; flattening introduces an ordering edge into the class loading graph.)  
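To make the pointer-versus-flattened distinction concrete, here is a hand-written Java sketch of the two layouts the JVM could choose for an object with fields of some class type. `Point`, `Line`, and `FlatLine` are hypothetical names used only for illustration. In reality the VM makes this choice invisibly, and nothing in the source requests it.

```java
// Hypothetical two-field aggregate, used only for illustration.
record Point(int x, int y) {}

// Pointer layout: each field holds a reference to a separately allocated
// Point, so reading end.x() means loading the pointer, then the int.
final class Line {
    final Point start;
    final Point end;

    Line(Point start, Point end) {
        this.start = start;
        this.end = end;
    }

    int dx() { return end.x() - start.x(); }
}

// Flattened layout, written out by hand: the Points' fields live directly
// in the containing object, and no separate Point objects are kept.
final class FlatLine {
    final int startX, startY, endX, endY;

    FlatLine(Point start, Point end) {
        this.startX = start.x();
        this.startY = start.y();
        this.endX = end.x();
        this.endY = end.y();
    }

    int dx() { return endX - startX; }
}
```

The JVM can only pick the `FlatLine` shape if it already knows `Point`’s fields when it lays out `Line`, which is exactly the class-loading ordering edge mentioned above.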

Most people think of flattening only in terms of heap layout, but there is also flattening in the calling convention, and this can be a huge source of benefit. Flattening in the calling convention means that rather than passing an aggregate to or from an out-of-line call via a pointer, we scalarize the value and pass its field values instead. The calling convention is generally determined early in the run, so if we load the class only after the calling convention has been set, we may miss out on this.
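A rough source-level picture of what scalarization changes about a call. The JIT applies this to compiled code while the source keeps the first signature; `Point` and both method names are hypothetical.

```java
// Hypothetical aggregate, used only for illustration.
record Point(int x, int y) {}

final class CallingConventionSketch {
    // What the source says: the caller passes a reference to a Point.
    // Without scalarization the callee receives a pointer and loads x and y
    // through it, and the caller may have had to allocate the Point.
    static int lengthSquared(Point p) {
        return p.x() * p.x() + p.y() * p.y();
    }

    // Roughly what a scalarized convention passes instead: the field values
    // themselves, in registers or stack slots, with no Point object needed.
    static int lengthSquaredScalarized(int x, int y) {
        return x * x + y * y;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(lengthSquared(p));                      // 25
        System.out.println(lengthSquaredScalarized(p.x(), p.y())); // 25
    }
}
```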

For a reference type (e.g., B2 classes and B3.ref), we are constrained by two properties of reference-ness: the need to represent null, and the JMM constraint that loads and stores of references are atomic with respect to one another. (This is where tear-freedom comes from.) Representing null costs some sort of footprint tax (inject a boolean, or reinterpret slack bits, such as low-order pointer bits, in existing fields). Tearing is not relevant to stack (calling convention) flattening, so even L types can get flattening on the stack.
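A minimal sketch of the injected-boolean option, written as a hand-scalarized parameter list. The names are hypothetical, and the real encoding is chosen by the VM. Because the three parts live in one thread’s registers or stack slots, no other thread can observe them half-written, which is why tearing does not come into it here.

```java
// Hypothetical aggregate, used only for illustration.
record Point(int x, int y) {}

final class NullChannelSketch {
    // Source-level method: p is a reference and may be null.
    static int sumOrZero(Point p) {
        return p == null ? 0 : p.x() + p.y();
    }

    // Hand-scalarized form: nonNull is the injected null channel (the
    // footprint tax); x and y are the payload, ignored when nonNull is false.
    static int sumOrZeroScalarized(boolean nonNull, int x, int y) {
        return nonNull ? x + y : 0;
    }

    // How a caller would split a possibly-null reference into the three parts.
    static int callScalarized(Point p) {
        return p == null
                ? sumOrZeroScalarized(false, 0, 0)
                : sumOrZeroScalarized(true, p.x(), p.y());
    }
}
```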

I’ll pause because this is sort of amazing: an LB2, while a reference type, is, in the current implementation, routinely flattened in the calling convention, using an extra synthetic field for null. If you thought references were always indirections, you’ll be surprised. Long chains of things like Optional.map(…).flatMap(…) are routinely allocation-free in C2-compiled code, even for out-of-line calls. (The interpreter and C1 still use indirections on the stack and in locals.)
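The sketch below contrasts a source-level `Optional` chain with a hand-written picture of what it reduces to once each intermediate `Optional` is carried as a null flag plus a payload, as described above. It is an illustration of the idea, not the actual C2 transformation; `nonEmpty` and the method names are hypothetical.

```java
import java.util.Optional;

final class OptionalChainSketch {
    // Hypothetical helper: present only if the string is non-empty.
    static Optional<String> nonEmpty(String s) {
        return s.isEmpty() ? Optional.empty() : Optional.of(s);
    }

    // Source code: each step conceptually yields a fresh Optional.
    static Optional<String> normalize(Optional<String> input) {
        return input.map(String::trim)
                    .flatMap(OptionalChainSketch::nonEmpty)
                    .map(String::toUpperCase);
    }

    // Hand-scalarized picture of the same chain: every intermediate Optional
    // is just a (present, value) pair held in locals, so none is allocated.
    static Optional<String> normalizeScalarized(Optional<String> input) {
        boolean present = input.isPresent();         // null channel
        String value = present ? input.get() : null; // payload
        if (present) value = value.trim();           // map(String::trim)
        if (present) present = !value.isEmpty();     // flatMap(nonEmpty)
        if (present) value = value.toUpperCase();    // map(String::toUpperCase)
        return present ? Optional.of(value) : Optional.empty();
    }
}
```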

The heap is where reference types (including B2) have some trouble; the atomicity requirement bites hard here. References in the heap are routinely laid out as indirections. Final references to id-free instances _could_ safely be flattened, but they are not yet. Mutable references to id-free instances are problematic because of potential tearing. We *could* (but do not yet, and it’s complicated) flatten 64-bit values by stuffing multiple 32-bit values into a single synthetic field, or by storing/loading multiple fields with a single load (“fat loads”), and likewise 128-bit values on platforms with fast 128-bit atomics (which include some Intel cores, where the spec was recently revised to commit to atomicity). But the complexity cost here is high, and flattening would be limited by the instruction set. This is under investigation but unlikely to be a magic bullet.
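A minimal sketch of the “stuff two 32-bit values into one synthetic 64-bit field” idea, assuming a hypothetical two-int value `Range` and ignoring the null channel entirely. It leans on the JLS guarantee that reads and writes of a `volatile long` are atomic. The trick only works while the fields fit in the widest atomic access the hardware offers (64 bits here, 128 with fast 128-bit atomics), which is the instruction-set limit mentioned above.

```java
// Hypothetical id-free value with two 32-bit fields, used only for illustration.
record Range(int lo, int hi) {}

// Sketch of a mutable heap field of type Range stored as one synthetic
// 64-bit field. Null is not representable here (set(null) would throw).
final class PackedRangeHolder {
    // Reads and writes of a volatile long are atomic, so a reader can never
    // see lo from one write combined with hi from another.
    private volatile long bits;

    static long pack(Range r) {
        return ((long) r.hi() << 32) | (r.lo() & 0xFFFF_FFFFL);
    }

    static Range unpack(long bits) {
        return new Range((int) bits, (int) (bits >>> 32));
    }

    void set(Range r) { bits = pack(r); } // one 64-bit store
    Range get() { return unpack(bits); }  // one 64-bit load
}
```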

In the heap, Q types (B3.val) can be fully flattened (though the VM will likely impose a threshold, such as 512 bits, above which it uses indirections anyway). Full flattening means not only a flat layout, but also that we can access the fields of the nested object with narrow, single-field loads and stores.
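A sketch of the access pattern full flattening allows, modeling by hand a flattened array of a two-int value type as an interleaved `int[]`. `FlatPointArray` and its methods are hypothetical; the VM would do the equivalent internally. Note how writing a whole element is two independent narrow stores, which is fine only because these types are allowed to tear.

```java
// Hypothetical picture of a fully flattened array of two-int values:
// each element's fields are stored inline, interleaved, in one int array.
final class FlatPointArray {
    private static final int X = 0, Y = 1, STRIDE = 2; // field offsets within an element
    private final int[] data;

    FlatPointArray(int length) { data = new int[length * STRIDE]; }

    int length() { return data.length / STRIDE; }

    // Narrow loads: reading one field touches only that field's 32 bits.
    int getX(int i) { return data[i * STRIDE + X]; }
    int getY(int i) { return data[i * STRIDE + Y]; }

    // Narrow stores: writing a whole element is two independent 32-bit stores,
    // so a concurrent reader could observe the new x with the old y (tearing).
    void set(int i, int x, int y) {
        data[i * STRIDE + X] = x;
        data[i * STRIDE + Y] = y;
    }

    // Updating a single field needs no read-modify-write of the whole element.
    void setX(int i, int x) { data[i * STRIDE + X] = x; }
}
```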

Scorecard:
 - Identity-free reference (L) types can be flattened, within limits (which is amazing)
 - Identity-free reference types usually pay some footprint tax for the null channel
 - Identity-free reference types are routinely flattened on stack, and may get some more heap flattening in the future
 - Identity-free immediate types have no null channel, and can be fully flattened and accessed with narrow loads and stores, because they’re allowed to tear

To the extent we treat B2 and B3.ref the same way (which we want to), any flattening wins for refs (e.g., final fields, fat access) will apply to both.