Null channels (was: User model stacking)
Brian Goetz
brian.goetz at oracle.com
Tue May 3 17:56:04 UTC 2022
About six months ago we started working on flattening references in
calling conventions in the Valhalla repos. We use the Preload attribute
to force preloading of classes that are known to be (or expected to be)
value classes, but which are referenced only via L descriptors, so that
at the (early) time the calling convention is chosen, we have the
additional information that this is an identity-free class. In
these cases, we scalarize the calling convention as we do with Q types,
but we add an extra boolean channel for null; it is as if we add a
boolean field to the object layout. When we adapt between the
scalarized and indirected forms (e.g., c2i adapters), we apply the
obvious semantics to the null channel.
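To make the "as if" concrete, here is a hedged sketch (hypothetical
names, prototype value-class syntax) of what the scalarized calling
convention morally looks like, expressed back at the source level:

    // a value class whose fields are known at calling-convention time
    value class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    class Demo {
        // source-level method: one nullable, L-typed parameter
        static int use(Point p) {
            return (p == null) ? 0 : p.x + p.y;
        }

        // what the scalarized convention morally resembles: one argument
        // per field, plus one extra boolean argument -- the null channel
        static int use$scalarized(boolean pNonNull, int pX, int pY) {
            return pNonNull ? pX + pY : 0;
        }
    }

The c2i-style adapters then map between the two forms: null on the
indirect side becomes (false, 0, 0) on the scalarized side, and
(false, _, _) coming back becomes null.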
We have not yet applied the same treatment to field layout, but we can
(and it has the same timing constraints, so it also needs Preload), and
the VM has additional degrees of implementation freedom in doing so.
The simplest option is to let the layout engine choose to flatten a
preloaded, L-carried value type by injecting a boolean field that
represents nullity, and adapting null checks to read this field (such
checks can be hoisted, etc.)
The layout engine has other tricks available to it as well, to further
reduce the footprint of representing "might be null", if it can find
suitable slack space in the representation. Such tricks could include
using slack bits in boolean fields (potentially seven of them),
low-order bits of pointers (a la compressed OOPs), unused color bits of
64-bit pointers, etc. Some of these choices require transforms on
load/store (e.g., those that use pointer bits), not unlike what we do
with compressed OOPs. This is entirely "VM's choice" and affects only
quality of implementation; there is nothing in the classfile that
conditions this, other than the ACC_VALUE indication and L/Q type
carriers. So the VM has a rich set of footprint/computation tradeoffs
for encoding the null channel, but logically, it is an "extra boolean
field" that all nullable value types have.
> I'd like to reserve judgement on this stacking as I'm uncomfortable
> (uncertain maybe?) about the practicality of the extra null channel.
> Without having validated the extra null channel, I'm concerned we're
> exposing a broader set of options in the language that will, in
> practice, map down to the existing 3 buckets we've been talking about.
> Maybe this factoring allows a slightly larger number of classes to be
> flattened or leaves the door open for them to get it in the future?
What I'm trying to do here is decomplect flattening from nullity. Right
now, we have an unfortunate interaction that both makes certain
combinations impossible and makes the user model harder to reason about.
Identity-freedom unlocks flattening in the stack (calling convention.)
The lesson of that exercise (which was somewhat surprising, but good) is
that nullity is mostly a non-issue here -- we can treat the nullity
information as just being an extra state component when scalarizing,
with some straightforward fixups when we adapt between direct and
indirect representations. This is great, because we're not asking users
to choose between nullability and flattening; users pick the combination
of { identity, nullability } they want, and they get the best flattening
we can give:
    case (identity, _)                -> 1;            // no flattening
    case (non-identity, non-nullable) -> nFields;      // scalarize fields
    case (non-identity, nullable)     -> nFields + 1;  // scalarize fields
                                                       //   + null channel
Asking for nullability on top of non-identity means only that there is a
little more "footprint" in the calling convention, but not a qualitative
difference. That's good.
In the heap, it is a different story. What unlocks flattening in the
heap (in addition to identity-freedom) is some permission for
_non-atomicity_ of loads and stores. For sufficiently simple classes
(e.g., one int field) this is a non-issue, but because loads and stores
of references must be atomic (at least, according to the current JMM),
references to wide values (B2 and B3.ref) cannot be flattened as much as
B3.val. There are various tricks we can do (e.g., stuffing two 32 bit
fields into a 64 bit atomic) to increase the number of classes that can
get good flattening, but it hits a wall much faster than "primitives".
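For a concrete (hypothetical) picture of why atomicity is the
constraint, consider a value class with a cross-field invariant:

    // hypothetical value class; intended invariant: lo <= hi
    value class Range {
        long lo;
        long hi;
        Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    class Holder {
        Range r = new Range(0, 0);   // imagine r flattened to two longs
    }

    // if loads/stores of r are not atomic, a racy reader can pair the
    // lo of one write with the hi of another:
    //   writer A:  h.r = new Range(1, 1);
    //   writer B:  h.r = new Range(5, 5);
    //   reader:    observes (lo=5, hi=1) -- a torn Range that no thread
    //              ever wrote, violating the invariant

Atomic reference semantics forbid exactly this, which is why the wide
nullable forms cannot be flattened as aggressively.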
What I'd like is for the flattening story on the heap and the stack to
be as similar as possible. Imagine, for a moment, that tearing were
not an issue. Then the heap story would be the same as the one above:
no flattening for identity classes, scalarization in the heap for
non-nullable values, and scalarization with an extra boolean field
(with, perhaps, the same set of optimizations as on the stack) for
nullable values. This is very desirable, because it is so much easier
to reason about:
- non-identity unlocks scalarization on the stack
- non-atomicity unlocks flattening in the heap
- in both, ref-ness / nullity means maybe an extra byte of footprint
compared to the baseline
(with additional opportunistic optimizations that let us get more
flattening / better footprint in various special cases, such as very
small values.)
> In previous discussions around the extra null channel for flattened
> values, we were really looking at narrowly applicable optimization -
> basically for nullable values that would fit within 64bits. With this
> stacking, and the info about intel allowing atomicity up to 128bits,
> the extra null channel becomes more widely applicable.
Yes. What I'm trying to do is separate this all from the details of
what instructions CPU X has, and instead connect optimizations to
semantics: nullity requires extra footprint (unless it can be optimized
away by stealing bits somehow), and does so uniformly across the buckets
/ heap / stack / whatever. Nullability is a semantic property;
providing this property may have a cost, but the more uniform we can
make it, the simpler it is to reason about, and the simpler to implement
(since we can use the same encoding tricks in both stack and heap.)
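For example (illustrative only; any such encoding is a VM-internal
choice, invisible to the user), a value class that already contains a
boolean has seven slack bits in that byte, so the null channel can be
encoded for free:

    // hypothetical nullable value with a boolean field
    value class Flag {
        boolean on;
    }

    // possible byte encodings for a flattened, nullable Flag:
    //   0 -> non-null, on == false
    //   1 -> non-null, on == true
    //   2 -> null     (stolen slack encoding; no extra byte needed)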
> Some of my hesitation comes from experiences writing structs or
> multi-field invariants in C where memory barriers and careful
> read/write protocols are important to ensure consistent data in the
> face of races. Widening the set of cases that have a multifield
> invariant *created and enforced by the VM* by adding an additional
> null channel will make it more likely the VM (and optimized jit code!)
> can do the wrong thing.
Yes, this is why I want to bring it into the programming model. I don't
want to magically analyze the constructor and say "whoa, that looks like
a cross-field invariant"; I want the class author to say "you have
permission to shred" or "you do not have permission to shred", and we
optimize within the semantic properties declared by the author.
Cross-field invariants are one part of what determines whether we need
atomicity; transparency also comes into play.
When we "construct" a long, we have a pretty clear idea how the value
maps to all the bits; with encapsulation, we do not (but for records, we
do again, because we've constrained away the ability to let
representation diverge from interface.) Again, though, I think we are
better off having the author declare the required atomicity properties
rather than trying to derive them from other things (e.g., constructor
body, record-ness, etc.)
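A hedged illustration of the transparency point (hypothetical classes):
a record's representation is its interface, while an encapsulated class
is free to store something else entirely:

    // representation == interface: state is exactly the components
    record Complex(double re, double im) { }

    // representation != interface: constructed from (re, im), but
    // stored in polar form behind encapsulation
    final class Polar {
        private final double r, theta;
        Polar(double re, double im) {
            this.r = Math.hypot(re, im);
            this.theta = Math.atan2(im, re);
        }
        double re() { return r * Math.cos(theta); }
        double im() { return r * Math.sin(theta); }
    }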
> I have always been somewhat uneasy about the injected nullchannel
> approach and concerned about how difficult it will be for service
> engineers to support when something goes wrong. If there's experience
> that can be shared that shows this works well in an implementation,
> then I'll be less concerned.
Perhaps Tobias and Frederic can share more about what we've discovered here?