Null channels (was: User model stacking)
Brian Goetz
brian.goetz at oracle.com
Tue May 3 17:56:04 UTC 2022
About six months ago we started working on flattening references in
calling conventions in the Valhalla repos. We use the Preload attribute
to force preloading of classes that are known to be (or expected to be)
value classes, but which are referenced only via L descriptors, so that
at the (early) time the calling convention is chosen, we have the
additional information that this is an identity-free class. In
these cases, we scalarize the calling convention as we do with Q types,
but we add an extra boolean channel for null; it is as if we add a
boolean field to the object layout. When we adapt between the
scalarized and indirected forms (e.g., c2i adapters), we apply the
obvious semantics to the null channel.
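To make the "as if" concrete, here is a hedged sketch (hypothetical
names, prototype value-class syntax) of what the scalarized calling
convention morally looks like, expressed back at the source level:

    // a value class whose fields are known at calling-convention time
    value class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    class Demo {
        // source-level method: one nullable, L-typed parameter
        static int use(Point p) {
            return (p == null) ? 0 : p.x + p.y;
        }

        // what the scalarized convention morally resembles: one argument
        // per field, plus one extra boolean argument -- the null channel
        static int use$scalarized(boolean pNonNull, int pX, int pY) {
            return pNonNull ? pX + pY : 0;
        }
    }

The c2i-style adapters then map between the two forms: null on the
indirect side becomes (false, 0, 0) on the scalarized side, and
(false, _, _) coming back becomes null.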
We have not yet applied the same treatment to field layout, but we can
(and it has the same timing constraints, so it also needs Preload), and
the VM has additional degrees of implementation freedom in doing so.
The simplest option is to let the layout engine choose to flatten a
preloaded, L-carried value type by injecting a boolean field that
represents nullity, and adapting null checks to read this field (such
checks can be hoisted, etc.)
The layout engine has other tricks available to it as well, to further
reduce the footprint of representing "might be null", if it can find
suitable slack space in the representation. Such tricks could include
using slack bits in boolean fields (potentially seven of them),
low-order bits of pointers (a la compressed OOPs), unused color bits of
64-bit pointers, etc. Some of these choices require transforms on
load/store (e.g., those that use pointer bits), not unlike what we do
with compressed OOPs. This is entirely "VM's choice" and affects only
quality of implementation; there is nothing in the classfile that
conditions this, other than the ACC_VALUE indication and L/Q type
carriers. So the VM has a rich set of footprint/computation tradeoffs
for encoding the null channel, but logically, it is an "extra boolean
field" that all nullable value types have.
> I'd like to reserve judgement on this stacking as I'm uncomfortable
> (uncertain maybe?) about the practicality of the extra null channel.
> Without having validated the extra null channel, I'm concerned we're
> exposing a broader set of options in the language that will, in
> practice, map down to the existing 3 buckets we've been talking about.
> Maybe this factoring allows a slightly larger number of classes to be
> flattened or leaves the door open for them to get it in the future?
What I'm trying to do here is decomplect flattening from nullity. Right
now, we have an unfortunate interaction that both makes certain
combinations impossible and makes the user model harder to reason about.
Identity-freedom unlocks flattening in the stack (calling convention.)
The lesson of that exercise (which was somewhat surprising, but good) is
that nullity is mostly a non-issue here -- we can treat the nullity
information as just being an extra state component when scalarizing,
with some straightforward fixups when we adapt between direct and
indirect representations. This is great, because we're not asking users
to choose between nullability and flattening; users pick the combination
of { identity, nullability } they want, and they get the best flattening
we can give:
    case (identity, _)                -> 1;            // no flattening
    case (non-identity, non-nullable) -> nFields;      // scalarize fields
    case (non-identity, nullable)     -> nFields + 1;  // scalarize fields
                                                       //   + null channel
Asking for nullability on top of non-identity means only that there is a
little more "footprint" in the calling convention, but not a qualitative
difference. That's good.
In the heap, it is a different story. What unlocks flattening in the
heap (in addition to identity-freedom) is some permission for
_non-atomicity_ of loads and stores. For sufficiently simple classes
(e.g., one int field) this is a non-issue, but because loads and stores
of references must be atomic (at least, according to the current JMM),
references to wide values (B2 and B3.ref) cannot be flattened as much as
B3.val. There are various tricks we can do (e.g., stuffing two 32 bit
fields into a 64 bit atomic) to increase the number of classes that can
get good flattening, but it hits a wall much faster than "primitives".
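For a concrete (hypothetical) picture of why atomicity is the
constraint, consider a value class with a cross-field invariant:

    // hypothetical value class; intended invariant: lo <= hi
    value class Range {
        long lo;
        long hi;
        Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    class Holder {
        Range r = new Range(0, 0);   // imagine r flattened to two longs
    }

    // if loads/stores of r are not atomic, a racy reader can pair the
    // lo of one write with the hi of another:
    //   writer A:  h.r = new Range(1, 1);
    //   writer B:  h.r = new Range(5, 5);
    //   reader:    observes (lo=5, hi=1) -- a torn Range that no thread
    //              ever wrote, violating the invariant

Atomic reference semantics forbid exactly this, which is why the wide
nullable forms cannot be flattened as aggressively.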
What I'd like is for the flattening story on the heap and the stack to
be as similar as possible. Imagine, for a moment, that tearing were
not an issue. Then the heap story would be the same as the one above:
no flattening for identity classes, scalarization in the heap for
non-nullable values, and scalarization with an extra boolean field
(with, perhaps, the same set of optimizations as on the stack) for
nullable values. This is very desirable, because it is so much easier
to reason about:
- non-identity unlocks scalarization on the stack
- non-atomicity unlocks flattening in the heap
- in both, ref-ness / nullity means maybe an extra byte of footprint
compared to the baseline
(with additional opportunistic optimizations that let us get more
flattening / better footprint in various special cases, such as very
small values.)
> In previous discussions around the extra null channel for flattened
> values, we were really looking at narrowly applicable optimization -
> basically for nullable values that would fit within 64bits. With this
> stacking, and the info about intel allowing atomicity up to 128bits,
> the extra null channel becomes more widely applicable.
Yes. What I'm trying to do is separate this all from the details of
what instructions CPU X has, and instead connect optimizations to
semantics: nullity requires extra footprint (unless it can be optimized
away by stealing bits somehow), and does so uniformly across the buckets
/ heap / stack / whatever. Nullability is a semantic property;
providing this property may have a cost, but the more uniform we can
make it, the simpler it is to reason about, and the simpler to implement
(since we can use the same encoding tricks in both stack and heap.)
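For example (illustrative only; any such encoding is a VM-internal
choice, invisible to the user), a value class that already contains a
boolean has seven slack bits in that byte, so the null channel can be
encoded for free:

    // hypothetical nullable value with a boolean field
    value class Flag {
        boolean on;
    }

    // possible byte encodings for a flattened, nullable Flag:
    //   0 -> non-null, on == false
    //   1 -> non-null, on == true
    //   2 -> null     (stolen slack encoding; no extra byte needed)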
> Some of my hesitation comes from experiences writing structs or
> multi-field invariants in C where memory barriers and careful
> read/write protocols are important to ensure consistent data in the
> face of races. Widening the set of cases that have a multifield
> invariant *created and enforced by the VM* by adding an additional
> null channel will make it more likely the VM (and optimized jit code!)
> can do the wrong thing.
Yes, this is why I want to bring it into the programming model. I don't
want to magically analyze the constructor and say "whoa, that looks like
a cross-field invariant"; I want the class author to say "you have
permission to shred" or "you do not have permission to shred", and we
optimize within the semantic properties declared by the author.
Cross-field invariants are one part of what determines whether we need
atomicity; transparency also comes into play.
When we "construct" a long, we have a pretty clear idea how the value
maps to all the bits; with encapsulation, we do not (but for records, we
do again, because we've constrained away the ability to let
representation diverge from interface.) Again, though, I think we are
better off having the author declare the required atomicity properties
rather than trying to derive them from other things (e.g., constructor
body, record-ness, etc.)
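A hedged illustration of the transparency point (hypothetical classes):
a record's representation is its interface, while an encapsulated class
is free to store something else entirely:

    // representation == interface: state is exactly the components
    record Complex(double re, double im) { }

    // representation != interface: constructed from (re, im), but
    // stored in polar form behind encapsulation
    final class Polar {
        private final double r, theta;
        Polar(double re, double im) {
            this.r = Math.hypot(re, im);
            this.theta = Math.atan2(im, re);
        }
        double re() { return r * Math.cos(theta); }
        double im() { return r * Math.sin(theta); }
    }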
> I have always been somewhat uneasy about the injected nullchannel
> approach and concerned about how difficult it will be for service
> engineers to support when something goes wrong. If there's experience
> that can be shared that shows this works well in an implementation,
> then I'll be less concerned.
Perhaps Tobias and Frederic can share more about what we've discovered here?