Consequences of null for flattenable representations

Wed Nov 3 17:58:29 UTC 2021

As we just discussed in the EG, allowing null to co-exist
with flattenable representations is a challenge.  It is
one we have in the past tried to avoid, but the very
legitimate needs for (what we now call) reference
semantics for all of Bucket 2 and some of Bucket 3
require us to give null a place at the table, even while
continuing to aim at flattening nullable values,
when possible.

A good example of this is Optional, migrated from a
Bucket 1 *value-based class* to a proper Bucket 2
*reference-based primitive*.   (See that tricky change
in POV?)  Another example to keep in mind is the
reference projection of a Bucket 3 type such as
Complex.ref or Point.ref.

The simplest way to support null is just to do what
we do today, and buffer on the heap, with the option
of a null reference instead of a reference to a boxed value.

(We call such things “buffers” rather than “boxes” simply
because, unlike int/Integer, the type of thing that’s in
the box might not be denotably different from the type
of the “box” itself.)

The next thing to do is inject a *pivot field* into the flattened
layout of the primitive object.  When this invisible field
contains all zero bits, the flattened object encodes a null.
All the other bits are either ignorable or must be zero,
depending on what you are trying to do.
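To make the pivot idea concrete, here is a rough sketch in
plain Java that models a flattened array of two-int values.
Everything in it (the class FlatPointArray, the three-slot
layout, the use of an int for the pivot) is an assumption
made for illustration, not a proposed JVM layout.

```java
// Hypothetical model: a flattened, nullable array of (x, y) pairs.
// Each element occupies three int slots: [pivot, x, y].
final class FlatPointArray {
    private static final int SLOTS = 3;   // pivot plus two payload fields
    private final int[] cells;

    FlatPointArray(int length) {
        // A fresh int[] is all zero bits, so every element starts as null.
        cells = new int[length * SLOTS];
    }

    // Only the pivot slot decides whether the element is null.
    boolean isNull(int i) {
        return cells[i * SLOTS] == 0;
    }

    // Store a non-null value: set the pivot, then the payload.
    void set(int i, int x, int y) {
        int base = i * SLOTS;
        cells[base] = 1;
        cells[base + 1] = x;
        cells[base + 2] = y;
    }

    // Store null: clear the pivot and normalize the payload to zero.
    void setNull(int i) {
        int base = i * SLOTS;
        cells[base] = 0;
        cells[base + 1] = 0;
        cells[base + 2] = 0;
    }

    int x(int i) {
        if (isNull(i)) throw new NullPointerException("element " + i);
        return cells[i * SLOTS + 1];
    }

    int y(int i) {
        if (isNull(i)) throw new NullPointerException("element " + i);
        return cells[i * SLOTS + 2];
    }

    public static void main(String[] args) {
        FlatPointArray a = new FlatPointArray(2);
        System.out.println(a.isNull(0));   // true: all zero bits
        a.set(0, 0, 0);
        System.out.println(a.isNull(0));   // false: (0, 0) is not null
    }
}
```

Note that the value (0, 0) stays distinct from null only
because the pivot slot is non-zero; the payload alone cannot
tell them apart.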

This idea splits into two directions:  How to work with
“pivoted” non-null values, and how to represent the pivot
efficiently. Both lines of thought are more or less required
exercises, once you allow null its place at the table.

We know where null comes from (the null literal and
aconst_null).   Where do pivoted values come from?
You need an original source of them for the initial
value of “this” in the primitive constructor (a factory
method at the bytecode level).  Specifically, you need
that bit pattern which is almost but not quite all
zero bits; the pivot field is set to the “non-null”
state but all other field values are zero.  Then
the constructor can get to work.

This might be the job of an “initialvalue” bytecode,
which is a repackaging of the “defaultvalue” bytecode.
Given a suitable definition with suitable restrictions
for initialvalue, a constructor uses a mix of initialvalue
and withfield executions to get to its output state for “this”.
None of the intermediate states would be confusable
with null.
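As a sketch only, the constructor protocol described above
might be modeled in ordinary Java like this.  The names
PointBits, initialValue, withX, withY and make are stand-ins
I am inventing for the bytecodes and the factory method;
they are not real APIs.

```java
// Hypothetical model of a primitive object's bits, pivot included.
final class PointBits {
    final boolean pivot;   // false means "these bits encode null"
    final int x;
    final int y;

    private PointBits(boolean pivot, int x, int y) {
        this.pivot = pivot;
        this.x = x;
        this.y = y;
    }

    // The all-zero bit pattern, standing in for null.
    static final PointBits NULL_BITS = new PointBits(false, 0, 0);

    // Stand-in for "initialvalue": pivot set, every other field zero.
    // Private, so only this class can mint a starting value.
    private static PointBits initialValue() {
        return new PointBits(true, 0, 0);
    }

    // Stand-ins for "withfield": copy with one field replaced.
    // The pivot is carried along unchanged.
    private PointBits withX(int newX) {
        return new PointBits(pivot, newX, y);
    }

    private PointBits withY(int newY) {
        return new PointBits(pivot, x, newY);
    }

    // The "constructor", which is a factory at this level.
    static PointBits make(int x, int y) {
        PointBits self = initialValue();   // never confusable with null
        self = self.withX(x);
        self = self.withY(y);
        return self;
    }

    boolean isNull() {
        return !pivot;
    }

    public static void main(String[] args) {
        System.out.println(make(0, 0).isNull());   // false
        System.out.println(NULL_BITS.isNull());    // true
    }
}
```

Every intermediate value of self has its pivot set, so no
step of the construction passes through the null encoding.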

(We sometimes assumed, wrongly in hindsight, that
doing this simply requires assigning “this” to
null in the constructor and then special-casing
withfield and maybe getfield to allow a null input
and maybe a null output.  But this is a thicket of
tangles and irregularities, and it doesn’t quite
get rid of the need for a separate operation to
actually set the pivot field.  Basically, once null
gets entrenched, defaultvalue has to turn into
initialvalue, or so it appears to me at this moment.)

Once the constructor returns a non-null set of
bits, all subsequent assignments continue to
separate null from non-null.  That’s true even
for racy assignments, assuming that pivot field
states are individually atomic, even if they race
relative to other fields.

(Race control might be important for Bucket 3
references like Complex.ref, if we ever try to
flatten those.  I’m digressing; my focus is to
build out Bucket 2, which suppresses such races.)

To allow Bucket 2 constructors control over their
outputs, it follows that initialvalue (unlike its
earlier version defaultvalue) must be restricted
to those same contexts where withfield is allowed:
either to constructors only (for the same class)
or to the capsule (nest) of that class.

OK, so how is the pivot field physically represented?
Again, we have discussed this in years past, but I’ll
summarize some of the thinking:

1. It can be just a boolean, a byte or a packed bit
that is made free somehow.  A 65th bit to a 64-bit
payload perhaps.  This is sad, but also hard to get
around when every single bitwise encoding in the
existing layout already has a meaning.

But the payload of the primitive type might use a
field with “slack”, aka unused bitwise encodings.
We can pounce on this and use bit-twiddling
to internally reserve the zero state, and declare
that when that field is zero, it is the pivot field
denoting null, and when it is non-zero it is
doing its normal job.

2. If the language tells us, “yes I promise not
to use the default value on this field” then maybe
the JVM can do something with that promise.
There are issues, but it’s tempting for (say)
a Rational type where the denominator is
never zero.

3. More reliably, if the JVM knows that a
field has unused encodings, it can just swap
the all-zero state with some other state.
People will immediately think of unused bits
which can be flipped to true in the field
when it is pivoted to non-null.

It’s better, IMO, to start out with the humble
increment operator (rather than the bit-set
operator) and work from there.  As long as
the encoding of all-one-bits is not taken,
for a given field (true for booleans and
managed pointers!) then the JVM can
simply perform an unsigned non-overflowing
increment when storing payload to the
pivot field (preserving the non-zero
invariant) and do a non-overflowing
unsigned decrement when loading.
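Here is a minimal sketch of the increment trick, assuming a
32-bit field whose all-one-bits encoding is never used by
the payload.  The class and method names are made up for
the example.

```java
// Hypothetical transcoding of a payload field that doubles as the pivot.
// Assumption: the payload never uses the all-one-bits encoding (-1 as an int).
final class IncrementPivot {
    // Stored bits of zero mean the enclosing object is null.
    static final int NULL_BITS = 0;

    // On store: unsigned increment.  Because the payload is never
    // all-one-bits, the result can never wrap around to zero.
    static int encode(int payload) {
        if (payload == -1) {
            throw new IllegalArgumentException("all-one-bits must be unused");
        }
        return payload + 1;
    }

    static boolean isNull(int stored) {
        return stored == NULL_BITS;
    }

    // On load: unsigned decrement, valid only for non-null bits.
    static int decode(int stored) {
        if (stored == NULL_BITS) {
            throw new NullPointerException("pivot field is zero");
        }
        return stored - 1;
    }

    public static void main(String[] args) {
        int stored = encode(0);                  // payload zero, e.g. boolean false
        System.out.println(isNull(stored));      // false
        System.out.println(decode(stored));      // 0
        System.out.println(isNull(NULL_BITS));   // true
    }
}
```

A payload of zero (boolean false, say) is stored as 1, so it
no longer collides with the all-zero null encoding.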

I can just hear the GC folks groaning in the
distance about such increments, on managed
pointers.  For them, a slightly less JIT-friendly
operation might be preferable, to perform
the increment (on store) only when the value
is null, and vice versa on load, decrement
only when 1.  Or use bit twiddling in the
low bits of the pointer.  Or use all-one-bits
as the “payload null” which is distinct
from the “pivot is zero” state.  I think the
JIT and GC folks can come to an agreement,
in any given JVM.  When the JIT people
groan back about weirdo encodings of
managed pointers, we can gently tell them,
“it’s just another flavor of managed pointer
transcoding, a problem we solved when
we went to compressed oops.”
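The conditional variant can be sketched as follows, modeling
a managed pointer as a raw long.  That modeling, the names,
and the claim that the value 1 is never a real address
(because real addresses are aligned) are all assumptions for
the sake of the example.

```java
// Hypothetical transcoding of a managed pointer that doubles as the pivot.
// Pointers are modeled as longs; 0 is the ordinary null pointer.
final class SwappedNullPointer {
    static final long PIVOT_NULL = 0L;     // the enclosing object is null
    static final long PAYLOAD_NULL = 1L;   // the object exists; its pointer field is null

    // On store: increment only when the payload pointer is null.
    static long store(long pointer) {
        if (pointer == PAYLOAD_NULL) {
            throw new IllegalArgumentException("1 is assumed not to be a real address");
        }
        return pointer == 0L ? PAYLOAD_NULL : pointer;
    }

    // On load: decrement only when the stored bits are 1.
    static long load(long stored) {
        if (stored == PIVOT_NULL) {
            throw new NullPointerException("pivot field is zero");
        }
        return stored == PAYLOAD_NULL ? 0L : stored;
    }

    public static void main(String[] args) {
        System.out.println(store(0L));             // 1
        System.out.println(load(store(0L)));       // 0
        System.out.println(load(store(0x1000L)));  // 4096
    }
}
```

Non-null pointers are stored unmodified, which is the part
the GC would presumably care about; only the null payload
is transcoded.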

(On balance, I think the GC should define
a small family of “quasi-null sentinel values”
which can be easily stored into any managed
pointer for ad hoc purposes like this and others. 
Others would be at least 1. an Optional::isEmpty
state for optionals *which are null-friendly*
and 2. a distinction between null and unbound,
for lazy variables which are also null-friendly.
Neither of these exists today, of course, and
none of these hypothetical sentinels would ever
be visible to normal Java code.)

My point is that we don’t have to just slap
a boolean on everything.  In particular,
when migrating ju.Optional to Bucket 2,
we can preserve its very attractive one-field
representation by invisibly assigning a
bad managed pointer value to encode
Optional::isEmpty.  No Java code changes
are needed (or desired) to pull this off,
just the increment hack sketched above,
or one of its variations.
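A rough model of the one-field Optional encoding, written
in today's Java: each slot holds either null (the Optional
reference itself is null), a private sentinel (standing in
for the "bad managed pointer" that encodes isEmpty), or the
payload.  FlatOptionalArray and EMPTY are names invented
here, and an Object[] is only a stand-in for a flattened
layout.

```java
import java.util.Optional;

// Hypothetical model: one slot per Optional, no separate pivot field.
final class FlatOptionalArray<T> {
    // Stand-in for a quasi-null sentinel; never visible to callers.
    private static final Object EMPTY = new Object();

    private final Object[] slots;

    FlatOptionalArray(int length) {
        slots = new Object[length];   // all slots start as null
    }

    void set(int i, Optional<T> value) {
        if (value == null) {
            slots[i] = null;          // pivot is zero: a null Optional
        } else if (value.isPresent()) {
            slots[i] = value.get();   // the payload pointer itself
        } else {
            slots[i] = EMPTY;         // non-null but empty
        }
    }

    @SuppressWarnings("unchecked")
    Optional<T> get(int i) {
        Object slot = slots[i];
        if (slot == null) {
            return null;
        }
        if (slot == EMPTY) {
            return Optional.empty();
        }
        return Optional.of((T) slot);
    }

    public static void main(String[] args) {
        FlatOptionalArray<String> a = new FlatOptionalArray<>(3);
        a.set(1, Optional.empty());
        a.set(2, Optional.of("hello"));
        System.out.println(a.get(0));   // null
        System.out.println(a.get(1));   // Optional.empty
        System.out.println(a.get(2));   // Optional[hello]
    }
}
```

The three states (null, empty, present) fit in a single
pointer-sized slot, and callers see only ordinary Optional
values.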

Even Bucket 3 references could be encoded
in this way, if and when we desire to.  That
is, whatever JVM algorithm constructs a
pivot field and its logic could be pointed at
a Bucket 3 reference projection, if we think
this would be desirable.  One result would
be that Map.get, which returns T.ref, could
avoid buffering on the heap.  N.B. This assumes
stuff we don’t have yet, to specialize Map::get
to a particular flattenable type.  I hope we
will get there.

— John