Valhalla EG reminder for November 21, 2018

Wed Nov 21 21:27:38 UTC 2018

Here is a write-up giving an option for nullable value types.
This is what we talked about this morning.  I had hoped to
produce it before the meeting—better late than never.

http://cr.openjdk.java.net/~jrose/values/nullable-values.html

# Nullable Value Types in L-World
#### Or, "Read my lips:  No new nulls"
#### Or, "Bottom-lander:  There can be only one"
#### John Rose, Brian Goetz, Valhalla Working Group

## Basic Premises

**Null is the default reference:** In Java, all Java reference types
have a common default value, the null reference.  This reference
appears in uninitialized array elements and field values of that type.
The null reference is distinct from any reference that is produced by
a `new` expression and/or a constructor call.

**A problem with legacy value classes:** In order for today's
value-based classes to migrate from reference types to proper value types,
they must retain the property that their default value is the null
reference, since the null reference may appear in user code that uses
such classes.

**Nullable value classes:** This implies that _some_ value types
require the ability to represent a true null reference, as one of the
many points in their set of possible values.  (This _does not_ imply
that any other value of such a type must _also_ be a reference: All
other values of the type can and should be proper values.)

**Nullability is rare:** Most value types, such as complex numbers or
vectors, must _not_ represent the null reference.  In particular,
arithmetic value types of size `N` bits may need to assign all `2**N`
code points to regular non-null values.  For example, a type that
emulates `byte` could not give up one of its 256 encodings to encode
`null`, and adding an extra hidden bit _to all value types_ would have
very large costs.  In addition, most value types "work like an int",
and require a default value which can accept method calls without
throwing `NullPointerException`.
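
The encoding argument can be checked with a little arithmetic in today's Java.  The following is only a sketch: the class `ByteEncodings` and its methods are hypothetical names invented for this illustration, and nothing here depends on any Valhalla feature.  It counts the distinct values of `byte`, then computes how many bits would be needed if one more code point (for `null`) had to be represented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, written only to illustrate the counting argument.
final class ByteEncodings {

    // Every one of the 2**8 bit patterns of a byte is a distinct, valid value.
    static int distinctByteValues() {
        Set<Byte> seen = new HashSet<>();
        for (int bits = 0; bits < (1 << Byte.SIZE); bits++) {
            seen.add((byte) bits);
        }
        return seen.size();
    }

    // Smallest number of bits able to distinguish the given number of code points.
    static int bitsNeeded(int codePoints) {
        return 32 - Integer.numberOfLeadingZeros(codePoints - 1);
    }

    public static void main(String[] args) {
        System.out.println("distinct byte values: " + distinctByteValues());
        System.out.println("bits for 256 code points: " + bitsNeeded(256));
        System.out.println("bits for 256 values plus null: " + bitsNeeded(257));
    }
}
```

Since all 256 patterns are taken, a byte-like value type that also had to encode `null` would need a ninth bit, which is the "extra hidden bit" the paragraph above argues against.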

**Not the default:** Nullability must be explicitly selected by the
designer of a value type; it is expected to be a rarely used feature,
because it is likely to incur extra costs, in space and time, for
encoding and decoding the null reference to and from the flattened
form of a value class.

**Tweaking the key slogan:** In summary, value types "code like a
class and work like an int".  But there are a few value types that
_also_ want to "work like an `Integer`", in that their default value
is the null reference, rather than an appropriate pattern of zero
bits.

## User Model

**New keyword:** A value class can be declared with a pseudo-modifier
`__Nullable` (TBD, may be just `nullable`), which must be accompanied
by the `value` pseudo-modifier.  Such a value class is called a
_nullable value class_.  Other value classes are called _regular value
classes_.

**Two variable kinds:** Variables come in two kinds, _heap_ and
_stack_.  A heap variable is a class field (static or non-static) or
an array element.  A stack variable is a method parameter or local
variable.  (Local variables include specially declared names such as a
`catch` variable.)  Heap variables exhibit type-specific default
values.  Stack variables do not require a default value convention,
because they are subject to definite assignment rules, which require
explicitly assigned values.

**Default values:** For a regular value class `RV`, the expression
`RV.default` evaluates to a non-null value of `RV` all of whose fields
are their respective default values (typically zero, `false`, or
`null`).  The uninitialized value of a heap variable of type `RV` or
`RV.val` is `RV.default`.  The uninitialized value of a heap variable
of type `RV.box` is `null`, and `RV.box.default` evaluates to `null`.

**Nullable defaults to null:** For a nullable value class `NV`, the
expression `NV.default` evaluates to `null` (the null reference).  The
uninitialized value of a heap variable of type `NV`, `NV.val`, or
`NV.box` is `null`.
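
The two default conventions already exist side by side in today's Java, which may help as an analogy.  In the sketch below, `int` stands in for a regular value class and `Integer` stands in for a nullable one; this mapping is an assumption made for illustration, not part of the proposal, and `DefaultsDemo` is a made-up class name.

```java
// Plain Java today, used only as an analogy for the two default conventions.
final class DefaultsDemo {
    static int intField;          // heap variable, never assigned
    static Integer integerField;  // heap variable, never assigned

    // An uninitialized element of a primitive array holds zero bits.
    static int intElementDefault() {
        int[] elements = new int[1];
        return elements[0];
    }

    // An uninitialized element of a reference array holds the null reference.
    static Integer integerElementDefault() {
        Integer[] elements = new Integer[1];
        return elements[0];
    }

    public static void main(String[] args) {
        System.out.println(intField + " " + intElementDefault());
        System.out.println(integerField + " " + integerElementDefault());
    }
}
```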

**Regular values never null:** Regular value classes can never
represent `null` values in their normal unboxed form.  (In this, they
"work like an int".)  Casting a null to a regular value class will
throw a `NullPointerException`, just like casting a null `Integer` to
`int`.  Reflectively storing an untyped null reference to a heap
variable of a regular value type will also throw a
`NullPointerException`.  Loading a heap variable of a regular,
non-boxed value class will never produce a null.
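
The `Integer`-to-`int` precedent mentioned above can be observed directly in today's Java.  This small sketch (the class name `UnboxNullDemo` is hypothetical) shows only the existing unboxing behavior; it does not model the proposed `checkcast` semantics.

```java
// Plain Java today: converting a null Integer to int throws NullPointerException.
final class UnboxNullDemo {

    static boolean unboxingNullThrows() {
        Integer boxed = null;
        try {
            int unboxed = boxed;   // implicit call to boxed.intValue()
            return false;          // not reached
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unboxing null throws: " + unboxingNullThrows());
    }
}
```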

**Constructors are null-checked:** In order to keep a clear
distinction between the null default value and constructed values,
nullable value class constructors have a null check on exit.  This
means that if constructor code accidentally assigns zero or default
values to all the instance fields, the constructor will throw
`NullPointerException` rather than return the null instance value.
No such check is done for regular value classes, since null values
are impossible for them.
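
The effect of the exit check can be imitated with an ordinary class in today's Java.  The sketch below is an approximation under stated assumptions: `SpanSketch` is a hypothetical stand-in for a nullable value class, and the check is written by hand, whereas in the proposal it would be supplied by the compiler or the JVM.

```java
// Hypothetical stand-in for a nullable value class with two int fields.
final class SpanSketch {
    final int start;
    final int length;

    SpanSketch(int start, int length) {
        this.start = start;
        this.length = length;
        // Hand-written imitation of the exit check: an instance whose fields
        // are all default would be indistinguishable from the null value.
        if (start == 0 && length == 0) {
            throw new NullPointerException("all fields are default");
        }
    }

    public static void main(String[] args) {
        System.out.println(new SpanSketch(0, 3).length);
        try {
            new SpanSketch(0, 0);
        } catch (NullPointerException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```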

**All values are flattenable:** In heap variables, instances of both
regular and nullable value classes behave as if they were flattened,
and are in fact routinely flattened.  This is likely to affect the
performance and footprint of programs which use such variables.  In
stack variables, flattening may or may not happen, depending on how
the interpreter or the JIT is directing execution.

**All values are boxable:** All value classes support a "boxed" view
which interoperates with `null` and with erased generics.  The
expression `V.box.default` evaluates to `null` for all value types,
both regular and nullable.  Heap variables of type `V.box` are _never_
flattened, but as a consolation prize they _can_ receive nulls even
for regular `V`.  For generics, note that `List<V.box>` is always
legal, but `List<V.val>` is currently illegal and reserved for future
use, when specialized generics are available.  Note that the unadorned
value type name `V` usually denotes the same type as `V.val`, but we
reserve the right to have some occurrences of `V` for certain types to
denote `V.box` instead.  (This is TBD; perhaps it is part of the
migration package for nullable classes.)

### Observations and Fine Print

**Null stores as vull:** When storing a `null` to a heap variable of a
nullable value class, the JVM will reset all (non-static) fields of
that variable to their default values.  On the heap, a logically null
flat value is called a _flattened value null_, or "vull" for short.

**Vull loads as null:** When loading a value from a heap variable of a
nullable value class, the JVM will detect "vull" and convert it to a
proper null reference.
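
The store and load rules can be imitated in today's Java by flattening by hand.  What follows is a minimal sketch, not the JVM's mechanism: `FlatVersionArray` and `Version` are hypothetical names, the flattened layout is modeled as an `int[]` with two slots per element, and `Version` instances are assumed never to have all-default fields.

```java
// Hand-flattened array of a hypothetical nullable value class.
final class FlatVersionArray {

    /** Hypothetical stand-in for an instance of a nullable value class. */
    static final class Version {
        final int major;
        final int minor;
        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }
    }

    private static final int FIELDS = 2;
    private final int[] slots;   // flattened storage, zero-initialized by Java

    FlatVersionArray(int length) {
        slots = new int[length * FIELDS];
    }

    /** Store: a null reference is written as all-default fields (a "vull"). */
    void store(int index, Version v) {
        int base = index * FIELDS;
        slots[base] = (v == null) ? 0 : v.major;
        slots[base + 1] = (v == null) ? 0 : v.minor;
    }

    /** Load: an all-default element is read back as the null reference. */
    Version load(int index) {
        int base = index * FIELDS;
        int major = slots[base];
        int minor = slots[base + 1];
        return (major == 0 && minor == 0) ? null : new Version(major, minor);
    }
}
```

In this model a freshly allocated element loads as `null` without any explicit initialization, because the zeroed storage already is the "vull" encoding.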

**Vull is a ghost:** Thus, for a nullable value class `NV`, it is
impossible to create or observe on stack a non-null instance of `NV`
for which all fields of the instance are default.  This means that
"vulls" are confined to the heap.  The JVM enforces this as a
low-level invariant, by dynamically transcoding between on-heap
"vulls" and on-stack nulls.

**Pivot fields:** As a "pro move", a nullable value class can mark
one or more of its non-static fields with the `__NullablePivot`
keyword (TBD).  When detecting "vulls", the JVM consults only such
marked fields for their default value, not all fields of the instance.
This may make "vull" detection faster for legacy classes like
`LocalDate`.  Such a specially marked field (or fields) may be called
a _pivot field_, since the task of "vull" detection "pivots" around
that field.  By default, if no fields are marked as pivot fields, then
in effect all of them serve as pivot fields.
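
A pivot field can be imitated in a hand-flattened array in today's Java.  The sketch below makes several assumptions that are not in the proposal: `PivotDateArray` is a hypothetical class, the layout is three `int` slots per element, and the month is chosen as the pivot because a valid `LocalDate` never has month zero (whereas year zero is a legal year).

```java
import java.time.LocalDate;

// Hand-flattened array of LocalDate, with the month slot acting as the pivot.
final class PivotDateArray {
    private static final int FIELDS = 3;  // year, month, day
    private static final int PIVOT = 1;   // offset of the month slot
    private final int[] slots;            // zero-initialized by Java

    PivotDateArray(int length) {
        slots = new int[length * FIELDS];
    }

    /** Store: null resets every slot of the element to its default. */
    void store(int index, LocalDate date) {
        int base = index * FIELDS;
        slots[base] = (date == null) ? 0 : date.getYear();
        slots[base + PIVOT] = (date == null) ? 0 : date.getMonthValue();
        slots[base + 2] = (date == null) ? 0 : date.getDayOfMonth();
    }

    /** Load: only the pivot slot is consulted to detect a "vull". */
    LocalDate load(int index) {
        int base = index * FIELDS;
        if (slots[base + PIVOT] == 0) {
            return null;
        }
        return LocalDate.of(slots[base], slots[base + PIVOT], slots[base + 2]);
    }
}
```

The load path tests one slot instead of three, which is the kind of saving the pivot is meant to offer.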

**Null stops bad calls:** It is arguable that the most legitimate job
of `null` is to avoid executing a method call on a receiver which has
not yet been specified.  After all, objects do not always have
reasonable default values, and so Java (and the JVM) assigns a
"default default" value of `null` to object variables that are not
otherwise initialized.  The null value ensures that if buggy code
tries to call a method on an uninitialized variable, an exception will
be thrown immediately, rather than executing a method body on an
unexpected input.  (In this view, field gets are the same as method
calls.  Other uses of nulls, such as API sentinel values, were
created by creative programmers, who given a hammer will always find
more nails.)  For a value type without a reasonable default value,
programmers have a right to a similar sentinel value which prevents
method execution on uninitialized variables.  But for a value _with_ a
reasonable default value, such machinery would be pure annoyance.
Only the designer of the value class knows which case is true.

**Inner value classes:** Any non-static nested ("inner") value class
`C.IV` must also be declared to be nullable.  The reason for this is
that every properly constructed instance of `C.IV` must specify a
non-null outer instance of type `C`.  But if `C.IV` were regular and a
method were called on the default value `C.IV.default`, then that
method would not be able to observe a definite non-null value
`C.this`.  Such a method call would be inescapably broken.  Thus, such
method calls must be prevented.  The existing language achieves this
result by throwing `NullPointerException` when invoking methods on
the default value of `C.IObj`, for an object class `C.IObj`.  To
preserve this behavior, an inner value class `C.IV` must also present
a null default value.  This restriction does not apply to static
nested value classes `C.NV`.
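
The existing behavior for inner object classes, which the proposal wants to preserve, can be seen in today's Java.  In this sketch `OuterDemo` and `OuterDemo.Inner` are hypothetical names playing the roles of `C` and the inner object class; no value types are involved.

```java
// Plain Java today: an inner class instance always has a non-null outer instance,
// and the default (null) value of the inner type rejects method calls.
final class OuterDemo {
    final String name;

    OuterDemo(String name) {
        this.name = name;
    }

    /** Non-static nested (inner) object class. */
    final class Inner {
        String outerName() {
            return OuterDemo.this.name;   // relies on a definite outer instance
        }
    }

    static boolean callOnDefaultThrows() {
        Inner[] heap = new Inner[1];      // element holds the default value, null
        try {
            heap[0].outerName();
            return false;                 // not reached
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(new OuterDemo("c").new Inner().outerName());
        System.out.println("call on default throws: " + callOnDefaultThrows());
    }
}
```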

**A slogan:** The slogan for this user model of nullable value types
is "no new nulls".  It refuses to introduce new "work-alikes" for the
null reference.  There is no new `NullValueException` which pairs with
`NullPointerException`.  There is no `Nullable` interface.  There is not
an `isNull` method, certainly not a user-definable one.  (An `isNull`
method could never return a `false` result, could it?  It would have
to throw `NullPointerException` instead!)  There are no directly
observable "vull" values to compete for the throne of `null`; "vulls"
are only indirectly observable in the flattening of certain heap
variables.  In short, the heavy cost of nulls (arguably a "billion
dollar mistake") is not multiplied by a new set of null-like values.
And the historic cost of nulls is not pushed forward to new value
types that don't request it, such as arithmetic types.

**Another slogan:** Alternatively, the slogan from "Highlander"
applies: There is only ever one `null`.  Any would-be "vull" value is
dissected from the value space and conjoined to `null` just as soon as
it tries to enter the stack.

## Implementation

**Affected bytecodes:** At the bytecode level, the instructions
`getfield`, `putfield`, `getstatic`, `putstatic`, `withfield`,
`aaload`, and `aastore` must transcode between "vull" and proper null.
The `defaultvalue` bytecode must not produce "vull".

**Null containers rejected:** If a `getfield` instruction is asked for
an instance field `NV.f` of a value class `NV`, and if the on-stack
value is `null`, then `NullPointerException` is thrown.  There is no
conversion of the containing instance to a "vull".  This is true
regardless of the type of the field `NV.f`.  If the on-stack value is
non-null, and `NV.f` is a "vull", then transcoding occurs as usual.
This means that the sequence `defaultvalue V; getfield V.f:T` is
equivalent to `defaultvalue T` only if `V` is a regular value type.
(It must also possess a non-static field `f` of type `T`.)  If `V` is
nullable, then the `getfield` instruction will throw.

**Withfield transcodes twice:** The `withfield` instruction must
transcode on both input and output.  It must convert a null input
value to a temporary "vull", one of whose fields is then updated.  It
must then detect whether the resulting value is a "vull" and convert
that (and only that) back to a null.  Unlike `getfield` and
`putfield`, `withfield` does not reject a `null` container value.  For
example, it will produce a null result value if asked to store a
default value to a field in an instance where all of the other fields
are already set to default values.
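
The contrast between `getfield` and `withfield` can be modeled with ordinary static methods in today's Java.  This is a sketch under assumptions, not the bytecode semantics themselves: `WithfieldSketch` and `Point` are hypothetical, `Point` stands in for a nullable value class with two `int` fields, and the methods `getX`, `withX`, and `withY` play the roles of the corresponding instructions.

```java
import java.util.Objects;

// Models getfield and withfield for a hypothetical nullable value class.
final class WithfieldSketch {

    /** Hypothetical stand-in for a nullable value class. */
    static final class Point {
        final int x;
        final int y;
        private Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Output transcoding: all-default fields become the null reference. */
    private static Point fromFields(int x, int y) {
        return (x == 0 && y == 0) ? null : new Point(x, y);
    }

    /** Models getfield: a null container is rejected, not read as a "vull". */
    static int getX(Point p) {
        return Objects.requireNonNull(p).x;
    }

    /** Models withfield on x: a null input is read as a temporary "vull". */
    static Point withX(Point p, int newX) {
        int y = (p == null) ? 0 : p.y;
        return fromFields(newX, y);
    }

    /** Models withfield on y, with the same two transcoding steps. */
    static Point withY(Point p, int newY) {
        int x = (p == null) ? 0 : p.x;
        return fromFields(x, newY);
    }
}
```

In this model, `withX(null, 5)` yields a non-null point, and setting that point's `x` back to zero yields `null` again, while `getX(null)` throws `NullPointerException`.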

**Unaffected bytecodes:** Instructions which operate only on stacked
or local values do not need further modification to detect "vulls",
since "vulls" are never on stack.  Receiver null checks for
`invokevirtual` and its siblings are unchanged; these instructions
will never encounter "vull" values.  The `acmp`, `checkcast`, and
`instanceof` instructions (and the `aastore` store check) already have
special semantics for null references which are unchanged.

**Transcoding in field instructions:** For the field instructions,
transcoding is a reasonable incremental cost to add, since these
instructions resolve their field and therefore know the specific field
type; thus the cost of adding transcoding between "vull" and `null` is
incremental and added only for fields whose types require this extra
step.

**Transcoding in array instructions:** Array element access
instructions first check the layout of the target array element and
then use the proper sequence of steps to convert from a flattened
array element (if present) to a regular on-stack reference.  As part
of this sequence of steps, if the element type is a nullable,
flattened value type, "vull" must be detected on load and produced on
store, corresponding to an on-stack null reference.

**Reflection, etc.:** Access to fields and array elements via
reflection, method handles, or JNI is defined in terms of the
behaviors of bytecodes, as usual.  Thus, reflectively loading or
storing a value instance must include a transcoding step exactly when
transcoding is required by the corresponding bytecode instruction.

**Variable declaration:** The bytecode-level descriptor for a
flattenable variable of a value type has the form of a _Q-descriptor_,
which begins with the letter "Q" instead of the letter "L" normally
used with class-based types.  Variables which hold a boxed value are
introduced with _L-descriptors_ beginning with "L", like any other
reference type.  In the setting of the JVM type system, L-descriptors
and Q-descriptors denote _L-types_ and _Q-types_.  Q-types and L-types
roughly correspond to user-level `V.val` and `V.box` types,
respectively.  Again, in the setting of JVM types (only) we say a
value of a Q-type or L-type is a _Q-value_ or _L-value_.

**Layout includes nullability:** The nullability of a value class `V`
is logically a part of `V`'s overall layout, its size and the format
of its fields.  This is because layout is the information that
dictates the JVM's exact steps when loading or storing a flattened
value.  Since these steps necessarily include a "vull" check when the
value class is nullable, layout includes nullability.

**Q-values in the heap:** Q-types are introduced in the heap as part
of a class declaration, or when an array type derived from the Q-type
is mentioned.  The JVM consults the layout of a Q-type `Q-V` when it
lays out an instance field of type `Q-V`, or prepares a static field
of type `Q-V`, or computes the layout of an array whose element type
is `Q-V`.  In all cases, the class declaration of `V` is loaded if
necessary, and the layout of `V` is consulted, to determine the steps
needed to load or store the Q-value.

**Q-values on the stack:** Q-types are introduced on the stack as part
of method, field, or array type descriptors, or as `checkcast`
targets.  For nullable value types, both Q-types and L-types can carry
null values, while for regular value types, Q-types cannot carry null.
Thus, a `checkcast` to a "Q-type" will throw `NullPointerException` if
presented at runtime with a null reference on the stack, but only if
the referenced type is regular (not nullable).

**Verifier rules:** The verifier tracks the distinction between
Q-types and L-types, and specifically ensures that an on-stack L-value
is never consumed by an instruction which expects a Q-value, if the
L-type accepts null but the Q-type does not.  If the Q-type is
nullable, implicit conversions are logically permissible, and the
verifier should allow them.  Given the nullable and regular value
types `NV` and `RV`, it follows that `Q-NV` is a proper subtype of
`L-NV`, but `Q-RV` and `L-RV` can be treated as the same type, since
they have the same set of on-stack values.

**Verifier rules for supers:** If `C` is a super (class or interface)
of `NV` or (respectively) `RV`, then `L-C` is a proper supertype of
`Q-NV` and `L-NV`, or (respectively) `Q-RV` and `L-RV`.  Note that
supertypes of value types are always nullable.  Thus, there is no need
to distinguish between Q-types and L-types when converting to supers;
if there is a null it will be welcome in the supertype.

**Optimized calling sequences:** The JIT may elect to use "vull"
values for non-receiver parameters or return values, as an alternative
to buffering via a nullable indirection.  Such calling sequences must
be made invisible to the end-user by ensuring that "vull" parameters
are transcoded (detected and converted to nulls) as needed, and vice
versa on return.  Such transcoding operations seem to be reorderable
and trackable much like `null` detection is at present.  Thus, "vull"
transcoding is thought to be optimizable as a straightforward
extension to today's JITs.

**Constructor translation:** A value class constructor starts with the
default value of its class, and builds up the value by assigning to
its fields.  The rules of the Java language (for final fields and
value instance fields) ensure that each field is assigned once and
only once.  The tracking of assignment along all paths uses a pair of
conditions called "definite assignment" and "definite unassignment".
On every normal exit from a constructor, each field must be definitely
assigned and not definitely unassigned.  The JVM has no such rules for
tracking assignment at bytecode boundaries.  Instead, constructors are
allowed to write to final fields any number of times.  For values, the
corresponding rule is that `withfield` is allowed to assign to an
uninitialized value any number of times.  In order to prevent null
values from escaping from a value class constructor, the compiler must
precede each return instruction by a null check.  (This can be done
with a call to `Objects.requireNonNull` or `Object.getClass`.)
Optionally (TBD) the JVM could perform this check automatically.

**Non-throwing getter:** Optionally (and TBD), the JVM may choose to
define `getfield` on a Q-type container to transcode the container to
"vull".  Most uses of `getfield` would use the regular L-type
container, but this variation would give a "hook" for translation
strategies that need to operate on the fields of a possibly
uninitialized value.  This could happen, for example, inside a
constructor.  The class component of the `CONSTANT_Fieldref` of such
an instruction would be a Q-descriptor, rather than the name of a class.
Note that, in the JVM, unadorned class names usually denote L-types,
not Q-types, so directing a `getfield` instruction to a Q-type
container is an unusual step.