JVM alternatives for supporting nullable value types

Dan Smith daniel.smith at oracle.com
Wed Sep 12 23:46:31 UTC 2018


For LW10, one of our goals is to support interactions between value types and erased generics by having some form of a nullable value type.

The needs of the language factor heavily into the JVM design. We're not ready to commit to language-level details, but it's likely that the language will support nullable and non-nullable variations of the types declared by value classes; and these variations will probably be supported in most places that types can appear.

More generally, the language may support up to three different flavors of nullability on some or all types:
- null-free: a type that does not include null (could be spelled Foo!)
- null-permitting: a type that allows but ignores nulls (could be spelled Foo~)
- null-checked: a type that allows and checks for nulls (could be spelled Foo?)

(Please note that this is placeholder syntax. There are lots of ways to map this to real syntax. Unadorned names will map to one of these; it's possible that migrating a class to be a value class will change the interpretation of its unadorned name.)

Null-permitting and null-checked types are both "nullable"; the difference is in how strongly the compiler enforces null checks. ("Null-permitting" is the existing behavior for types like 'String'; "null-checked" is the style that requires proof that nulls are absent before dereferencing.)
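The behavioral difference can be sketched in today's Java, writing the null-checked compiler obligation out by hand (the method names here are illustrative, and 'String' stands in for any reference type):

```java
public class Flavors {
    // Null-permitting (today's 'String'): a dereference compiles freely,
    // and a null surfaces as an NPE at the dereference site.
    static int permittingLength(String s) {
        return s.length();           // may throw NPE
    }

    // Null-checked (the Foo? style): the compiler would demand proof that
    // null is absent before a dereference; written out by hand, that proof
    // is an explicit test covering the null case.
    static int checkedLength(String s) {
        if (s == null) return 0;     // the null case must be handled
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(checkedLength(null));   // 0
        try {
            permittingLength(null);
        } catch (NullPointerException e) {
            System.out.println("NPE at dereference");
        }
    }
}
```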

The other important concept from the language is conversions:
- A widening conversion (or something similar) supports treating a value of a null-free type as null-permitting or null-checked
- A "null-free conversion" is required to go in the opposite direction, and includes a runtime null check
- A "nullability conversion", like an unchecked conversion, might allow other forms of conversions between types involving different nullabilities, including in their type arguments or array component type.
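The placeholder syntax can't be written in current Java, but the runtime shape of the first two conversions can be sketched with an ordinary class standing in for a value class ('Point' and the method names are hypothetical):

```java
import java.util.Objects;

public class Conversions {
    // Hypothetical stand-in for a value class 'Val'.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Widening (Point! -> Point~/Point?): a no-op at runtime.
    static Point widen(Point nullFree) {
        return nullFree;
    }

    // Null-free conversion (Point~/Point? -> Point!): includes a runtime
    // null check, spelled here with Objects.requireNonNull.
    static Point toNullFree(Point nullable) {
        return Objects.requireNonNull(nullable);
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(toNullFree(widen(p)) == p);   // true
        try {
            toNullFree(null);
        } catch (NullPointerException e) {
            System.out.println("null-free conversion rejected null");
        }
    }
}
```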

Turning to the JVM with those language-level concepts in mind, I've put together the following summary of four main designs we've considered. The goal here is not to reach a conclusion about which path is best, but to make sure we're accurately considering all of the implications in each case.


Nullable value types, null-free storage
---------------------------------------

In this approach, we use regular L types to represent value types, and these types are nullable. Fields and arrays, via some sort of modifier, may choose to be nullable or null-free.


JVM implications

- Need a mechanism (new opcode?) to indicate that an array allocation is null-free
- The default value of a field/array depends on whether the "null-free" modifier is used
- Fields and arrays that are marked null-free can, of course, be flattened
- Stack variables and method parameters/returns may always be null
- A putfield, putstatic, or aastore may fail with an NPE (or maybe ASE)
- JIT can optimistically assume no nulls and scalarize, but must check and de-opt when a null is encountered
- The "null-free" modifier is only allowed with value class types, and must be validated early (e.g., to decide on field layout)


Compilation strategy

Val? maps to LVal;
Val~ maps to LVal;
Val! maps to LVal;

The nullability of the type in a field declaration or array creation expression determines whether the "null-free" modifier is used or not.

Nullability conversions are no-ops; null-free conversions are either compiled to explicit null checks or are implicit in an invoke*/getfield/putfield.


Language implications

- Null-free value types typically get flattened storage and scalarized invocations
- Array store runtime checks may include a null check
- Methods may not be overloaded on different nullabilities of the same type
- Null-free parameters/returns may be polluted with nulls due to inconsistent compilation or non-Java interop—detected with an NPE on storage or dereference
- A conversion from Val~[] to Val![] could be supported, but the result would not perform the expected runtime checks
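That last point parallels today's unchecked generic conversions: the static view of a container changes, but the container object keeps the checking behavior it was created with. A rough analogy in current Java (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class UncheckedView {
    // Unchecked conversion: changes the static view of the list without
    // changing the runtime checking behavior of the list object itself.
    @SuppressWarnings("unchecked")
    static List<String> viewAsStrings(List<Object> raw) {
        return (List<String>) (List<?>) raw;
    }

    public static void main(String[] args) {
        List<Object> raw = new ArrayList<>();
        List<String> strings = viewAsStrings(raw);

        raw.add(42);                   // pollutes 'strings'; no error yet
        try {
            String s = strings.get(0); // implicit checkcast fails only here
        } catch (ClassCastException e) {
            System.out.println("pollution detected late");
        }
    }
}
```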


Migration implications

- Refactoring a class to be a value class is a binary compatible change (except where this involves incompatible changes like removing a public constructor); before recompilation (which may reinterpret some unadorned names), treatment of nulls does not change
- Changing the nullability of a type is a binary compatible change; library clients who expect nullable storage may see surprising NPEs or ASEs



Always null-free value types
----------------------------

In this approach, we use regular L types to represent value types, and these types are null-free. Non-value L types continue to be nullable. A use-site attribute tracks which class names represent value classes; validation lazily ensures consistency with the declaration.


JVM implications

- Fields, arrays, and method parameters and returns with value class types can be flattened/scalarized
- The 'null' verification type is not a subtype of any value class types
- Casts to value class types must fail on 'null' (CCE or NPE)
- At method preparation, field/method resolution, and class loading, a check similar to class loader constraints ensures that classes agree on value classes in the descriptor
- Various other vectors for getting data into the JVM should prevent nulls, or have contracts that allow crashing, etc., if data is corrupted
- Classes in the value classes attribute are allowed to be loaded early (e.g., to decide on field layout)
- If the value classes attribute does not mention a value class, it's possible for variables/fields of that type to be null, but an error will occur when an attempt is made to load the class or resolve against a class that disagrees


Compilation strategy

Val? maps to Ljava/lang/Object;
Val~ maps to Ljava/lang/Object;
Val! maps to LVal;

Every referenced value class is listed in the value classes attribute.

Nullability conversions are no-ops; null-free conversions are compiled to checkcasts (even for member access). Casts that target Val?/Val~ compile to a checkcast guarded by a null check, where null always succeeds.
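A minimal sketch of the two cast shapes, using an ordinary class as a stand-in (in this design a real checkcast to a value class would reject null by itself; here that rejection is written out explicitly, and 'Val' and the method names are hypothetical):

```java
public class Casts {
    // Hypothetical stand-in for a value class 'Val'.
    static final class Val { }

    // Cast targeting Val? or Val~: a checkcast guarded by a null check,
    // where null always succeeds.
    static Val castNullable(Object x) {
        return (x == null) ? null : (Val) x;
    }

    // Null-free conversion (to Val!): compiled to a plain checkcast; in
    // this design a checkcast to a value class fails on null.
    static Val castNullFree(Object x) {
        if (x == null) throw new ClassCastException("null is not a Val");
        return (Val) x;
    }

    public static void main(String[] args) {
        System.out.println(castNullable(null));              // null passes
        System.out.println(castNullFree(new Val()) != null); // true
        try {
            castNullFree(null);
        } catch (ClassCastException e) {
            System.out.println("CCE on null-free cast of null");
        }
    }
}
```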


Language implications

- Null-free value types typically get flattened storage and scalarized invocations
- Array store runtime checks may include a null check
- Val~[] and Val?[] do not perform array store checks at all—any Object may end up polluting these arrays (creating arrays of these types might be treated as an error, like T[])
- Val~ and Val? are overloading-hostile: their use in signatures conflicts with Object and all other null-permitting/null-checked value types
- Null-permitting/null-checked value type parameters and returns may be polluted with other types due to inconsistent compilation or non-Java interop—detected with a CCE on null-free conversion
- A conversion from Val~[] to Val![] cannot be allowed
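Because Val~ maps to Ljava/lang/Object; in this design, a Val~[] is just an Object[] at runtime, so the aastore check has nothing to reject. A sketch of the resulting pollution and its late detection ('Val' and the helper are hypothetical):

```java
public class ErasedArrays {
    // Hypothetical stand-in for a value class 'Val'.
    static final class Val { }

    // Null-free conversion on an element read: compiles to a checkcast,
    // which is where pollution is finally detected.
    static Val elementAsVal(Object[] a, int i) {
        return (Val) a[i];
    }

    public static void main(String[] args) {
        Object[] vals = new Object[1];   // stands in for a Val~[]
        vals[0] = "not a Val";           // no ArrayStoreException: pollution

        try {
            Val v = elementAsVal(vals, 0);
        } catch (ClassCastException e) {
            System.out.println("pollution detected with CCE");
        }
    }
}
```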


Migration implications

- Refactoring a class to be a value class is a binary incompatible change due to inconsistent value class attributes
- Changing from a null-permitting/null-checked to null-free type (or vice versa) is a binary incompatible change unless there's some form of support for type migrations



Null-free types with new descriptors
------------------------------------

In this approach, we use regular L types to represent nullable value types, and introduce other types (spelled, say, with a "K") to represent null-free value types. K types are subtypes of L types, and casts can be used to convert from L to K.


JVM implications

- Descriptor syntax needs to support 'K'
- To support K casts, we need ClassRefs that indicate K-ness, a new opcode, or some other mechanism
- Fields, arrays, and method parameters and returns with K types can be flattened/scalarized
- The 'null' verification type is not a subtype of K types
- Casts to K types must fail on 'null'
- Various other vectors for getting data into the JVM should prevent nulls, or have contracts that allow crashing, etc., if data is corrupted
- Classes named by K types are allowed to be loaded early (e.g., to decide on field layout)


Compilation strategy

Val? maps to LVal;
Val~ maps to LVal;
Val! maps to KVal;

Nullability conversions are no-ops; null-free conversions are either compiled to explicit casts or are implicit in an invoke*/getfield/putfield.


Language implications

- Null-free value types typically get flattened storage and scalarized invocations
- Array store runtime checks may include a null check
- Methods may be overloaded with a null-free type vs. a null-permitting/null-checked type (but null-permitting vs. null-checked is not allowed)
- Pollution of null-free variables or arrays is impossible
- A conversion from Val~[] to Val![] cannot be allowed


Migration implications

- Refactoring a class to be a value class is a binary compatible change (except where this involves incompatible changes like removing a public constructor); before recompilation (which may reinterpret some unadorned names), treatment of nulls does not change
- Changing from a null-permitting/null-checked to null-free type (or vice versa) is a binary incompatible change unless there's some form of support for type migrations



Nullability notations on types
------------------------------

In this approach, we use regular L types to represent value types, and these types are nullable by default. To indicate that a particular field, array, or parameter/return is null-free, some form of side notation is used. (Deliberately using the word "notation" rather than "annotation" or "modifier" here to avoid committing to an encoding.)

This is similar to "nullable value types, null-free storage", except that the null-free notation can be used on method parameters/returns.

This is similar to "always null-free value types", except that instead of tracking value classes in each class file, we track null-free value types per use site.

This is similar to "null-free types with new descriptors", except that the notations are not part of descriptors and don't require any explicit conversions—they are not part of the verification type system.


JVM implications

- Need a mechanism to encode notations, both for descriptors and for array creations
- The default value of a field/array depends on whether the "null-free" notation is used
- Fields, arrays, and method parameters and returns that are marked null-free can be flattened/scalarized
- Stack variables may generally be null, unless a static analysis proves otherwise
- A putfield, putstatic, aastore, or method invocation may fail with an NPE (or maybe ASE)
- Method overriding allows nullability mismatches; calls must be able to dynamically adapt (e.g., through multiple v-table entries and VM-generated bridges)
- Types marked null-free are allowed to be loaded early (e.g., to decide on field layout)
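The bridging idea has a precedent in today's compilers: javac already generates synthetic bridge methods to adapt an override whose descriptor differs from the superclass method's, e.g. for covariant returns. A small illustration of that existing mechanism (the nullability bridges themselves would be VM-generated, not written in source):

```java
import java.lang.reflect.Method;

public class BridgeDemo {
    static class A { Object get() { return "a"; } }

    // The override narrows the return type, so its descriptor differs from
    // A.get()'s; javac emits a synthetic 'Object get()' bridge in B that
    // delegates to the String-returning method.
    static class B extends A { @Override String get() { return "b"; } }

    static boolean hasBridge(Class<?> c, String name) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals(name) && m.isBridge()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasBridge(B.class, "get"));   // true
        System.out.println(hasBridge(A.class, "get"));   // false
    }
}
```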


Compilation strategy

Where '*' represents a side notation that a type is null-free:

Val? maps to LVal;
Val~ maps to LVal;
Val! maps to LVal;*

Nullability conversions are no-ops; null-free conversions are either compiled to explicit null checks or are implicit in an invoke*/getfield/putfield.


Language implications

- Null-free value types typically get flattened storage and scalarized invocations
- Array store runtime checks may include a null check
- Methods may not be overloaded on different nullabilities of the same type
- Pollution of null-free variables, arrays, or parameters/returns is impossible
- A conversion from Val~[] to Val![] could be supported, but the result would not perform the expected runtime checks


Migration implications

- Refactoring a class to be a value class is a binary compatible change (except where this involves incompatible changes like removing a public constructor); before recompilation, treatment of nulls does not change
- Changing the nullability of a type is a binary compatible change; library clients who expect a nullable API may see surprising NPEs or ASEs


