Nullness markers to enable flattening

Tue Feb 7 01:26:42 UTC 2023

A quick review:

The Value Objects feature (see https://openjdk.org/jeps/8277163) captures the Valhalla project's central idea: that objects don't have to have identity, and if programmers opt out of identity, JVMs can provide optimizations comparable to primitive performance.

However, one important implementation technique is not supported by that JEP: maximally flattened heap storage. ("Maximally flattened" as in "just the bits necessary to encode an instance".) This is because flattened fields and arrays store an object's field values directly, and so 1) need to be initialized "at birth" to a non-null class instance, 2) may not store null, and 3) may be updated non-atomically. These are semantics that need to be surfaced in the language model.

We've tackled (3) by allowing value classes to be declared non-atomic (syntax/limitations subject to bikeshedding), and then claiming by fiat that fields/arrays of such classes are tearing risks. Races are rare enough that this doesn't really call for a use-site opt-in, and we don't necessarily need any deeper explanation for how new objects derived from random combinations of old objects can be created by a read operation. That's just how it works. <shrug>
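A single-threaded simulation can make the tearing risk in (3) concrete. It assumes, purely for illustration, that a flattened field of a two-field value class is written one component at a time. 'Range' and 'FlatRangeSlot' are hypothetical names, ordinary Java records (Java 16+) stand in for value classes, real layout is up to the JVM, and an actual tear requires a reader racing on another thread.

```java
// Stand-in for a two-field value class (a plain record, not a real value class).
record Range(long lo, long hi) {}

// Hypothetical model of flattened storage: no header, no pointer, just the
// component values laid out inline. Assumes writes happen one component at a time.
final class FlatRangeSlot {
    private long lo;
    private long hi;

    // Write step 1 of 2: store the first component.
    void writeLo(Range r) { lo = r.lo(); }

    // Write step 2 of 2: store the second component.
    void writeHi(Range r) { hi = r.hi(); }

    // A complete (non-atomic) write is just both steps in sequence.
    void write(Range r) {
        writeLo(r);
        writeHi(r);
    }

    // A read reassembles an instance from whatever component values are present.
    Range read() { return new Range(lo, hi); }
}
```

If a reader runs between the two halves of a write of Range(100, 200) over Range(0, 10), it gets Range(100, 10): an instance nobody wrote, which would even break an invariant like lo <= hi that both written values satisfied. That is the behavior a non-atomic declaration accepts.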

We also allow value classes to declare that they support an all-zeros default instance (again, subject to bikeshedding). You could imagine similarly claiming that fields/arrays of these classes are null-hostile, as a side effect of how their storage works. But this is an idiosyncrasy that is going to affect a lot more programmers, and "that's just how it works" is pretty unsatisfactory. Sometimes programs count on being able to use 'null' in their computation. We need something in the language model to let programs opt in/out of nulls at the use site, and thus opt out/in of maximally flattenable heap storage.

We've long discussed "reference type" vs. "value type" as the language concept that captures this distinction. But where we once had a long list of differences between references and values, most of those have gone away. Notably, it's *not* useful for performance intuitions to imagine that references are pointers and values are inline. Value objects get inlined when the JVM wants to do so. Reference-ness is not relevant.

Really, for most programmers, nullness is all that distinguishes a "reference type" from a "value type".

Meanwhile, expressing nullness is not a problem unique to Valhalla. Whether a variable is meant to store nulls is probably the most important property of most programs that isn't expressible in the language. Workarounds include informal javadoc specifications, type annotations (as explored by JSpecify), lots of 'Objects.requireNonNull' calls, and blanket "if you pass in a null, you might get an NPE" policies.
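For comparison, this is roughly what the 'Objects.requireNonNull' workaround looks like in today's Java. The intent that 'name' is non-null lives only in the Javadoc and a runtime check; the parameter type 'String' says nothing about it. With the nullness markers proposed in this message the parameter might be written 'String! name', but that is not valid Java today, so the sketch sticks to current syntax. 'Greeter' is an illustrative name.

```java
import java.util.Objects;

final class Greeter {
    /**
     * @param name the name to greet; must not be null (stated only informally)
     * @throws NullPointerException if name is null
     */
    static String greet(String name) {
        // The only enforcement of the non-null intent: a runtime check.
        Objects.requireNonNull(name, "name");
        return "Hello, " + name;
    }
}
```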

In Amber, pattern matching has its own problems with nullness: there are a lot of ad hoc rules to distinguish between "is this a non-null instance of class Foo?" vs. "is this null *or* an instance of class Foo?", because there's no good way to express those two queries as explicitly different.
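A small sketch of how the two queries are spelled today (Java 21+, pattern matching for switch). 'instanceof' answers only the first one ("is this a non-null String?"), so its pattern variable is never null; a 'switch' needs an explicit 'case null' to answer the second, and without it a null selector throws NPE. 'NullQueries' and its methods are illustrative names only.

```java
final class NullQueries {
    // "Is this a non-null instance of String?" instanceof is always false for null,
    // so the pattern variable 's' is guaranteed non-null here.
    static int lengthOrMinusOne(Object o) {
        return (o instanceof String s) ? s.length() : -1;
    }

    // "Is this null *or* a String?" has to be asked with a separate 'case null'.
    static String describe(Object o) {
        return switch (o) {
            case null -> "null";
            case String s -> "string of length " + s.length();
            default -> "something else";
        };
    }

    // Same switch without 'case null': a null selector throws NullPointerException.
    static String describeWithoutNullCase(Object o) {
        return switch (o) {
            case String s -> "string of length " + s.length();
            default -> "something else";
        };
    }
}
```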

---

To address these problems, we've been exploring nullness markers as an alternative to '.val' and '.ref'. The goal is a general-purpose feature that lets programmers express intent about nulls, and that is preserved at runtime sufficiently for JVMs to observe that "not null" + "value class" + "non-atomic (or compact) class" --> "maximally flattenable storage". There are no "value types", and there is no direct control over flattenability.
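The rule above is a conjunction that a JVM could evaluate per field or array. As a sketch only: 'FieldFacts' is a hypothetical stand-in, not a real JVM or reflection API, and a true result means maximally flattened storage is permitted, not that the JVM must use it (there is still no direct control over flattenability).

```java
// Hypothetical summary of what the JVM can observe about a field or array component.
record FieldFacts(boolean nullFree, boolean valueClass, boolean nonAtomicOrCompact) {
    // "not null" + "value class" + "non-atomic (or compact) class"
    //   --> maximally flattenable storage is allowed.
    boolean maximallyFlattenable() {
        return nullFree && valueClass && nonAtomicOrCompact;
    }
}
```

A field that may hold null fails the first condition: maximally flattened storage holds just the bits of an instance, with nothing left over to encode null.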

(A lot of these ideas build on what JSpecify has done, so appreciation to them for the good work and useful documentation.)

Some key ideas:

- Nullness is an *optional* property of variables/expressions/etc., distinct from types. If the program doesn't say what kind of nullness a variable has, and it can't be inferred, the nullness is "unspecified". (Interpreted as "might be null, but the programmer hasn't told us if that's their intent".) Variables/expressions with unspecified nullness continue to behave the way they always have.

- Because nullness is distinct from types, it shouldn't impact type checking rules, subtyping, overriding, conversions, etc. Nullness has its own analysis, subject to its own errors/warnings. The precise error/warning conditions haven't been fleshed out, but our bias is towards minimal intrusion—we don't want to make it hard to adopt these features in targeted ways.

- That said, *type expressions* (the syntax in programs that expresses a type) are closely intertwined with *nullness markers*. 'Foo!' refers to a non-null Foo, and 'Foo?' refers to a Foo or null. And nullness is an optional property of type arguments, type variable bounds, and array components. Nullness markers are the way programmers express their intent to the compiler's nullness analysis.

- Nullness may also be implicit. Catch parameters and pattern variables are always non-null. Lots of expressions have '!' nullness, and the null literal has '?' nullness. Local variables get their nullness from their initializers. Control flow analysis can infer properties of a variable based on its uses.

- There are features that change the default interpretation of the nullness of class names. This is still pretty open-ended. Perhaps certain classes can be declared (explicitly or implicitly) null-free by default (e.g., 'Point' is implicitly 'Point!'). Perhaps a compilation-unit-level or module-level directive says that all unadorned types should be interpreted as '!'. Programs can be usefully written without these convenience features, but for programmers who want to widely adopt nullness, it will be important to get away from "unspecified" as the default everywhere.

- Nullness is generally enforced at run time, via cooperation between javac and JVMs. Methods with null-free parameters can be expected to throw if a null is passed in. Null-free storage should reject writes of nulls. (Details to be worked out, but as a starting point, imagine 'Q'-typed storage for all types. Writes reject nulls. Reads before any writes produce a default value, or if none exists, throw.)

- Type variable types have nullness, too. Besides 'T!' and 'T?', there's also a "parametric" 'T*' that represents "whatever nullness is provided by the type argument". (Again, some room for choosing the default interpretation of bare 'T'; unspecified nullness is useful for type variables as well.) Nullness of type arguments is inferred along with the types; when both a type argument and its bound have nullness, bounds checks are based on '!' <: '*' <: '?'. Generics are erased for now, but in the future '!' type arguments will be reified, and specialized classes will provide the expected runtime behaviors.
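The runtime-enforcement starting point ('Q'-typed storage: writes reject nulls; reads before any write produce a default value, or throw if none exists) can be modeled as a small cell class. 'NullFreeCell' is hypothetical; the default instance is passed as a 'Supplier' (null when the class has no default), and throwing 'IllegalStateException' for a read with no default is an assumption, since the details are still open. 'Integer' with default 0 stands in for a value class with an all-zeros default instance.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical model of null-free storage for a single variable.
final class NullFreeCell<T> {
    private final Supplier<? extends T> defaultInstance; // null if the class has no default
    private T value;                                     // null here means "never written"

    NullFreeCell(Supplier<? extends T> defaultInstance) {
        this.defaultInstance = defaultInstance;
    }

    // Writes reject nulls.
    void set(T newValue) {
        this.value = Objects.requireNonNull(newValue, "null-free storage rejects null");
    }

    // Reads before any write produce the default instance, or throw if there is none.
    T get() {
        if (value != null) return value;
        if (defaultInstance != null) return defaultInstance.get();
        throw new IllegalStateException("read before write, and no default instance");
    }
}
```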
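The bounds-check ordering '!' <: '*' <: '?' can be modeled as an ordered enum. This is a sketch under stated assumptions: 'Nullness' and 'BoundsCheck' are hypothetical names, unspecified nullness is modeled as Java 'null', and when either side is unspecified the sketch simply skips the check (the rule above only covers the case where both the type argument and its bound have nullness).

```java
// Constants are declared in subtype order, so ordinal order is the '<:' order.
enum Nullness {
    NON_NULL("!"), PARAMETRIC("*"), NULLABLE("?");

    final String marker;

    Nullness(String marker) {
        this.marker = marker;
    }

    // '!' <: '*' <: '?', and each nullness is a sub-nullness of itself.
    boolean isSubNullnessOf(Nullness other) {
        return this.ordinal() <= other.ordinal();
    }
}

final class BoundsCheck {
    // Does a type argument with nullness 'arg' satisfy a bound with nullness 'bound'?
    // Assumption: unspecified nullness (null) on either side means "not checked".
    static boolean satisfies(Nullness arg, Nullness bound) {
        if (arg == null || bound == null) return true;
        return arg.isSubNullnessOf(bound);
    }
}
```

So a bound marked '?' accepts any marked argument, while a bound marked '!' accepts only a '!' argument; a parametric 'T*' argument does not satisfy a '!' bound, because the type argument it stands for might be nullable.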

There are, of course, a lot of details behind these points. But hopefully this provides a good high-level introduction.

A worry in taking on extra features like this is that we'll get distracted from our primary goal, which is to support maximally flattened storage of value objects. But I think it feels manageable, and it's certainly a lot more useful than the sort of targeted usage of '.val' we were thinking about before.

Our main tasks for delivering a feature include:
- Work out the declaration syntax/class file encoding for opting in to non-atomic-ness and default instances
- Implement nullness markers and some analysis/diagnostics in javac
- Provide a language spec for the parts of the analysis standardized in the language
- Settle on a class file format and division of responsibility for runtime behaviors
- Implement some targeted new JVM behaviors; use nullness as a signal for flattening
- Design/implement how nullness is exposed by reflection

For the future, we'll want to:
- Anticipate how a "change the defaults" feature will work
- Consider the interaction of nullness with Amber features
- Think about how runtime nullness interacts with specialization and type restrictions