Design document on nullability and value types

Brian Goetz brian.goetz at oracle.com
Wed May 31 18:37:34 UTC 2023


As we've hinted at, we've made some progress refining the essential
differences between primitive and reference types, which has enabled us
to shed the `.val` / `.ref` distinction and lean more heavily on
nullability.  The following document outlines the observations that have
enabled this change of direction and some of its consequences.

This document is mostly to be interpreted in the context of the Valhalla 
journey, and so talks about where we were a few months ago and where 
we're heading now.



# Rehabilitating primitive classes: a nullity-centric approach

Over the course of Project Valhalla, we have observed that there are two
distinct groups of value types.  We've tried stacking them in various 
ways, but
there are always two groups, which we've historically described as "objects
without identity" and "primitive classes", and which admit different 
degrees of
flattening.

The first group, which we are now calling "value objects" or "value classes",
represents the minimal departure from traditional classes to disavow object
identity.  The existing classes that are described as "value-based", such as
`Optional` or `LocalDate`, are candidates for migrating to value classes.  Such
classes give up object identity; identity-sensitive behaviors are either recast
as state-based (such as for `==` and `System::identityHashCode`) or partialized
(`synchronized`, `WeakReference`), and such classes must live without the
affordances of identity (mutability, layout polymorphism).  In return, they
avoid being burdened by "accidental identity", which can be a source of bugs,
and gain significant optimization for stack-based values (e.g., scalarization in
calling convention) and other JIT optimizations.

The second group, which we had been calling "primitive classes" (we are now
moving away from that term), are those that are more like the existing
primitives, such as `Decimal` or `Complex`.  Where ordinary value 
classes, like
identity classes, gave rise to a single (reference) type, these classes gave
rise to two types, a value type (`X.val`) and a reference type 
(`X.ref`).  This
pair of types was directly analogous to legacy primitives and their 
boxes. These
classes come with more restrictions and more to think about, but are 
rewarded
with greater heap flattening.  This model -- after several iterations -- 
seemed
to meet the goals for expressiveness and performance: we can express the
difference between `int`-like behavior and `Integer`-like behavior, and get
routine flattening for `int`-like types.  But the result still had many
imbalances; the distinction was heavyweight, and a significant fraction 
of the
incremental specification complexity was centered only on these types.  We
eventually concluded that the source of this was trying to model the `int` /
`Integer` distinction directly, and that this distinction, while grounded in
user experience, was just not "primitive" enough.

In this document, we will break down the characteristics of so-called 
"primitive
classes" into more "primitive" (and hopefully less ad-hoc) 
distinctions.  This
results in a simpler model, streamlines the syntactic baggage, and 
enables us to
finally reunite with an old friend, null-exclusion (bang) types.  Rather 
than
treating "value types" and "reference types" as different things, we can 
treat
the existing primitives (and the "value projection" of user-defined 
primitive
classes) as being restricted references, whose restrictions enable the 
desired
runtime properties.

## Primitives and objects

In a previous edition of _State of Valhalla_, we outlined a host of 
differences
between primitives and objects:

| Primitives                                 | Objects                                   |
| ------------------------------------------ | ----------------------------------------- |
| No identity (pure values)                  | Identity                                  |
| `==` compares state                        | `==` compares object identity             |
| Built-in                                   | Declared in classes                       |
| No members (fields, methods, constructors) | Members (including mutable fields)        |
| No supertypes or subtypes                  | Class and interface inheritance           |
| Represented directly in memory             | Represented indirectly through references |
| Not nullable                               | Nullable                                  |
| Default value is zero                      | Default value is null                     |
| Arrays are monomorphic                     | Arrays are covariant                      |
| May tear under race                        | Initialization safety guarantees          |
| Have reference companions (boxes)          | Don't need reference companions           |

Over many iterations, we have chipped away at this list, mostly by making
classes richer: value classes can disavow identity (and thereby opt into
state-based `==` comparison); the lack of members and supertypes is an
accidental restriction that can go away with declarable value classes; we can
make primitive arrays covariant with arrays of their boxes; we can let some
class declarations opt into non-atomicity under race.  That leaves the
following, condensed list of differences:

| Primitives                        | Objects                                   |
| --------------------------------- | ----------------------------------------- |
| Represented directly in memory    | Represented indirectly through references |
| Not nullable                      | Nullable                                  |
| Default value is zero             | Default value is null                     |
| Have reference companions (boxes) | Don't need reference companions           |

The previous approach ("primitive classes") started with the assumption that
this is the list of things to be modeled by the value/reference 
distinction.  In
this document we go further, by showing that flattening (direct 
representation)
is derived from more basic principles around nullity and initialization
requirements, and perhaps surprisingly, the concept of "primitive type" can
disappear almost completely, save only for historical vestiges related 
to the
existing eight primitives.  The `.val` type can be replaced by restricted
references whose restrictions enable the desired representational 
properties. As
is consistent with the goals of Valhalla, flattenability is an emergent
property, gained by giving up those properties that would undermine
flattenability, rather than being a linguistic concept on its own.

### Initialization

The key distinction between today's primitives and objects has to do with
_initialization requirements_.   Primitives are designed to be _used
uninitialized_; if we declare a field `int count`, it is reliably 
initialized to
zero by the JVM before any code can access it.  This initial value is a
perfectly good default, and it is not a bug to read or even increment 
this field
before it has been explicitly assigned a value by the program, because 
it has
_already_ been initialized to a known good value by the JVM. The zero value
pre-written by the JVM is not just a safety net; it is actually part of the
programming model that primitives start out life with "good enough" 
defaults.
This is part of what it means to be a primitive type.

Objects, on the other hand, are not designed for uninitialized use; they must be
initialized via constructors before use.  The default zero values written to an
object's fields by the JVM don't necessarily constitute a valid state
according to the class's specification, and, even if they did, are rarely a good
default value.  Therefore, we require that class instances be initialized by
their constructors before they can be exposed to the rest of the program.  To
ensure that this happens, objects are referenced exclusively through _object
references_, which _can_ be safely used uninitialized -- because they reliably
have the usable default value of `null`.  (Some may quibble with this use of
"safely" and "usable", because null references are fairly limited, but they do
their limited job correctly: we can easily and safely test whether a reference
is null, and if we accidentally dereference a null reference, we get a clear
exception rather than accessing uninitialized object state.)

 > Primitives can be safely used without explicit initialization; objects cannot.
 > Object references are nullable _precisely because_ objects cannot be used
 > safely without explicit initialization.
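The contrast can be seen in today's Java.  The following is a minimal sketch;
the class `Defaults` and its field names are hypothetical and chosen only for
illustration.  It shows a primitive field being incremented before any explicit
assignment, and an uninitialized reference failing cleanly on dereference.

```java
// Hypothetical holder class; names are illustrative, not from any library.
class Defaults {
    int count;      // primitive field: the JVM pre-writes 0, a usable default
    String label;   // reference field: the JVM pre-writes null

    // Incrementing the primitive before any explicit assignment is not a bug.
    static int incrementUninitialized() {
        Defaults d = new Defaults();
        d.count++;
        return d.count;
    }

    // Dereferencing the uninitialized reference throws a clear exception
    // instead of exposing uninitialized object state.
    static boolean dereferenceFails() {
        Defaults d = new Defaults();
        try {
            d.label.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementUninitialized());
        System.out.println(dereferenceFails());
    }
}
```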

### Nullability

A key difference between today's primitives and references is that 
primitives
are non-nullable and references are nullable.  One might think this was
primarily a choice of convenience: null is useful for references as a 
universal
sentinel, and not all that useful for primitives (when we want nullable
primitives we can use the box classes -- but we usually don't.) But the
reality is not one of convenience, but of necessity: nullability is 
_required_
for the safety of objects, and usually _detrimental_ to the performance of
primitives.

Nullability for object references is a forced move because null is what is
preventing us from accessing uninitialized object state. Nullability for
primitives is usually not needed, but that's not the only reason 
primitives are
non-nullable.  If primitives were nullable, `null` would be another 
state that
would have to be represented in memory, and the costs would be out of 
line with
the benefits.  Since a 64-bit `long` uses all of its bit patterns, a 
nullable
`long` would require at least 65 bits, and alignment requirements would 
likely
round this up to 128 bits, doubling memory usage.  (The density cost here is
substantial, but it gets worse because most hardware today does not have 
cheap
atomic 128 bit loads and stores.  Since tearing might conflate a null 
value with
a non-null value -- even worse than the usual consequences of tearing -- 
this
would push us strongly towards using an indirection instead.) So
non-nullability is a precondition for effective flattening and density of
primitives, and nullable primitives would involve giving up the flatness and
density that are the reason to have primitives in the first place.

 > Nullability interferes with heap flattening.
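Today's `long` and its box `Long` already exhibit this tradeoff.  The sketch
below (the class name `NullableLong` is hypothetical) contrasts the default
element values of the two array types and shows that the nullable form can fail
when converted back to the non-nullable one.  How each array is laid out in
memory is up to the JVM; the code demonstrates only the observable behavior.

```java
// Hypothetical demo class contrasting long (non-nullable) and Long (nullable).
class NullableLong {
    // Elements default to 0L; every bit pattern of a long is a valid value,
    // so there is no spare pattern left over to mean "null".
    static long[] flat(int n) {
        return new long[n];
    }

    // Elements default to null; the extra "null" state is paid for by
    // holding references to boxes rather than the values themselves.
    static Long[] boxed(int n) {
        return new Long[n];
    }

    // Unboxing a null Long throws NullPointerException.
    static boolean unboxingNullFails() {
        Long[] boxes = boxed(1);
        try {
            long v = boxes[0];   // implicit unboxing of null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```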

To summarize, the design of primitives and objects implicitly stems from the
following facts:

  - For most objects, the uninitialized (zeroed) state is either invalid 
or not a
    good-enough default value;
  - For primitives, the uninitialized (zeroed) state is both valid and a
    good-enough default value;
  - Having the uninitialized (zeroed) state be a good-enough default is a
    precondition for reliable flattening;
  - Nullability is required when the uninitialized (zeroed) state is not a
    good-enough default;
  - Nullability not only has a footprint cost, but often is an impediment to
    flattening.

 > Primitives exist in the first place because they can be flattened to give us
 > better numeric performance; flattening requires giving up nullity and
 > tolerance of uninitialized (zero) values.

These observations were baked in to the language (and other languages 
too), but
the motivation for these decisions was then "erased" by the rigid 
distinction
between primitives and objects.  Valhalla seeks to put that choice back 
into the
user's hands.

### Getting the best of both worlds

Project Valhalla promises the best of both worlds: sufficiently constrained
entities can "code like a class and work like an int."  Classes that give up
object identity can get some of the runtime benefits of primitives, but 
to get
full heap flattening, we must embrace the two defining characteristics of
primitives described so far: non-nullability and safe uninitialized use.

Some candidates for value classes, such as `Complex`, are safe to use
uninitialized because the default (zero) value is a good initial value.  Others,
like `LocalDate`, simply have no good default value (zero or otherwise), and
therefore need the initialization protocol enabled by null-default object
references.  This distinction is inherent to the semantics of the domain; some
domains simply do not have a reasonable default value, and this is a choice that
the class author must capture when the code is written.

There is a long list of classes that are candidates to be value classes; 
some
are like `Complex`, but many are more like `LocalDate`.  The latter 
group can
still benefit significantly from eliminating identity, but can't 
necessarily get
full heap flattening.  The former group, which are most like today's 
primitives,
can get all the benefits, including heap flattening -- when their 
instances are
non-null.

### Declaring value classes

As in previous iterations, a class can be declared as a _value class_:

```
value class LocalDate { ... }
```

A value class gives up identity and its consequences (e.g., mutability) 
-- and
that's it.  The resulting  `LocalDate` type is still a reference type, and
variables of type `LocalDate` are still nullable.  Instances can get 
significant
optimizations for on-stack use but are still usually represented in the 
heap via
indirections.

### Implicitly constructible value classes

In order to get the next group of benefits, a value class must additionally
attest that it can be used uninitialized.  Because this is a statement 
of how
instances of this class come into existence, modeling this as a special 
kind of
constructor seems natural:

```
value class Complex {
     private int re;
     private int im;

     public implicit Complex();
     public Complex(int re, int im) { ... }

     ...
}
```

These two constructors say that there are two ways a `Complex` instance comes
into existence: the first is via the traditional constructor that takes real and
imaginary values (`new Complex(1, 1)`), and the second is via the _implicit_
constructor that produces the instance used to initialize fields and array
elements to their default values.  That the implicit constructor cannot have a
body is a signal that the "zero default" is not something the class author can
fine-tune.  A value class with an implicit constructor is called an _implicitly
constructible_ value class.
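Since `value class` and `implicit` constructors are proposed syntax that does
not compile on current JDKs, the following is only an approximation in today's
Java.  It uses a record to get state-based `equals`, and a hypothetical helper
`zeroedArray` to emulate what the implicit constructor would provide: array
elements that start out as the all-zero instance rather than as `null`.

```java
import java.util.Arrays;

// Approximation only: a record stands in for the proposed value class.
record Complex(int re, int im) {
    // Stand-in for the instance the implicit constructor would produce.
    static final Complex ZERO = new Complex(0, 0);

    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    // Hypothetical helper: today a new Complex[n] holds nulls, so we fill it
    // by hand to emulate elements that default to the zero instance.
    static Complex[] zeroedArray(int n) {
        Complex[] a = new Complex[n];
        Arrays.fill(a, ZERO);
        return a;
    }
}
```

With this emulation, `Complex.zeroedArray(3)[0].plus(new Complex(1, 2))` is
usable immediately, which is the behavior the implicit constructor attests is
safe.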

Having an implicit constructor is a necessary but not sufficient condition for
heap flattening.  The other required condition is that a variable that holds a
`Complex` needs to be non-nullable.  In the previous iteration, the `.val` type
was non-nullable for the same reason primitive types were, and therefore `.val`
types could be fully flattened.  However, after several rounds of teasing apart
the fundamental properties of primitives and value types, nullability has
finally sedimented to a place in the model where a sensible reunion between
value types and non-nullable types may be possible.

## Null exclusion

Non-nullable reference types have been a frequent request for Java for years,
and have been explored in `C#`, Kotlin, and Scala.  The goals of non-nullable
types are sensible: richer types mean safer programs.  It is a pervasive
problem in Java libraries that we are not able to express within the language
whether a returned object reference might be null, or is known never to be null,
and programmers can therefore easily make wrong assumptions about nullability.

To date, Project Valhalla has deliberately steered clear of non-nullable 
types
as a standalone feature. This is not only because the goals of Valhalla 
were too
ambitious to burden the project with another ambitious goal (though that is
true), but for a more fundamental reason: the assumptions one might make 
in a
vacuum about the semantics of non-nullable types would likely become hidden
sources of constraints for the value type design, which was already 
bordering on
over-constrained.  Now that the project has progressed sufficiently, we 
are more
confident that we can engage with the issue of null exclusion.

A _refinement type_ (or _restriction type_) is a type derived from another type
by excluding certain values from its value set, such as "the non-negative
integers". In the most general form, a refinement type is defined by one or more
predicates (Liquid Haskell and Clojure Spec are examples of this); range types
in Pascal are a more constrained form of refinement type.  Non-nullable types
("bang" types) can similarly be viewed as a constrained form of refinement type,
characterized by the predicate `x != null`.  (Note that the null-excluding
refinement type `X!` of a reference type is still a reference type.)
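Java has no refinement types today, but the idea of "a base type plus a
predicate" can be emulated with a runtime check.  The sketch below is purely
illustrative: the class `Refinement` and its methods are hypothetical, and the
check happens at run time rather than in the type system, which is exactly the
gap that `X!` is meant to close for the predicate `x != null`.

```java
import java.util.function.Predicate;

// Hypothetical runtime emulation of a refinement: a predicate over a base type.
final class Refinement<T> {
    private final Predicate<? super T> predicate;
    private final String description;

    Refinement(Predicate<? super T> predicate, String description) {
        this.predicate = predicate;
        this.description = description;
    }

    // Returns the value if it is in the refined value set, else rejects it.
    T require(T value) {
        if (!predicate.test(value)) {
            throw new IllegalArgumentException("violates " + description);
        }
        return value;
    }

    // "The non-negative integers", as in the text.
    static Refinement<Integer> nonNegative() {
        return new Refinement<>(x -> x >= 0, "x >= 0");
    }

    // The predicate that characterizes a null-excluding ("bang") type.
    static <T> Refinement<T> nonNull() {
        return new Refinement<>(x -> x != null, "x != null");
    }
}
```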

Rather than saying that primitive classes give rise to two types, `X.val` and
`X.ref`, we can observe that the null-excluding type `X!` of an
implicitly constructible value class can have the same runtime characteristics
as the `.val` type in the previous round.  Both the declaration-site property
that a value class is implicitly constructible, and the use-site property that a
variable is null-excluding, are necessary to routinely get flattening.

Related to null exclusion is _null-adjunction_; this takes a non-nullable type
(such as `int`) or a type of indeterminate nullability (such as a type variable
`T` in a generic class that can be instantiated with either nullable or
non-nullable type parameters) and produces a type that is explicitly nullable
(`int?` or `T?`).  In the current form of the design, there is only one place
where the null-adjoining type is strictly needed -- when generic code needs to
express "`T`, but might be null".  The canonical example of this is `Map::get`;
it wants to return `V?`, to capture the fact that `Map` uses `null` to
represent "no mapping".
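The behavior of `Map::get` that motivates `V?` is easy to observe today.  In
this small sketch (the class `MapGetNull` and the keys are hypothetical), `get`
returns `null` both for a missing key and for a key explicitly mapped to `null`,
so the return type is "`V`, but might be null" regardless of how `V` itself is
instantiated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of why Map::get wants to return V? rather than V.
class MapGetNull {
    static Map<String, String> sample() {
        Map<String, String> m = new HashMap<>();
        m.put("present", "value");
        m.put("mappedToNull", null);   // HashMap permits null values
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = sample();
        // Both lookups yield null; only containsKey tells them apart.
        System.out.println(m.get("absent") + " " + m.containsKey("absent"));
        System.out.println(m.get("mappedToNull") + " " + m.containsKey("mappedToNull"));
    }
}
```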

For a given class `C`, the type `C!` is clearly non-nullable, and the 
type `C?`
is clearly nullable.  What of the unadorned name `C`?  This has 
_unspecified_
nullability.  Unspecified nullability is analogous to raw types in 
generics (we
could call this "raw nullability"); we cannot be sure what the author had in
mind, and so must find a balance between the desire for greater null 
safety and
tolerance of ambiguity in author intent.

Readers who are familiar with explicitly nullable and non-nullable types in
other languages may be initially surprised at some of the choices made regarding
null-exclusion (and null-adjunction) types here.  The interpretation outlined
here is not necessarily the "obvious" one, because it is constrained by the
needs of null-exclusion, the needs of Valhalla, and the migration-compatibility
constraints needed for the ecosystem to make a successful transition to types
that have richer nullability information.

While the theory outlined here will allow all class types to have a
null-excluding refinement type, it is also possible that we will initially
restrict null-exclusion to implicitly constructible value types.  There are
several reasons to consider pursuing such an incremental path, including the
fact that we will be able to reify the non-nullability of implicitly
constructible value types in the JVM, whereas the null-exclusion types of other
classes such as `String` or of ordinary value classes such as `LocalDate` would
need to be implemented through erasure, increasing the possible sources of null
pollution.

### Goals

We adopt the following set of goals for adding null-excluding refinement 
types:

  - More complete unification of primitives with classes;
  - Flatness is an emergent property that can derive from more basic 
semantic
    constraints, such as identity-freedom, implicit constructibility, and
    non-nullity;
  - Merge the concept of "value companion" (`.val` type) into the 
null-restricted
    refinement type of implicitly constructible value classes;
  - Allow programmers to annotate type uses to explicitly exclude or 
affirm nulls
    in the value set;
  - Provide some degree of runtime nullness checking to detect null 
pollution;
  - Annotating an existing API (one based on identity classes) with 
additional
    nullness information should be binary- and source-compatible.

The last goal is a source of strong constraints, and not one to be taken
lightly.  If an existing API that specifies "this method never returns null"
cannot be compatibly migrated to one where this constraint is reflected 
in the
method declaration proper, the usefulness of null-exclusion types is greatly
reduced; library maintainers will be put to a bad choice of forgoing a 
feature
that will make their APIs safer, or making an incompatible change in 
order to do
so.  If we were building a new language from scratch, the considerations 
might
be different, but we do not have that luxury.  "Just copying" what other
languages have done here is a non-starter.

### Interoperation between nullable and non-nullable types

We enable conversions between a nullable type and a compatible 
null-excluding
refinement type by adding new widening and narrowing conversions between 
`T?`
and `T!` that have analogous semantics to the existing boxing and unboxing
conversions between `Integer` and `int`.  Just as with boxing and unboxing,
widening from a non-nullable type to a nullable type is unconditional 
and never
fails, and narrowing from a nullable type to a non-nullable type may fail by
throwing `NullPointerException`.  These conversions for null-excluding types
would be sensible in assignment context, cast context, and method invocation
context (both loose and strict, unlike boxing for primitives today.) 
This would
allow existing assignments, invocation, and overload applicability checks to
continue to work even after migrating one of the types involved, as 
required for
source-compatibility.
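The boxing and unboxing behavior that these conversions are modeled on can be
shown directly.  In the sketch below, `box` and `unbox` use the existing
`int` / `Integer` conversions; `narrow` is a hypothetical stand-in for the
proposed `T?` to `T!` narrowing, emulated with `Objects.requireNonNull`, and is
not how the language would actually implement it.

```java
import java.util.Objects;

// Hypothetical demo class; only box/unbox reflect existing language behavior.
class NullConversions {
    // Widening (boxing): unconditional, never fails.
    static Integer box(int i) {
        return i;
    }

    // Narrowing (unboxing): throws NullPointerException when given null.
    static int unbox(Integer boxed) {
        return boxed;
    }

    // Assumed emulation of T? -> T! narrowing for an arbitrary reference type.
    static <T> T narrow(T maybeNull) {
        return Objects.requireNonNull(maybeNull);
    }

    // Small helper to observe whether an action fails with an NPE.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```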

Checking for bad values can mirror the approach taken for generics.  When a
richer compile-time type system erases to a less-rich runtime type 
system, type
safety derives from a mix of compile-time type checking and synthetic 
runtime
checks.  In both cases, there is a possibility of pollution which can be
injected at the boundary between legacy and new code, by malicious code, or
through injudicious use of unchecked casts and raw types.  And like 
generics, we
would like to offer the possibility that if a program compiles in its 
entirety
with no unchecked warnings, null-excluding types will not be observed to 
contain
null.  To achieve this, we will need a combination of runtime checks, new
unchecked warnings, and possibly restrictions on initialization.
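For comparison, here is how pollution arises and is detected with generics
today.  This is a minimal sketch with a hypothetical class name; it uses a raw
type to smuggle an `Integer` into a `List<String>`, and the failure surfaces
only later, at the synthetic cast the compiler inserts where the element is used
as a `String`.  Null pollution of an erased null-excluding type would be
expected to follow a similar pattern, though the exact checks are not yet
specified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of generic heap pollution through a raw type.
class HeapPollution {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> polluted() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw type: the compiler would warn here
        raw.add(42);          // compiles, but pollutes the list
        return strings;
    }

    // The pollution is detected at the use site, not where it was injected.
    static boolean readFails() {
        List<String> strings = polluted();
        try {
            return strings.get(0).length() < 0;   // synthetic cast to String
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```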

The intrusion on the type-checking of generics here is considerable; nullity
will have to be handled in type inference, bounds conformance, subtyping, etc.
In addition, there are new sources of heap pollution and new conditions under
which a variable may be polluted.  The _Universal Generics_ JEP outlines a
number of unchecked warnings that must be issued in order to avoid null
pollution in type variables that might be instantiated either with a nullable or
null-excluding type.  While this work was designed for `ref` and `val` types,
much of it applies directly to null-excluding types.

The liberal use of conversion rather than subtyping here may be surprising to
readers who are familiar with other languages that support null-excluding types.
At first, it may appear to be "giving up all the benefit" of having annotated
APIs for nullness, since a nullable value may be assigned directly to a
non-nullable type without requiring a cast.  But the reality is that for the
first decade at least, we will at best be living in a mixed world where some
APIs are migrated to use nullness information and some are not, and forcing
users to modify code that uses these libraries (and then do so again and again
as more libraries migrate) would be an unacceptable tax on Java users, and a
deterrent to libraries migrating to use these features.

Starting from `T! <: T?` -- and forcing explicit conversions when you 
want to go
from nullable to non-nullable values -- does seem an obvious choice if 
you have
the luxury of building a type system from scratch.  But if we want to make
migration to null-excluding types a source-compatible change for 
libraries and
clients, we cannot accept a strict subtyping approach.  (Even if we did, we
could still only use subtyping in one direction, and would have to add an
additional implicit conversion for the other direction -- a conversion 
that is
similar to the narrowing conversion proposed here.)

Further, primitives _already_ use boxing and unboxing conversions to go 
between
their nullable (box) and non-nullable (primitive) forms.  So choosing 
subtyping
for references (plus an unbalanced implicit conversion) and boxing/unboxing
conversion for primitives means our treatment of null-excluding types is
gratuitously different for primitives than for other classes.

Another consequence of wanting migration compatibility for annotating a 
library
with nullness constraints is that nullness constraints cannot affect 
overload
selection.  Compatibility is not just for clients, it is also for 
subclasses.

### Null exclusion for implicitly constructible value classes

Implicitly constructible value classes go particularly well with null 
exclusion,
because we can choose a memory representation that _cannot_ encode null,
enabling a more compact and direct representation.

The Valhalla JVM has support for such a representation, and so we describe the
null-exclusion type of an implicitly constructible value class as _strongly
null-excluding_.  This means that its null exclusion is reified by the JVM.
Such a variable can never be seen to contain null, because null simply does not
have a runtime representation for these types.  This is only possible because
these classes are implicitly constructible; that is, the default zero value
written by the JVM is known to be a valid value of the domain.  As with
primitives, these types are explicitly safe to use uninitialized.

A strongly null-excluding type will have a type mirror, as type mirrors 
describe
reifiable types.

### Null exclusion for other classes

For identity classes and non-implicitly-constructible value classes, the 
story
is not quite as nice.  Since there is no JVM representation of "non-nullable
String", the best we can do is translate `String!` to `String` (a form of
erasure), and then try to keep the nulls at bay.  This means that we do 
not get
the flattening or density benefits, and null-excluding variables may 
still be
subject to heap pollution.   We can try to minimize this with a 
combination of
static type checking and generated runtime checks.  We refer to the
null-exclusion type of an identity or non-implicitly constructible value 
class
as _weakly null-excluding_.

There is an additional source of potential null pollution, aside from the
sources analogous to generic heap pollution: the JVM itself. The JVM
initializes references in the heap to null.  If `String!` erases to an 
ordinary
`String` reference, there is at least a small window in time when this
supposedly non-nullable field contains null.  We can erect barriers to 
reduce
the window in which this can be observed, but these barriers will not be
foolproof.  For example, the compiler could enforce that a field of type
`String!` either has an initializer or is definitely assigned in every
constructor.  However, if the receiver escapes during construction, all 
bets are
off, just as they are with initialization safety for final fields.
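The escape hazard exists today for `final` fields and can be reproduced
deterministically in a single thread.  In this sketch (all names hypothetical),
imagine the field `name` were declared as `String!`: it is definitely assigned
in the constructor, yet code that receives the escaped `this` still observes
`null`.

```java
// Hypothetical demo: a definitely-assigned field observed as null via escape.
class EscapeDuringConstruction {
    static String observed = "unset";

    static class Named {
        private final String name;   // imagine this were declared String!

        Named(String name) {
            peek(this);              // receiver escapes before assignment
            this.name = name;
        }
    }

    // Reads the field of a not-yet-fully-constructed instance.
    static void peek(Named n) {
        observed = n.name;
    }

    static String observeEarly() {
        new Named("valhalla");
        return observed;
    }
}
```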

We have a similar problem with arrays of `String!`; newly created arrays
initialize their elements to the default value for the component type, which is
`null`, and we don't even have the option of requiring an initializer as we
would with fields.  (Since a `String![]` is also a `String[]`, one option is
to outlaw the direct creation of arrays of weakly null-excluding types, instead
providing reflective API points which will safely create the array and
initialize all elements to a non-null value.)
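One possible shape for such a creation point is sketched below.  The helper
`filled` is hypothetical and is a plain static method rather than a reflective
API; it only illustrates the contract that the array is never handed out until
every element has been set to a non-null value.

```java
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical helper, not a JDK API: create an array with no null elements.
class NonNullArrays {
    static String[] filled(int length, IntFunction<String> init) {
        String[] a = new String[length];   // elements start as null here
        for (int i = 0; i < length; i++) {
            // Reject a null initializer result before the array can escape.
            a[i] = Objects.requireNonNull(init.apply(i), "initializer returned null");
        }
        return a;
    }
}
```

A call such as `NonNullArrays.filled(3, i -> "s" + i)` yields an array whose
elements are all non-null at the moment the caller first sees it.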

A weakly null-excluding type will not have a type mirror, as the nullity
information is erased for these types.  Generic signatures would be extended to
represent null-exclusion, and similarly the `Type` hierarchy would reflect such
signatures.

Because of erasure and the new possibilities for pollution, allowing
null-exclusion types for identity classes introduces significant 
potential new
complexity.  For this reason, we may choose a staged approach where
null-restricted types are initially limited to the strongly null-restricted
ones.

### Null exclusion for other value classes

Value classes that are not implicitly constructible are similar to identity
classes in that their null-exclusion types are only weakly null-excluding.
These classes are the ones for which the author has explicitly decided 
that the
default zero value is not a valid member of the domain, so we must 
ensure that
in no case does this invalid value ever escape. This effectively means 
that we
must similarly erase these types to a nullable representation to ensure 
that the
zero value stays contained.  (There are limited heroics the VM can do with
alternate representations for null when these classes are small and have 
readily
identifiable slack bits, but this is merely a potential optimization for the
future.)

### Atomicity

Primitives additionally have the property that larger-than-32-bit primitives
(`long` and `double`) may tear under race.  The allowance for tearing was an
accommodation to the fact that numeric code is often performance-critical, and
so a tradeoff was made to allow for more performance at the cost of less safety
for incorrect programs.  The corresponding box types, as well as primitive
variables declared `volatile`, are guaranteed not to tear, even under race.
(See the document entitled "Understanding non-atomicity and tearing" for more
detail.)
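A real torn read depends on hardware and timing and cannot be reproduced
reliably, so the following is a single-threaded simulation rather than an actual
race.  It assumes, purely for illustration, that a 64-bit store is performed as
two 32-bit stores with the low half written first, and computes what a reader
would see in between.

```java
// Simulation only: models a 64-bit store split into two 32-bit stores.
class TearingSimulation {
    // Value a reader could observe after the low half of newValue has been
    // written but the high half still holds the bits of oldValue.
    static long tornRead(long oldValue, long newValue) {
        int hi = (int) (oldValue >>> 32);   // high half not yet overwritten
        int lo = (int) newValue;            // low half already written
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // Writing -1L over 0L: the torn value is neither the old nor the new one.
        System.out.println(Long.toHexString(tornRead(0L, -1L)));
    }
}
```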

Implicitly constructible value classes can be declared as "non-atomic" to
indicate that their null-exclusion types may tear under race (if not declared
`volatile`), just as with `long` and `double`.  The classes `Long` and `Double`
would be declared non-atomic (though most implementations still offer atomic
access for 64-bit primitives.)

### Flattening

Flattening in the heap is an emergent property, which is achieved when 
we give
up the degrees of freedom that would prevent flattening:

  - Identity prevents flattening entirely;
  - Nullability prevents flattening in the absence of heroics involving 
exotic
    representations for null;
  - The inability to use a class without initialization requires 
nullability at
    the VM representation level, undermining flattening;
  - Atomicity prevents flattening for larger value objects.

Putting this together, the null-exclusion type of implicitly 
constructible value
classes is flattenable in the heap when the class is non-atomic or the 
layout is
suitably small.  For ordinary value classes, we can still get flattening 
in the
calling convention: all identity-free types can be flattened on the stack,
regardless of layout size or nullability.
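The conditions above can be restated as a small decision function.  This is
only a sketch of the rules as summarized in this section: the class and
parameter names are hypothetical, it ignores the "heroics" for nullable
representations, and an actual VM is free to make different choices.

```java
// Hypothetical restatement of the flattening rules; not a VM algorithm.
class FlatteningRules {
    static boolean heapFlattenable(boolean hasIdentity,
                                   boolean nullable,
                                   boolean implicitlyConstructible,
                                   boolean atomic,
                                   boolean smallLayout) {
        if (hasIdentity) return false;               // identity prevents flattening
        if (nullable) return false;                  // no exotic null encodings assumed
        if (!implicitlyConstructible) return false;  // zero value must be valid
        return !atomic || smallLayout;               // atomicity limits larger layouts
    }

    // All identity-free types can be flattened in the calling convention.
    static boolean stackFlattenable(boolean hasIdentity) {
        return !hasIdentity;
    }
}
```

Under this reading, a non-atomic `Complex!` qualifies for heap flattening, while
`LocalDate` (not implicitly constructible) qualifies only for stack flattening.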

### Summarizing null-exclusion

The feature described so far is at the weak end of the spectrum of features
described by "non-nullable types".  We make tradeoffs to enable gradual
migration compatibility, moving checks to the boundary -- where in some 
cases
they might not happen due to erasure, separate compilation, or just 
dishonest
clients.

Users may choose to look at this as "glass X% full" or "glass (100-X)% 
empty".
We can now more clearly say what we mean, migrate incrementally towards more
explicit and safe code without forking the ecosystem, and catch many errors
earlier in time.  On the other hand, it is less explicit where we might
experience runtime failures, because autoboxing makes unboxing 
implicit.  And
some users will surely complain merely because this is not what their 
favorite
language does.  But it is the null-exclusion we can actually have, 
rather than
the one we wish we might have in an alternate universe.

This approach yields a significant payoff for the Valhalla story.  Valhalla
already had to deal with considerable new complexity to handle the 
relationship
between reference and value types -- but this new complexity applied only to
primitive classes.  For less incremental complexity, we can have a more 
uniform
treatment of null-exclusion across all class types.  The story is 
significantly
simpler and more unified than we had previously:

  - Everything, including the legacy primitives, is an object (an 
instance of
    some class);
  - Every type, including the legacy primitives, is derived from a class;
  - All types are reference types (they refer to objects), but some 
reference
    types (non-nullable references to implicitly constructible objects) 
exhibit
    the runtime behavior of primitives;
  - Some reference types exclude null, and some null-excluding reference 
types
    are reifiable with a known-good non-null default;
  - Every type can have a corresponding null-exclusion type.

## Planning for a null-free future (?)

Users prefer working with unannotated types (e.g., `Foo`) rather than explicitly
annotated types (`Foo!`, `Foo?`), where possible.  The unannotated type `Foo`
could mean one of three things: an alias for `Foo!`, an alias for `Foo?`, or a
type of "raw" (unknown) nullity.   Investigations into null-excluding type
systems have shown that the better default would be to treat an unannotated name
as indicating non-nullability, and use explicitly nullable types (`T?`) to
indicate the presence of null, because returning or accepting null is generally
a less common case.  Of course, today `String` means "possibly nullable String"
in Java, meaning that, yet again, we seem to have chosen the wrong default.

Our friends in the `C#` community have explored the possibility of a
"flippening".  `C#` started with the Java defaults, and later provided a
compiler mode to flip the default on a per-module basis, with checking (or
pollution risk) at the boundary between modules with opposite defaults.  
This is
an interesting experiment and we look forward to seeing how this plays 
out in
the `C#` ecosystem.

Alternately, another possible approach for Java is to continue to treat the
unadorned name as having "raw" or "unknown" nullity, encouraging users to
annotate types with either `!` or `?`.  This approach has been partially
explored in the `JSpecify` project.  Within this approach is a range of 
options
for what the language will do with such types; there is a risk of 
flooding users
with warnings.  We may want to leave such analysis to extralinguistic type
checkers, at least initially -- but we would like to not foreclose on the
possibility of an eventual flippening.

