<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<font size="4"><font face="monospace">As we've hinted at, we've made
some progress refining the essential differences between
primitive and reference types, which has enabled us to shed the
`.val` / `.ref` distinction and lean more heavily on
nullability. The following document outlines the observations
that have enabled this current turn of direction and some of its
consequences. <br>
<br>
This document is mostly to be interpreted in the context of the
Valhalla journey, and so talks about where we were a few months
ago and where we're heading now. <br>
<br>
<br>
<br>
# Rehabilitating primitive classes: a nullity-centric approach<br>
<br>
Over the course of Project Valhalla, we have observed that there
are two<br>
distinct groups of value types. We've tried stacking them in
various ways, but<br>
there are always two groups, which we've historically described
as "objects<br>
without identity" and "primitive classes", and which admit
different degrees of<br>
flattening. <br>
<br>
The first group, which we are now calling "value objects" or
"value classes",<br>
represent the minimal departure from traditional classes to
disavow object<br>
identity. The existing classes that are described as
"value-based", such as<br>
`Optional` or `LocalDate`, are candidates for migrating to value
classes. Such<br>
classes give up object identity; identity-sensitive behaviors
are either recast<br>
as state-based (such as for `==` and
`Objects::identityHashCode`) or partialized<br>
(`synchronized`, `WeakReference`), and such classes must live
without the<br>
affordances of identity (mutability, layout polymorphism.) In
return, they<br>
avoid being burdened by "accidental identity" which can be a
source of bugs, and<br>
gain significant optimization for stack-based values (e.g.,
scalarization in<br>
calling convention) and other JIT optimizations. <br>
<br>
The second group, which we had been calling "primitive classes"
(we are now<br>
moving away from that term), are those that are more like the
existing<br>
primitives, such as `Decimal` or `Complex`. Where ordinary
value classes, like<br>
identity classes, gave rise to a single (reference) type, these
classes gave<br>
rise to two types, a value type (`X.val`) and a reference type
(`X.ref`). This<br>
pair of types was directly analogous to legacy primitives and
their boxes. These<br>
classes come with more restrictions and more to think about, but
are rewarded<br>
with greater heap flattening. This model -- after several
iterations -- seemed<br>
to meet the goals for expressiveness and performance: we can
express the<br>
difference between `int`-like behavior and `Integer`-like
behavior, and get<br>
routine flattening for `int`-like types. But the result still
had many<br>
imbalances; the distinction was heavyweight, and a significant
fraction of the<br>
incremental specification complexity was centered only on these
types. We<br>
eventually concluded that the source of this was trying to model
the `int` /<br>
`Integer` distinction directly, and that this distinction, while
grounded in<br>
user experience, was just not "primitive" enough. <br>
<br>
In this document, we will break down the characteristics of
so-called "primitive<br>
classes" into more "primitive" (and hopefully less ad-hoc)
distinctions. This<br>
results in a simpler model, streamlines the syntactic baggage,
and enables us to<br>
finally reunite with an old friend, null-exclusion (bang)
types. Rather than<br>
treating "value types" and "reference types" as different
things, we can treat<br>
the existing primitives (and the "value projection" of
user-defined primitive<br>
classes) as being restricted references, whose restrictions
enable the desired<br>
runtime properties. <br>
<br>
## Primitives and objects<br>
<br>
In a previous edition of _State of Valhalla_, we outlined a host
of differences<br>
between primitives and objects:<br>
<br>
| Primitives                                  | Objects                                    |<br>
| -------------------------------------------- | ------------------------------------------ |<br>
| No identity (pure values)                    | Identity                                   |<br>
| `==` compares state                          | `==` compares object identity              |<br>
| Built-in                                     | Declared in classes                        |<br>
| No members (fields, methods, constructors)   | Members (including mutable fields)         |<br>
| No supertypes or subtypes                    | Class and interface inheritance            |<br>
| Represented directly in memory               | Represented indirectly through references  |<br>
| Not nullable                                 | Nullable                                   |<br>
| Default value is zero                        | Default value is null                      |<br>
| Arrays are monomorphic                       | Arrays are covariant                       |<br>
| May tear under race                          | Initialization safety guarantees           |<br>
| Have reference companions (boxes)            | Don't need reference companions            |<br>
<br>
Over many iterations, we have chipped away at this list, mostly
by making<br>
classes richer: value classes can disavow identity (and thereby
opt into<br>
state-based `==` comparison); the lack of members and supertypes
are an<br>
accidental restriction that can go away with declarable value
classes; we can<br>
make primitive arrays covariant with arrays of their boxes; we
can let some<br>
class declarations opt into non-atomicity under race. That
leaves the<br>
following, condensed list of differences: <br>
<br>
| Primitives                          | Objects                                    |<br>
| ----------------------------------- | ------------------------------------------ |<br>
| Represented directly in memory      | Represented indirectly through references  |<br>
| Not nullable                        | Nullable                                   |<br>
| Default value is zero               | Default value is null                      |<br>
| Have reference companions (boxes)   | Don't need reference companions            |<br>
<br>
The previous approach ("primitive classes") started with the
assumption that<br>
this is the list of things to be modeled by the value/reference
distinction. In<br>
this document we go further, by showing that flattening (direct
representation)<br>
is derived from more basic principles around nullity and
initialization<br>
requirements, and perhaps surprisingly, the concept of
"primitive type" can<br>
disappear almost completely, save only for historical vestiges
related to the<br>
existing eight primitives. The `.val` type can be replaced by
restricted<br>
references whose restrictions enable the desired
representational properties. As<br>
is consistent with the goals of Valhalla, flattenability is an
emergent<br>
property, gained by giving up those properties that would
undermine<br>
flattenability, rather than being a linguistic concept on its
own.<br>
<br>
### Initialization<br>
<br>
The key distinction between today's primitives and objects has
to do with<br>
_initialization requirements_. Primitives are designed to be
_used<br>
uninitialized_; if we declare a field `int count`, it is
reliably initialized to<br>
zero by the JVM before any code can access it. This initial
value is a<br>
perfectly good default, and it is not a bug to read or even
increment this field<br>
before it has been explicitly assigned a value by the program,
because it has<br>
_already_ been initialized to a known good value by the JVM.
The zero value<br>
pre-written by the JVM is not just a safety net; it is actually
part of the<br>
programming model that primitives start out life with "good
enough" defaults.<br>
This is part of what it means to be a primitive type.<br>
<br>
Objects, on the other hand, are not designed for uninitialized
use; they must be<br>
initialized via constructors before use. The default zero
values written to an<br>
object's fields by the JVM typically do not constitute a valid state according<br>
to the class's specification, and even if they did, they would rarely make a good<br>
default value. Therefore, we require that class instances be
initialized by<br>
their constructors before they can be exposed to the rest of the
program. To<br>
ensure that this happens, objects are referenced exclusively
through _object<br>
references_, which _can_ be safely used uninitialized -- because
they reliably<br>
have the usable default value of `null`. (Some may quibble with
this use of<br>
"safely" and "usable", because null references are fairly
limited, but they do<br>
their limited job correctly: we can easily and safely test
whether a reference<br>
is null, and if we accidentally dereference a null reference, we
get a clear<br>
exception rather than accessing uninitialized object state.) <br>
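<br>
A minimal illustration in today's Java (the `Counter` class and its fields are<br>
hypothetical, chosen only to show the two default-initialization behaviors): <br>
<br>
```<br>
class Counter {<br>
    int hits;        // the JVM reliably initializes this to 0; reading or<br>
                     // incrementing it before explicit assignment is well-defined<br>
    String label;    // the JVM reliably initializes this to null; it must be<br>
                     // assigned (or null-checked) before its state can be used<br>
<br>
    int touch() {<br>
        hits++;                  // fine: uses the JVM-written zero<br>
        return label.length();   // throws NullPointerException rather than<br>
                                 // exposing uninitialized object state<br>
    }<br>
}<br>
```<br>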
<br>
> Primitives can be safely used without explicit
initialization; objects cannot.<br>
> Object references are nullable _precisely because_ objects
cannot be used<br>
> safely without explicit initialization. <br>
<br>
### Nullability<br>
<br>
A key difference between today's primitives and references is
that primitives<br>
are non-nullable and references are nullable. One might think
this was<br>
primarily a choice of convenience: null is useful for references
as a universal<br>
sentinel, and not all that useful for primitives (when we want
nullable<br>
primitives we can use the box classes -- but we usually don't.)
But the<br>
reality is not one of convenience, but of necessity: nullability
is _required_<br>
for the safety of objects, and usually _detrimental_ to the
performance of<br>
primitives.<br>
<br>
Nullability for object references is a forced move because null is what<br>
prevents us from accessing uninitialized object state. Nullability for<br>
primitives is usually not needed, but that's not the only reason
primitives are<br>
non-nullable. If primitives were nullable, `null` would be
another state that<br>
would have to be represented in memory, and the costs would be
out of line with<br>
the benefits. Since a 64-bit `long` uses all of its bit
patterns, a nullable<br>
`long` would require at least 65 bits, and alignment
requirements would likely<br>
round this up to 128 bits, doubling memory usage. (The density
cost here is<br>
substantial, but it gets worse because most hardware today does
not have cheap<br>
atomic 128 bit loads and stores. Since tearing might conflate a
null value with<br>
a non-null value -- even worse than the usual consequences of
tearing -- this<br>
would push us strongly towards using an indirection instead.)
So<br>
non-nullability is a precondition for effective flattening and
density of<br>
primitives, and nullable primitives would involve giving up the
flatness and<br>
density that are the reason to have primitives in the first
place. <br>
<br>
> Nullability interferes with heap flattening.<br>
<br>
To summarize, the design of primitives and objects implicitly
stems from the<br>
following facts: <br>
<br>
- For most objects, the uninitialized (zeroed) state is either
invalid or not a<br>
good-enough default value;<br>
- For primitives, the uninitialized (zeroed) state is both
valid and a<br>
good-enough default value; <br>
- Having the uninitialized (zeroed) state be a good-enough
default is a<br>
precondition for reliable flattening;<br>
- Nullability is required when the uninitialized (zeroed)
state is not a<br>
good-enough default; <br>
- Nullability not only has a footprint cost, but often is an
impediment to<br>
flattening.<br>
<br>
> Primitives exist in the first place because they can be
flattened to give us<br>
> better numeric performance; flattening requires giving up
nullity and<br>
> tolerance of uninitialized (zero) values.<br>
<br>
These observations were baked into the language (and other
languages too), but<br>
the motivation for these decisions was then "erased" by the
rigid distinction<br>
between primitives and objects. Valhalla seeks to put that
choice back into the<br>
user's hands.<br>
<br>
### Getting the best of both worlds<br>
<br>
Project Valhalla promises the best of both worlds: sufficiently
constrained<br>
entities can "code like a class and work like an int." Classes
that give up<br>
object identity can get some of the runtime benefits of
primitives, but to get<br>
full heap flattening, we must embrace the two defining
characteristics of<br>
primitives described so far: non-nullability and safe
uninitialized use. <br>
<br>
Some candidates for value classes, such as `Complex`, are safe
to use<br>
uninitialized because the default (zero) value is a good initial
value. Others,<br>
like `LocalDate`, simply have no good default value (zero or
otherwise), and<br>
therefore need the initialization protocol enabled by null-default object<br>
references. This distinction is inherent to the semantics of the domain; some<br>
domains simply do not have a reasonable default value, and this is a choice that<br>
the class author must capture when the code is written. <br>
<br>
There is a long list of classes that are candidates to be value
classes; some<br>
are like `Complex`, but many are more like `LocalDate`. The
latter group can<br>
still benefit significantly from eliminating identity, but can't
necessarily get<br>
full heap flattening. The former group, which are most like
today's primitives,<br>
can get all the benefits, including heap flattening -- when
their instances are<br>
non-null. <br>
<br>
### Declaring value classes<br>
<br>
As in previous iterations, a class can be declared as a _value
class_:<br>
<br>
```<br>
value class LocalDate { ... }<br>
```<br>
<br>
A value class gives up identity and its consequences (e.g.,
mutability) -- and<br>
that's it. The resulting `LocalDate` type is still a reference
type, and<br>
variables of type `LocalDate` are still nullable. Instances can
get significant<br>
optimizations for on-stack use but are still usually represented
in the heap via<br>
indirections. <br>
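<br>
To make the consequences concrete, here is a sketch (assuming the proposed<br>
`value` modifier; the `Point` class and the comments are hypothetical): <br>
<br>
```<br>
value class Point {<br>
    private int x;<br>
    private int y;<br>
    public Point(int x, int y) { this.x = x; this.y = y; }<br>
}<br>
<br>
Point p = new Point(1, 2);<br>
Point q = new Point(1, 2);<br>
// p == q is true: with no identity to compare, == compares state instead<br>
Point r = null;   // still allowed: Point remains a nullable reference type<br>
```<br>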
<br>
### Implicitly constructible value classes<br>
<br>
In order to get the next group of benefits, a value class must
additionally<br>
attest that it can be used uninitialized. Because this is a
statement of how<br>
instances of this class come into existence, modeling this as a
special kind of<br>
constructor seems natural:<br>
<br>
```<br>
value class Complex { <br>
private double re;<br>
private double im;<br>
<br>
public implicit Complex();<br>
public Complex(double re, double im) { ... }<br>
<br>
...<br>
}<br>
```<br>
<br>
These two constructors say that there are two ways a `Complex`
instance comes<br>
into existence: the first is via the traditional constructor
that takes real and<br>
imaginary values (`new Complex(1.0, 1.0)`), and the second is
via the _implicit_<br>
constructor that produces the instance used to initialize fields
and array<br>
elements to their default values. That the implicit constructor
cannot have a<br>
body is a signal that the "zero default" is not something the
class author can<br>
fine-tune. A value class with an implicit constructor is called
an _implicitly<br>
constructible_ value class.<br>
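<br>
A small usage sketch (the variable name is hypothetical; the comments describe<br>
the intended semantics rather than settled syntax): <br>
<br>
```<br>
Complex c = new Complex(1.0, 1.0);   // the explicit constructor: caller-chosen state<br>
<br>
// The implicit constructor has no body and is never invoked as an explicit<br>
// `new` expression in source code; it stands for the all-zero instance<br>
// (re == 0.0, im == 0.0) that is used to initialize fields and array elements<br>
// of this type to their default values, as described above.<br>
```<br>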
<br>
Having an implicit constructor is a necessary but not sufficient
condition for<br>
heap flattening. The other required condition is that the variable
that holds a<br>
`Complex` needs to be non-nullable. In the previous iteration,
the `.val` type<br>
was non-nullable for the same reason primitive types were, and
therefore `.val`<br>
types could be fully flattened. However, after several rounds
of teasing apart<br>
the fundamental properties of primitives and value types,
nullability has<br>
finally sedimented to a place in the model where a sensible
reunion between<br>
value types and non-nullable types may be possible. <br>
<br>
## Null exclusion <br>
<br>
Non-nullable reference types have been a frequent request for
Java for years,<br>
having been explored in `C#`, Kotlin, and Scala. The goals of
non-nullable<br>
types are sensible: richer types mean safer programs. It is a
pervasive<br>
problem in Java libraries that we are not able to express within
the language<br>
whether a returned object reference might be null, or is known
never to be null,<br>
and programmers can therefore easily make wrong assumptions
about nullability. <br>
<br>
To date, Project Valhalla has deliberately steered clear of
non-nullable types<br>
as a standalone feature. This is not only because the goals of
Valhalla were too<br>
ambitious to burden the project with another ambitious goal
(though that is<br>
true), but for a more fundamental reason: the assumptions one
might make in a<br>
vacuum about the semantics of non-nullable types would likely
become hidden<br>
sources of constraints for the value type design, which was
already bordering on<br>
over-constrained. Now that the project has progressed
sufficiently, we are more<br>
confident that we can engage with the issue of null exclusion.<br>
<br>
A _refinement type_ (or _restriction type_) is a type that is
derived from<br>
another type that excludes certain values from the derived
type's value set,<br>
such as "the non-negative integers". In the most general form, a
refinement type<br>
is defined by one or more predicates (Liquid Haskell and Clojure
Spec are<br>
examples of this); range types in Pascal are a more constrained
form of<br>
refinement type. Non-nullable types ("bang" types) can
similarly be viewed as a<br>
constrained form of refinement type, characterized by the
predicate `x != null`.<br>
(Note that the null-excluding refinement type `X!` of a
reference type is still<br>
a reference type.)<br>
<br>
Rather than saying that primitive classes give rise to two
types, `X.val` and<br>
`X.ref`, we can observe that the null-excluding type `X!` of an<br>
implicitly constructible value class can have the same runtime
characteristic as<br>
the `.val` type in the previous round. Both the
declaration-site property that<br>
a value class is implicitly constructible, and the use-site
property that a<br>
variable is null-excluding, are necessary to routinely get
flattening. <br>
<br>
Related to null exclusion is _null-adjunction_; this takes a
non-nullable type<br>
(such as `int`) or a type of indeterminate nullability (such as
a type variable<br>
`T` in a generic class that can be instantiated with either
nullable or<br>
non-nullable type parameters) and produces a type that is
explicitly nullable<br>
(`int?` or `T?`.) In the current form of the design, there is
only one place<br>
where the null-adjoining type is strictly needed -- when generic
code needs to<br>
express "`T`, but might be null". The canonical example of this is `Map::get`;<br>
it wants to return `V?`, to capture the fact that `Map`
uses `null` to<br>
represent "no mapping".<br>
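<br>
A hedged sketch of what such a signature could look like (this is a simplified,<br>
hypothetical `Map` shape, not a proposal for `java.util.Map` itself): <br>
<br>
```<br>
interface Map<K, V> {<br>
    // Whatever nullability V itself has, the result of get() may additionally<br>
    // be null, meaning "no mapping for this key"; hence the return type V?.<br>
    V? get(Object key);<br>
}<br>
```<br>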
<br>
For a given class `C`, the type `C!` is clearly non-nullable,
and the type `C?`<br>
is clearly nullable. What of the unadorned name `C`? This has
_unspecified_<br>
nullability. Unspecified nullability is analogous to raw types
in generics (we<br>
could call this "raw nullability"); we cannot be sure what the
author had in<br>
mind, and so must find a balance between the desire for greater
null safety and<br>
tolerance of ambiguity in author intent.<br>
<br>
Readers who are familiar with explicitly nullable and
non-nullable types in<br>
other languages may be initially surprised at some of the
choices made regarding<br>
null-exclusion (and null-adjunction) types here. The
interpretation outlined<br>
here is not necessarily the "obvious" one, because it is
constrained by the<br>
needs of null-exclusion, by the needs of Valhalla, and by the
migration-compatibility<br>
constraints needed for the ecosystem to make a successful
transition to types<br>
that have richer nullability information. <br>
<br>
While the theory outlined here will allow all class types to
have a<br>
null-excluding refinement type, it is also possible that we will
initially<br>
restrict null-exclusion to implicitly constructible value
types. There are<br>
several reasons to consider pursuing such an incremental path,
including the<br>
fact that we will be able to reify the non-nullability of
implicitly<br>
constructible value types in the JVM, whereas the null-exclusion
types of other<br>
classes such as `String` or of ordinary value classes such as
`LocalDate` would<br>
need to be done through erasure, increasing the possible sources
of null<br>
pollution. <br>
<br>
### Goals<br>
<br>
We adopt the following set of goals for adding null-excluding
refinement types: <br>
<br>
- More complete unification of primitives with classes;<br>
- Flatness is an emergent property that can derive from more
basic semantic<br>
constraints, such as identity-freedom, implicit
constructibility, and<br>
non-nullity;<br>
- Merge the concept of "value companion" (`.val` type) into the
null-restricted<br>
refinement type of implicitly constructible value classes;<br>
- Allow programmers to annotate type uses to explicitly exclude
or affirm nulls<br>
in the value set;<br>
- Provide some degree of runtime nullness checking to detect
null pollution;<br>
- Annotating an existing API (one based on identity classes)
with additional<br>
nullness information should be binary- and source-compatible.<br>
<br>
The last goal is a source of strong constraints, and not one to
be taken<br>
lightly. If an existing API that specifies "this method never
returns null"<br>
cannot be compatibly migrated to one where this constraint is
reflected in the<br>
method declaration proper, the usefulness of null-exclusion
types is greatly<br>
reduced; library maintainers will be put to a bad choice of
forgoing a feature<br>
that will make their APIs safer, or making an incompatible
change in order to do<br>
so. If we were building a new language from scratch, the
considerations might<br>
be different, but we do not have that luxury. "Just copying"
what other<br>
languages have done here is a non-starter. <br>
<br>
### Interoperation between nullable and non-nullable types<br>
<br>
We enable conversions between a nullable type and a compatible
null-excluding<br>
refinement type by adding new widening and narrowing conversions
between `T?`<br>
and `T!` that have analogous semantics to the existing boxing
and unboxing<br>
conversions between `Integer` and `int`. Just as with boxing
and unboxing,<br>
widening from a non-nullable type to a nullable type is
unconditional and never<br>
fails, and narrowing from a nullable type to a non-nullable type
may fail by<br>
throwing `NullPointerException`. These conversions for
null-excluding types<br>
would be sensible in assignment context, cast context, and
method invocation<br>
context (both loose and strict, unlike boxing for primitives
today.) This would<br>
allow existing assignments, invocation, and overload
applicability checks to<br>
continue to work even after migrating one of the types involved,
as required for<br>
source-compatibility.<br>
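<br>
A rough sketch of these conversions in assignment context (assuming the `!` and<br>
`?` syntax; the variables and the `computeOrNull` helper are hypothetical): <br>
<br>
```<br>
String? maybe      = computeOrNull();<br>
String! definitely = "hello";<br>
<br>
String? a = definitely;   // widening: unconditional, never fails (like int -> Integer)<br>
String! b = maybe;        // narrowing: permitted in assignment context, but throws<br>
                          // NullPointerException at run time if maybe is null<br>
```<br>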
<br>
Checking for bad values can mirror the approach taken for
generics. When a<br>
richer compile-time type system erases to a less-rich runtime
type system, type<br>
safety derives from a mix of compile-time type checking and
synthetic runtime<br>
checks. In both cases, there is a possibility of pollution
which can be<br>
injected at the boundary between legacy and new code, by
malicious code, or<br>
through injudicious use of unchecked casts and raw types. And
like generics, we<br>
would like to offer the possibility that if a program compiles
in its entirety<br>
with no unchecked warnings, null-excluding types will not be
observed to contain<br>
null. To achieve this, we will need a combination of runtime
checks, new<br>
unchecked warnings, and possibly restrictions on initialization.
<br>
<br>
The intrusion on the type-checking of generics here is
considerable; nullity<br>
will have to be handled in type inference, bounds conformance,
subtyping, etc.<br>
In addition, there are new sources of heap pollution and new
conditions under<br>
which a variable may be polluted. The _Universal Generics_ JEP
outlines a<br>
number of unchecked warnings that must be issued in order to
avoid null<br>
pollution in type variables that might be instantiated either
with a nullable or<br>
null-excluding type. While this work was designed for `ref` and
`val` types,<br>
much of it applies directly to null-excluding types.<br>
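<br>
For example (a hedged sketch; the `Holder` class is hypothetical, and the exact<br>
warning conditions are those the _Universal Generics_ JEP would prescribe): <br>
<br>
```<br>
class Holder<T> {<br>
    T value;<br>
    void clear() {<br>
        value = null;   // warrants an unchecked warning: T might be instantiated<br>
                        // with a null-excluding type such as Complex!, and this<br>
                        // assignment would then be a source of null pollution<br>
    }<br>
}<br>
```<br>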
<br>
The liberal use of conversion rather than subtyping here may be
surprising to<br>
readers who are familiar with other languages that support
null-excluding types.<br>
At first, it may appear to be "giving up all the benefit" of
having annotated<br>
APIs for nullness, since a nullable value may be assigned
directly to a<br>
non-nullable type without requiring a cast. But the reality is
that for the<br>
first decade at least, we will at best be living in a mixed
world where some<br>
APIs are migrated to use nullness information and some will not,
and forcing<br>
users to modify code that uses these libraries (and then do so
again and again<br>
as more libraries migrate) would be an unacceptable tax on Java
users, and a<br>
deterrent to libraries migrating to use these features. <br>
<br>
Starting from `T! <: T?` -- and forcing explicit conversions
when you want to go<br>
from nullable to non-nullable values -- does seem an obvious
choice if you have<br>
the luxury of building a type system from scratch. But if we
want to make<br>
migration to null-excluding types a source-compatible change for
libraries and<br>
clients, we cannot accept a strict subtyping approach. (Even if
we did, we<br>
could still only use subtyping in one direction, and would have
to add an<br>
additional implicit conversion for the other direction -- a
conversion that is<br>
similar to the narrowing conversion proposed here.)<br>
<br>
Further, primitives _already_ use boxing and unboxing
conversions to go between<br>
their nullable (box) and non-nullable (primitive) forms. So
choosing subtyping<br>
for references (plus an unbalanced implicit conversion) and
boxing/unboxing<br>
conversion for primitives means our treatment of null-excluding
types is<br>
gratuitously different for primitives than for other classes.<br>
<br>
Another consequence of wanting migration compatibility for
annotating a library<br>
with nullness constraints is that nullness constraints cannot
affect overload<br>
selection. Compatibility is not just for clients, it is also
for subclasses.<br>
<br>
### Null exclusion for implicitly constructible value classes<br>
<br>
Implicitly constructible value classes go particularly well with
null exclusion,<br>
because we can choose a memory representation that _cannot_
encode null,<br>
enabling a more compact and direct representation. <br>
<br>
The Valhalla JVM has support for such a representation, and so
we describe the<br>
null-exclusion type of an implicitly constructible value class
as _strongly null<br>
excluding_. This means that its null exclusion is reified by
the JVM. Such a<br>
variable can never be seen to contain null, because null simply
does not have a<br>
runtime representation for these types. This is only possible
because these<br>
classes are implicitly constructible: the default zero
value written by the<br>
JVM is known to be a valid value of the domain. As with
primitives, these types<br>
are explicitly safe to use uninitialized. <br>
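<br>
Concretely (a sketch assuming the `!` syntax and the `Complex` declaration<br>
above; the holder class is hypothetical): <br>
<br>
```<br>
class Samples {<br>
    Complex! origin;   // strongly null-excluding: reified by the JVM, eligible for<br>
                       // flattening, and starts life as the implicitly constructed<br>
                       // Complex (re == 0.0, im == 0.0) rather than as null<br>
    Complex  other;    // ordinary nullable reference; starts life as null<br>
}<br>
```<br>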
<br>
A strongly null-excluding type will have a type mirror, as type
mirrors describe<br>
reifiable types. <br>
<br>
### Null exclusion for other classes<br>
<br>
For identity classes and non-implicitly-constructible value
classes, the story<br>
is not quite as nice. Since there is no JVM representation of
"non-nullable<br>
String", the best we can do is translate `String!` to `String`
(a form of<br>
erasure), and then try to keep the nulls at bay. This means
that we do not get<br>
the flattening or density benefits, and null-excluding variables
may still be<br>
subject to heap pollution. We can try to minimize this with a
combination of<br>
static type checking and generated runtime checks. We refer to
the<br>
null-exclusion type of an identity or non-implicitly
constructible value class<br>
as _weakly null-excluding_.<br>
<br>
There is an additional source of potential null pollution, aside
from the<br>
sources analogous to generic heap pollution: the JVM itself.
The JVM<br>
initializes references in the heap to null. If `String!` erases
to an ordinary<br>
`String` reference, there is at least a small window in time
when this<br>
supposedly non-nullable field contains null. We can erect
barriers to reduce<br>
the window in which this can be observed, but these barriers
will not be<br>
foolproof. For example, the compiler could enforce that a field
of type<br>
`String!` either has an initializer or is definitely assigned in
every<br>
constructor. However, if the receiver escapes during
construction, all bets are<br>
off, just as they are with initialization safety for final
fields.<br>
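<br>
For instance (a hedged sketch of the kind of check the compiler could make; the<br>
`Person` class is hypothetical): <br>
<br>
```<br>
class Person {<br>
    String! name;             // weakly null-excluding: erases to an ordinary String<br>
<br>
    Person(String! name) {<br>
        this.name = name;     // ok: definitely assigned in every constructor<br>
    }<br>
<br>
    Person() { }              // would be rejected: name has neither an initializer<br>
                              // nor a definite assignment, so the JVM-written null<br>
                              // would be observable through a non-nullable field<br>
}<br>
```<br>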
<br>
We have a similar problem with arrays of `String!`; newly
created arrays<br>
initialize their elements to the default value for the component
type, which is<br>
`null`, and we don't even have the option of requiring an
initializer as we<br>
would with fields. (Since a `String![]` is also a `String[]`,
one option is to<br>
outlaw the direct creation of arrays of weakly null-excluding
types, instead<br>
providing reflective API points which will safely create the
array and<br>
initialize all elements to a non-null value.)<br>
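<br>
The covariance loophole mentioned above can be sketched as follows (hedged; the<br>
array-creation details are deliberately elided, since they are still open): <br>
<br>
```<br>
String![] names = ...;        // however such an array ultimately gets created<br>
String[]  plain = names;      // allowed: a String![] is also a String[]<br>
plain[0] = null;              // nothing flags this through the plain String[] view,<br>
                              // so the null-excluding array is now polluted<br>
```<br>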
<br>
A weakly null-excluding type will not have a type mirror, as the
nullity<br>
information is erased for these types. Generic signatures would
be extended to<br>
represent null-exclusion, and similarly the `Type` hierarchy
would reflect such<br>
signatures. <br>
<br>
Because of erasure and the new possibilities for pollution,
allowing<br>
null-exclusion types for identity classes introduces significant
potential new<br>
complexity. For this reason, we may choose a staged approach
where<br>
null-restricted types are initially limited to the strongly
null-restricted<br>
ones.<br>
<br>
### Null exclusion for other value classes<br>
<br>
Value classes that are not implicitly constructible are similar
to identity<br>
classes in that their null-exclusion types are only weakly
null-excluding.<br>
These classes are the ones for which the author has explicitly
decided that the<br>
default zero value is not a valid member of the domain, so we
must ensure that<br>
in no case does this invalid value ever escape. This effectively
means that we<br>
must similarly erase these types to a nullable representation to
ensure that the<br>
zero value stays contained. (There are limited heroics the VM
can do with<br>
alternate representations for null when these classes are small
and have readily<br>
identifiable slack bits, but this is merely a potential
optimization for the<br>
future.) <br>
<br>
### Atomicity<br>
<br>
Primitives additionally have the property that
larger-than-32-bit primitives<br>
(`long` and `double`) may tear under race. The allowance for
tearing was an<br>
accommodation to the fact that numeric code is often
performance-critical, and so<br>
a tradeoff was made to allow for more performance at the cost of
less safety for<br>
incorrect programs. The corresponding box types, as well as
primitive variables<br>
declared `volatile`, are guaranteed not to tear, even under
race. (See the<br>
document entitled "Understanding non-atomicity and tearing" for
more detail.)<br>
<br>
Implicitly constructible value classes can be declared as
"non-atomic" to<br>
indicate that their null-exclusion types may tear under race (if
not declared<br>
`volatile`), just as with `long` and `double`. The classes
`Long` and `Double`<br>
would be declared non-atomic (though most implementations still
offer atomic<br>
access for 64-bit primitives.)<br>
<br>
### Flattening<br>
<br>
Flattening in the heap is an emergent property, which is
achieved when we give<br>
up the degrees of freedom that would prevent flattening:<br>
<br>
- Identity prevents flattening entirely;<br>
- Nullability prevents flattening in the absence of heroics
involving exotic<br>
representations for null; <br>
- The inability to use a class without initialization requires
nullability at<br>
the VM representation level, undermining flattening;<br>
- Atomicity prevents flattening for larger value objects.<br>
<br>
Putting this together, the null-exclusion type of implicitly
constructible value<br>
classes is flattenable in the heap when the class is non-atomic
or the layout is<br>
suitably small. For ordinary value classes, we can still get
flattening in the<br>
calling convention: all identity-free types can be flattened on
the stack,<br>
regardless of layout size or nullability.<br>
<br>
### Summarizing null-exclusion<br>
<br>
The feature described so far is at the weak end of the spectrum
of features<br>
described by "non-nullable types". We make tradeoffs to enable
gradual<br>
migration compatibility, moving checks to the boundary -- where
in some cases<br>
they might not happen due to erasure, separate compilation, or
just dishonest<br>
clients. <br>
<br>
Users may choose to look at this as "glass X% full" or "glass
(100-X)% empty".<br>
We can now more clearly say what we mean, migrate incrementally
towards more<br>
explicit and safe code without forking the ecosystem, and catch
many errors<br>
earlier in time. On the other hand, it is less explicit where
we might<br>
experience runtime failures, because the conversions that can fail, like<br>
unboxing, are implicit. And<br>
some users will surely complain merely because this is not what
their favorite<br>
language does. But it is the null-exclusion we can actually
have, rather than<br>
the one we wish we might have in an alternate universe. <br>
<br>
This approach yields a significant payoff for the Valhalla
story. Valhalla<br>
already had to deal with considerable new complexity to handle
the relationship<br>
between reference and value types -- but this new complexity
applied only to<br>
primitive classes. For less incremental complexity, we can have
a more uniform<br>
treatment of null-exclusion across all class types. The story
is significantly<br>
simpler and more unified than we had previously: <br>
<br>
- Everything, including the legacy primitives, is an object (an
instance of<br>
some class);<br>
- Every type, including the legacy primitives, is derived from
a class;<br>
- All types are reference types (they refer to objects), but
some reference<br>
types (non-nullable references to implicitly constructible
objects) exhibit<br>
the runtime behavior of primitives;<br>
- Some reference types exclude null, and some null-excluding
reference types<br>
are reifiable with a known-good non-null default;<br>
- Every type can have a corresponding null-exclusion type.<br>
<br>
## Planning for a null-free future (?)<br>
<br>
Users prefer working with unannotated types (e.g., `Foo`) rather
than explicitly<br>
annotated types (`Foo!`, `Foo?`), where possible. The
unannotated type `Foo`<br>
could mean one of three things: an alias for `Foo!`, an alias
for `Foo?`, or a<br>
type of "raw" (unknown) nullity. Investigations into
null-excluding type<br>
systems have shown that the better default would be to treat an
unannotated name<br>
as indicating non-nullability, and use explicitly nullable types
(`T?`) to<br>
indicate the presence of null, because returning or accepting
null is generally<br>
a less common case. Of course, today `String` means "possibly
nullable String"<br>
in Java, meaning that, yet again, we seem to have chosen the
wrong default. <br>
<br>
Our friends in the `C#` community have explored the possibility
of a<br>
"flippening". `C#` started with the Java defaults, and later
provided a<br>
compiler mode to flip the default on a per-module basis, with
checking (or<br>
pollution risk) at the boundary between modules with opposite
defaults. This is<br>
an interesting experiment and we look forward to seeing how this
plays out in<br>
the `C#` ecosystem. <br>
<br>
Alternately, another possible approach for Java is to continue
to treat the<br>
unadorned name as having "raw" or "unknown" nullity, encouraging
users to<br>
annotate types with either `!` or `?`. This approach has been
partially<br>
explored in the `JSpecify` project. Within this approach is a
range of options<br>
for what the language will do with such types; there is a risk
of flooding users<br>
with warnings. We may want to leave such analysis to
extralinguistic type<br>
checkers, at least initially -- but we would like to not
foreclose on the<br>
possibility of an eventual flippening.<br>
<br>
</font></font>
</body>
</html>