Embrace nulls, was Re: Typed variants of primitives

Stephen Colebourne scolebourne at joda.org
Fri Dec 4 00:04:04 UTC 2020

On Thu, 3 Dec 2020 at 17:07, Brian Goetz <brian.goetz at oracle.com> wrote:
> Secondarily, let me share an observation that may be obvious in
> hindsight, but was not obvious (to me, at least) at the start of this
> exercise.  Which is: nullability is a property of _object references_,
> not of the type itself.  This was not obvious because Java hasn't
> previously ever given us a way to deal with objects (like String)
> directly; we deal with them only through references. But key to
> understanding Valhalla is that for a given class (say String), there is
> a universe of instances of String, but they are not the members of the
> value set of the type String!  The value set of the type String consists
> of _references to_ instances of String, plus the special reference null.

Yes, an object reference really is just another value, not much
different to an `int`. The issue is that the language hides this from
us instead of making it visible via a type `ref<T>`:

 public class Name {
   private final ref<String> forename;
   private final ref<String> surname;
   private final int age;

The only special thing about an object reference (`ref<T>`) is that the
content bits of a reference value are a pointer to an instance that the
JVM understands, a pointer that can only be used via the reference
value and cannot be directly observed in any other way. And as you
say, the thing at the end of the pointer can't be null.
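
As a minimal sketch of that point in today's Java (the class and method
names here are made up for illustration): copying a variable copies the
reference value, `==` compares reference values, and setting a variable
to null changes only that variable, never the instance behind it.

```java
class ReferenceValues {
    // == on references compares the reference values (the hidden pointers),
    // not the contents of the instances they point to.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        StringBuilder first = new StringBuilder("Stephen");
        StringBuilder alias = first;                        // copies the reference value
        StringBuilder other = new StringBuilder("Stephen"); // a second instance

        alias.append("!");                                  // mutates the shared instance
        System.out.println(first);                          // Stephen!
        System.out.println(sameInstance(first, alias));     // true
        System.out.println(sameInstance(first, other));     // false

        alias = null;                                       // only the variable changes
        System.out.println(first.length());                 // 8
    }
}
```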

Since an object reference is actually a value, that means Java
*already has* values that can be null!!! Despite this fact, there is a
mythology that nulls are somehow special and must only be associated
with reference values. But apart from history and JVM sunk cost I
don't think there is any conceptual reason why that has to be so,
especially when user-defined values could make great use of treating
an all-zero bit pattern as a "null".

> Essentially, by introducing null, you've made them all reference types.

Ultimately "null" is just a name we give to a particular action taken
when the JVM finds the all-zero bit pattern. Sometimes we call the
all-zero bit pattern "null" with an action that throws NPE and
sometimes we call it "zero" or "false" with actions that can use it
fully. There really isn't any conceptual need to say that reference
values are the only ones allowed to use the name and action associated
with "null"; it is just history that we do.
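
A rough illustration using current Java (the names `ZeroDefaults` and
`Holder` are hypothetical): unassigned fields get the default value of
their type, which we call 0, false or null depending on the type. Note
that the language specification only fixes these default values; that
they are all represented as zero bits is an assumption about typical
JVM implementations, not something this code can observe.

```java
class ZeroDefaults {
    // None of these fields is ever assigned, so each holds its type's default.
    static class Holder {
        int count;
        boolean flag;
        String text;
    }

    // Returns true if using the default reference throws NullPointerException.
    static boolean dereferenceThrows(Holder h) {
        try {
            h.text.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(h.count);               // 0     ("zero": fully usable)
        System.out.println(h.flag);                // false ("false": fully usable)
        System.out.println(h.text);                // null  ("null": using it throws)
        System.out.println(dereferenceThrows(h));  // true
    }
}
```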

(I do suspect it would require introducing something approaching an
actual `ref<T>` type in the JVM to which the throw NPE behaviour could
be attached. Also exposing it as a real type in the Java language
would provide syntax for sparse arrays of values for example).

> But another reason why "nullify all the things" is a questionable approach is: int and long will never be able to play.

We already have nullable int - Integer. What I'm suggesting doesn't
change that. Users would always code the nullable value type
(potentially with wasteful memory usage) but can still choose to write
the non-nullable one as well (satisfying your use cases and unifying
int). The change from the current Valhalla approach is that Integer
would be a flattenable value type with a different size in memory to
int, not something at the other end of a reference pointer. This is a
bigger migration problem for Integer/LocalDate, of course.
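
For reference, a small sketch of how Integer already acts as a nullable
int today (`NullableInt` and its methods are illustrative names only).
It shows current boxed behaviour, where Integer is an ordinary
reference type, and not the flattened layout suggested above.

```java
import java.util.HashMap;
import java.util.Map;

class NullableInt {
    // Returns the stored age, or null when the name is unknown.
    static Integer lookupAge(Map<String, Integer> ages, String name) {
        return ages.get(name);
    }

    // Converting a null Integer to int throws NullPointerException.
    static boolean unboxingNullThrows(Integer value) {
        try {
            int primitive = value; // implicit unboxing via value.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("Ann", 42);

        Integer known = lookupAge(ages, "Ann");
        Integer unknown = lookupAge(ages, "Bob");

        System.out.println(known);                       // 42
        System.out.println(unknown);                     // null
        System.out.println(unboxingNullThrows(unknown)); // true
    }
}
```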

> what's the motivation?

- I disagree with John that nullability and no good default are rare.
- I believe many, if not most, end-user classes have the potential to
be value types, not just a few special numeric classes in libraries.
- I don't believe an outcome where a memory hop is required to access
non-numeric values is acceptable.
- I don't think developers benefit from having two exposed types for
values in most cases (this is only useful for numeric values where
zero has meaning).
- Controlling which fields/variables are or are not null is an
orthogonal problem, not something to be mixed into Valhalla.
- It is not because I like nulls, but because I believe they are the
best available option for uninitialized memory for non-numeric values.

Anyway, I'll stop here - I'm well aware that this approach requires a
big leap relative to where Valhalla currently is.

More information about the valhalla-dev mailing list