Embrace nulls, was Re: Typed variants of primitives
Brian Goetz
brian.goetz at oracle.com
Thu Dec 3 17:06:47 UTC 2020
All good thoughts. Comments inline.
> Anyway, I've been meaning to throw an observation about nulls out for
> a few weeks: In recent language/library changes there has been a
> pushback to block or restrict null (streams, switch, patterns, etc),
> which eventually was rejected with a mantra of "embrace nulls"
> instead. Perhaps that same mantra of embracing nulls needs to be
> adopted by valhalla?
I see what you mean, but let me put some more nuance on the recent
flurry of "embrace your inner null!" in other contexts. This is not so
much about loving nulls so much that we want all things to be nullable,
as much as not trying to make nullable things into something they are
not. The domain of reference types includes null; things that work on
references (e.g., Stream<T>, pattern matching on reference types) should
not make unnecessary assumptions about the domain, lest we create sharp
edges that inhibit otherwise sensible program transformations.
So this is not so much "everything should be nullable", as much as "of
the things that are nullable, let them be what they are." It's a
message of tolerance :)
Secondarily, let me share an observation that may be obvious in
hindsight, but was not obvious (to me, at least) at the start of this
exercise. Which is: nullability is a property of _object references_,
not of the type itself. This was not obvious because Java has never
previously given us a way to deal with objects (like String)
directly; we deal with them only through references. But key to
understanding Valhalla is that for a given class (say String), there is
a universe of instances of String, but they are not the members of the
value set of the type String! The value set of the type String consists
of _references to_ instances of String, plus the special reference null.
This may sound like language mumbo-jumbo, but it is key to understanding
the difference between Point.ref and Point.val. It is _not_ the case
that the value set of Point.ref is the value set of Point.val, plus
null; the two value sets are in fact disjoint! The value set of
Point.ref is:
{ null } union { ref(x) : x in Point.val }
(Mathematically, this is called an _adjunction_, where there is an
isomorphism between a set X and a subset of some other set Y; the
isomorphism takes care of the "reference to" part.)
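To make the disjointness concrete, here is a tiny sketch. (This uses the
prototype's Point.val / Point.ref spelling and a hypothetical inline class
declaration for Point; the surface syntax is still in flux, so read it as
illustrative, not final.)

    // Hypothetical prototype syntax, for illustration only.
    inline class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    Point.val pv = new Point(1, 2);  // an instance of Point; never null
    Point.ref pr = pv;               // "ref(pv)": a reference to that instance
    pr = null;                       // fine: null is in the value set of Point.ref
    // pv = null;                    // error: null is not in the value set of Point.val

Note also that the default value of a Point.val variable is the all-zeroes
Point, while the default value of a Point.ref variable is null, which is
another way of seeing that the two value sets are different.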
And, this is a story that has been baked into the JVM from day one: JVMS
2.2 ("Data types") says:
> There are, correspondingly, two kinds of values that can be stored in
> variables, passed as arguments, returned by methods, and operated upon:
> primitive values and reference values.
> By embracing nulls, I mean that valhalla could actively seek to ensure
> *every* value type has an associated first-class null form. Even those
> like int128. And without trying to hide the null under the carpet as
> the current reference type approach does (with the side effect of
> ruining the potential performance benefits by not flattening in many
> cases). Instead of null being an enemy to be dealt with, why not treat
> it as a welcome and useful friend?
"Why not" is a good question, but it has some good answers. I think
many people consider the fact that an `int` always holds an actual
integer, rather than possibly being null, to be a feature, not a bug
(at least for many uses of int). I'm not sure that everyone would
appreciate nulls being injected everywhere "for consistency." And, as
you observe, not all types have spare bit patterns, which means there's
a real cost to this nullity that might not have been wanted in the
first place.
> Group 1: Many potential value types have a spare bit pattern (eg.
> LocalDate). Asking them to arrange their bits so that all-zeroes is
> not used internally seems like an acceptable trade-off for the
> performance benefits. (I'm sure you've already explored mechanisms and
> syntaxes to do this). Group 1 would have just one exposed type, which
> would be nullable, avoiding the meaningless default problem (no
> LocalDate.val).
This is all true, though it's not only a matter of identifying the bit
pattern corresponding to null; you also have to ensure that an NPE is
thrown on the same kinds of access that would throw for a null reference.
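To make that concrete, here is a rough simulation in today's Java of what a
reserved all-zeroes "null" encoding entails, including the NPE-on-access
part. (FlatDate and its fields are made up for illustration; in the real
design the check would be performed by the VM on loads, not hand-written in
the class.)

    // Made-up illustration; not Valhalla code. A flattened date where the
    // all-zeroes encoding is reserved to mean "null", which only works because
    // month and day are stored 1-based and are never legitimately zero.
    final class FlatDate {
        private final int year, month, day;

        private FlatDate(int year, int month, int day) {
            this.year = year; this.month = month; this.day = day;
        }

        // The reserved encoding: what a freshly zeroed, flattened field would hold.
        static final FlatDate NULL = new FlatDate(0, 0, 0);

        static FlatDate of(int year, int month, int day) {
            if (month < 1 || month > 12 || day < 1 || day > 31)
                throw new IllegalArgumentException();
            return new FlatDate(year, month, day);
        }

        private boolean isNullEncoding() {
            return year == 0 && month == 0 && day == 0;
        }

        int year() {
            // The requirement above: reading through the null encoding has to
            // behave like dereferencing a null reference, i.e., throw NPE.
            if (isNullEncoding()) throw new NullPointerException();
            return year;
        }
    }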
John made an impassioned plea for "No New Nulls", which is relevant; if
we're going to have nullity, it should be a real null.
https://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-November/000784.html
> Group 2: For those value types that do not have a spare bit pattern,
> one would need to be added. This is the real "embrace nulls" moment,
> and I understand the pain of potentially forcing an Int128 class to
> take up 192 bits in memory. I do think it will rarely actually happen
> in practice though (see group 3). Group 2 is really just the same as
> group 1 but with wasteful memory usage, with one exposed nullable
> type.
For these especially, we have to be clear on why we would be embracing
nulls, since the cost is felt so strongly here.
But another reason why "nullify all the things" is a questionable
approach is: int and long will never be able to play. So we will
forever be left with a rift in the type system we cannot heal. We can
of course make all value types nullable, but it starts to undermine some
of the reasons (among them, density) to have them in the first place.
> I also think this reduces the schism between valhalla types and
> reference-like types. All new and migrated value types (eg Int128, Optional,
> LocalDate) would be UpperCamelCase and nullable, just like reference
> types.
I think you're falling into the same trap we fell into earlier, which is
that the value set of a "primitive" type and the value set of a
reference type are _different kinds of things_. Essentially, by
introducing null, you've made them all reference types. That's not
necessarily terrible, but it does add a constraint to types that may not
want it. As you point out, we can then try to optimize around it, but
that means we can provide fewer guarantees. Which brings me back to:
what's the motivation?
Under the current approach, the default is that values are, well,
values, but if you want nullability, you lift them into the domains
where nulls already live -- reference types. Point.ref is a reference
type, so it is automatically nullable; Point.val consists of instances,
not references, and null is not an instance. So if you want
nullability, you ask for it by saying what you really want: references.
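In code terms (again with the caveat that this spelling is not final),
opting into nullability is just opting into references exactly where you
need them:

    // Hypothetical prototype syntax, for illustration only.
    class Polygon {
        Point.val[] vertices;    // flat and dense; no nulls can hide in here
        Point.ref cachedCenter;  // nullable on purpose: null means "not computed yet"
    }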
This means that we can actually integrate primitives in a much more
disciplined way. In fact, while the original motivation of Valhalla was
largely performance-driven -- flatness and density -- for many, the real
benefit will be the unification of the type system, where we provide a
rational basis for bridging primitives and classes, and the existing
primitives are just "built-in" inline classes.