Embrace nulls, was Re: Typed variants of primitives
Brian Goetz
brian.goetz at oracle.com
Thu Dec 3 17:06:47 UTC 2020
All good thoughts. Comments inline.
> Anyway, I've been meaning to throw an observation about nulls out for
> a few weeks: In recent language/library changes there has been a
> pushback to block or restrict null (streams, switch, patterns, etc),
> which eventually was rejected with a mantra of "embrace nulls"
> instead. Perhaps that same mantra of embracing nulls needs to be
> adopted by valhalla?
I see what you mean, but let me put some more nuance on the recent
flurry of "embrace your inner null!" in other contexts. This is not so
much about loving nulls so much that we want all things to be nullable,
as much as not trying to make nullable things into something they are
not. The domain of reference types includes null; things that work on
references (e.g., Stream<T>, pattern matching on reference types) should
not make unnecessary assumptions about the domain, lest we create sharp
edges that inhibit otherwise sensible program transformations.
So this is not so much "everything should be nullable", as much as "of
the things that are nullable, let them be what they are." It's a
message of tolerance :)
Secondarily, let me share an observation that may be obvious in
hindsight, but was not obvious (to me, at least) at the start of this
exercise. Which is: nullability is a property of _object references_,
not of the type itself. This was not obvious because Java has never
previously given us a way to deal with objects (like String)
directly; we deal with them only through references. But key to
understanding Valhalla is that for a given class (say String), there is
a universe of instances of String, but they are not the members of the
value set of the type String! The value set of the type String consists
of _references to_ instances of String, plus the special reference null.
This may sound like language mumbo-jumbo, but it is key to understanding
the difference between Point.ref and Point.val. It is _not_ the case
that the value set of Point.ref is the value set of Point.val, plus
null; the two value sets are in fact disjoint! The value set of
Point.ref is:
{ null } union { ref(x) : x in Point.val }
(Mathematically, this is called an _adjunction_, where there is an
isomorphism between a set X and a subset of some other set Y; the
isomorphism takes care of the "reference to" part.)
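To make the disjointness concrete, here is a tiny sketch. (This uses the
prototype's Point.val / Point.ref spelling and a hypothetical inline class
declaration for Point; the surface syntax is still in flux, so read it as
illustrative, not final.)

    // Hypothetical prototype syntax, for illustration only.
    inline class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    Point.val pv = new Point(1, 2);  // an instance of Point; never null
    Point.ref pr = pv;               // "ref(pv)": a reference to that instance
    pr = null;                       // fine: null is in the value set of Point.ref
    // pv = null;                    // error: null is not in the value set of Point.val

Note also that the default value of a Point.val variable is the all-zeroes
Point, while the default value of a Point.ref variable is null, which is
another way of seeing that the two value sets are different.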
And, this is a story that has been baked into the JVM from day one: JVMS
2.2 ("Data types") says:
> There are, correspondingly, two kinds of values that can be stored in
> variables, passed as arguments, returned by methods, and operated upon:
> primitive values and reference values.
> By embracing nulls, I mean that valhalla could actively seek to ensure
> *every* value type has an associated first-class null form. Even those
> like int128. And without trying to hide the null under the carpet as
> the current reference type approach does (with the side effect of
> ruining the potential performance benefits by not flattening in many
> cases). Instead of null being an enemy to be dealt with, why not treat
> it as a welcome and useful friend?
"Why not" is a good question, but it has some good answers. I think
many people consider the fact that an `int` always holds an actual
integer, rather than possibly being null, to be a feature, not a bug
(at least for many uses of int). I'm not sure that everyone would
appreciate nulls being injected everywhere "for consistency." And, as
you observe, not all types have spare bit patterns, which means there's
a real cost to this nullity that might not have been wanted in the
first place.
> Group 1: Many potential value types have a spare bit pattern (eg.
> LocalDate). Asking them to arrange their bits so that all-zeroes is
> not used internally seems like an acceptable trade-off for the
> performance benefits. (I'm sure you've already explored mechanisms and
> syntaxes to do this). Group 1 would have just one exposed type, which
> would be nullable, avoiding the meaningless default problem (no
> LocalDate.val).
This is all true, though it's not only a matter of identifying the bit
pattern corresponding to null; you also have to ensure that an NPE is
thrown on the same kinds of access that would throw for a null reference.
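To make that concrete, here is a rough simulation in today's Java of what a
reserved all-zeroes "null" encoding entails, including the NPE-on-access
part. (FlatDate and its fields are made up for illustration; in the real
design the check would be performed by the VM on loads, not hand-written in
the class.)

    // Made-up illustration; not Valhalla code. A flattened date where the
    // all-zeroes encoding is reserved to mean "null", which only works because
    // month and day are stored 1-based and are never legitimately zero.
    final class FlatDate {
        private final int year, month, day;

        private FlatDate(int year, int month, int day) {
            this.year = year; this.month = month; this.day = day;
        }

        // The reserved encoding: what a freshly zeroed, flattened field would hold.
        static final FlatDate NULL = new FlatDate(0, 0, 0);

        static FlatDate of(int year, int month, int day) {
            if (month < 1 || month > 12 || day < 1 || day > 31)
                throw new IllegalArgumentException();
            return new FlatDate(year, month, day);
        }

        private boolean isNullEncoding() {
            return year == 0 && month == 0 && day == 0;
        }

        int year() {
            // The requirement above: reading through the null encoding has to
            // behave like dereferencing a null reference, i.e., throw NPE.
            if (isNullEncoding()) throw new NullPointerException();
            return year;
        }
    }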
John made an impassioned plea for "No New Nulls", which is relevant; if
we're going to have nullity, it should be a real null.
https://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2018-November/000784.html
> Group 2: For those value types that do not have a spare bit pattern,
> one would need to be added. This is the real "embrace nulls" moment,
> and I understand the pain of potentially forcing an Int128 class to
> take up 192 bits in memory. I do think it will rarely actually happen
> in practice though (see group 3). Group 2 is really just the same as
> group 1 but with wasteful memory usage, with one exposed nullable
> type.
For these especially, we have to be clear on why we would be embracing
nulls, since the cost is felt so strongly here.
But another reason why "nullify all the things" is a questionable
approach is: int and long will never be able to play. So we will
forever be left with a rift in the type system we cannot heal. We can
of course make all value types nullable, but it starts to undermine some
of the reasons (among them, density) to have them in the first place.
> I also think this reduces the schism between valhalla types and
> reference-like types. All new and migrated value types (eg Int128, Optional,
> LocalDate) would be UpperCamelCase and nullable, just like reference
> types.
I think you're falling into the same trap we fell into earlier, which is
that the value set of a "primitive" type and the value set of a
reference type are _different kinds of things_. Essentially, by
introducing null, you've made them all reference types. That's not
necessarily terrible, but it does add a constraint to types that may not
want it. As you point out, we can then try to optimize around it, but
that means we can provide fewer guarantees. Which brings me back to:
what's the motivation?
Under the current approach, the default is that values are, well,
values, but if you want nullability, you lift them into the domains
where nulls already live -- reference types. Point.ref is a reference
type, so it is automatically nullable; Point.val consists of instances,
not references, and null is not an instance. So if you want
nullability, you ask for it by saying what you really want: references.
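In code terms (again with the caveat that this spelling is not final),
opting into nullability is just opting into references exactly where you
need them:

    // Hypothetical prototype syntax, for illustration only.
    class Polygon {
        Point.val[] vertices;    // flat and dense; no nulls can hide in here
        Point.ref cachedCenter;  // nullable on purpose: null means "not computed yet"
    }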
This means that we can actually integrate primitives in a much more
disciplined way. In fact, while the original motivation of Valhalla was
largely performance-driven -- flatness and density -- for many, the real
benefit will be the unification of the type system, where we provide a
rational basis for bridging primitives and classes, and the existing
primitives are just "built-in" inline classes.