Embrace nulls, was Re: Typed variants of primitives
Stephen Colebourne
scolebourne at joda.org
Thu Dec 3 01:02:06 UTC 2020
Oops, my throwaway comment on nulls has ended up in a thread...
Anyway, I've been meaning to throw an observation about nulls out for
a few weeks: in recent language/library changes there have been
attempts to block or restrict null (streams, switch, patterns, etc.),
which were eventually rejected in favour of a mantra of "embrace
nulls". Perhaps that same mantra of embracing nulls needs to be
adopted by Valhalla?
By embracing nulls, I mean that Valhalla could actively seek to ensure
*every* value type, even one like Int128, has an associated
first-class null form, without sweeping the null under the carpet as
the current reference-type approach does (with the side effect of
ruining the potential performance benefits by not flattening in many
cases). Instead of treating null as an enemy to be dealt with, why not
treat it as a welcome and useful friend?
Group 1: Many potential value types have a spare bit pattern (e.g.
LocalDate). Asking their authors to arrange the bits so that
all-zeroes is never used internally seems like an acceptable trade-off
for the performance benefits. (I'm sure you've already explored
mechanisms and syntaxes to do this.) Group 1 would have just one
exposed type, which would be nullable, avoiding the meaningless
default problem (no LocalDate.val).
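As a rough illustration of the group 1 idea (plain Java, hypothetical
class, not any proposed Valhalla syntax): a LocalDate-like value
already has a spare bit pattern, because a real date never has a month
or day of zero, so the all-zeroes encoding could be reserved for null:

    // Illustrative sketch only - imagine this as a flattened value type.
    // Month and day are 1-based, so the all-zeroes bit pattern
    // (year=0, month=0, day=0) can never encode a real date and could
    // be reserved by the VM to represent null.
    final class DateValue {
        private final int year;    // proleptic year; 0 is a valid year
        private final short month; // 1..12, never 0 for a real date
        private final short day;   // 1..31, never 0 for a real date

        private DateValue(int year, int month, int day) {
            this.year = year;
            this.month = (short) month;
            this.day = (short) day;
        }

        static DateValue of(int year, int month, int day) {
            if (month < 1 || month > 12 || day < 1 || day > 31) {
                throw new IllegalArgumentException("invalid date");
            }
            return new DateValue(year, month, day);
        }
    }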
Group 2: For those value types that do not have a spare bit pattern,
one would need to be added. This is the real "embrace nulls" moment,
and I understand the pain of potentially forcing an Int128 class to
take up 192 bits in memory. I do think it will rarely actually happen
in practice though (see group 3). Group 2 is really just the same as
group 1 but with wasteful memory usage, with one exposed nullable
type.
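To sketch the group 2 situation (again plain Java, hypothetical
names): every (hi, lo) bit pattern of a 128-bit integer is meaningful,
so an extra "useless" field has to be added purely to guarantee that a
constructed value is never all-zeroes, which is what pushes the
footprint from 128 bits towards 192:

    // Illustrative sketch only (hypothetical class).
    final class Int128Value {
        private final long hi;
        private final long lo;
        private final byte nonNull; // always 1 in a real value; exists only
                                    // so all-zeroes can be reserved for null

        private Int128Value(long hi, long lo) {
            this.hi = hi;
            this.lo = lo;
            this.nonNull = 1;
        }

        static Int128Value of(long hi, long lo) {
            return new Int128Value(hi, lo);
        }
    }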
Group 3: Most of the cases where there is no spare bit pattern are
also cases where there is an acceptable zero value. This is where
having two exposed types is actually useful: a nullable one and a
non-nullable one, typically with the nullable one needing more bits in
memory. The good name (Int128) would always go to the nullable form,
and the other name (Int128.val or int128) would go to the
primitive-like zero-by-default form.
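To make the group 3 naming concrete, here is a purely speculative
strawman in the spirit of the provisional syntax quoted below; none of
this is valid Java today, and the .val spelling is just one of the
candidates mentioned in this thread:

    // Speculative strawman only - not valid syntax today.
    class Holder {
        Int128     nullableAmount;  // nullable form, gets the good name;
                                    // defaults to null, may need extra bits
        Int128.val flatAmount;      // non-nullable form, defaults to zero,
                                    // fully flattened like a primitive
    }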
I also think this reduces the schism between Valhalla value types and
reference types. All new and migrated value types (e.g. Int128,
Optional, LocalDate) would be UpperCamelCase and nullable, just like
reference types. The remaining points of difference around identity
(==, synchronization, etc.) are generally less observable by
developers. Forcing use-site opt-in to the real Valhalla primitives
(Int128.val or int128 or vInt128) would make it more obvious and
visible to code readers that these types are more primitive-like.
I think this could be allocated number 5 on your list. It embraces
nulls because it flips Valhalla on its head by focussing on making
everything nullable, with primitive-like values as an optimisation. It
doesn't treat the primitive-like value types as the real deal and
nullability as an afterthought; it does the exact opposite, where
nullable value types are the norm and primitive-like ones are
special/weird.
BTW I don't think developers will forget to check for null any more
than they do today - these new nullable value types won't look or
behave that differently to regular reference types wrt null. (And if
not checking for null is a problem, it should be fixed with holistic
nullable type tracking across references and values anyway.)
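For example, use-site code against a nullable value type would read
just like today's reference-type code (hypothetical Int128 class,
method and constant):

    Int128 total = lookupTotal();   // hypothetical method, may return null
    if (total == null) {
        total = Int128.ZERO;        // hypothetical constant
    }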
Summary of an "embrace nulls" approach:
- All value types have a standard form that is always nullable (embrace nulls)
- The standard form has an UpperCamelCase name
- Value type authors must ensure the all-zeroes bit pattern is not
used, and must add an extra "useless" field if necessary
- Value type authors can optionally choose to declare a second
non-nullable type where all-zeroes is given meaning (e.g. zero)
- If they do, the second form always gets a non-standard name (e.g.
Int128.val, int128 or vInt128)
- Both forms of value type would be fully flattenable
thanks
Stephen
On Wed, 2 Dec 2020 at 14:33, Brian Goetz <brian.goetz at oracle.com> wrote:
> > I don't think so, it all looks possible though no doubt very long
> > term. The only point I'd note is that Year.default is not a valid
> > year, thus more akin to null, but nulls are a whole other topic.
>
> The topic of "inline classes with no good default" is indeed a thorny
> one, and we don't yet have a good set of recommendations here. Possible
> moves include:
>
> 1. Just don't make it an inline class.
> 2. Pick an arbitrary default (Jan 1, 1972.)
> 3. Invent a sentinel, try to make using it fail-fast (like a new kind
> of null), and make users check it (which they'll forget to do.)
> 4. Use a "reference-default" inline class (one for which the unadorned
> name corresponds to the reference projection), meaning that the default
> value will truly be `null`.
>
> None of these are great, but (4) seems to be the least-bad of the
> options identified so far. In this world, you declare:
>
> __mumble_ref_mumble__ inline class Year { ... }
>
> which, like any other inline class declaration, gives rise to three
> types: Year.ref (a true reference type, whose value set consists of
> references to instances of Year, and null), Year.val (whose value set
> consists of instances of Year), and Year. The only difference is that
> Year becomes an alias for Year.ref rather than Year.val. So:
>
> Year y = date.getYear() // might be null
>
> What do you give up? Well, refs are pointers, so Year is not flattened
> in layout or calling convention. But Year.val is, so implementations
> can use Year.val for representation, and will get flattening in
> layouts. It's not ideal, but it's a glass half-full, and the null you
> get is a real null, rather than an ad-hoc one. Work in progress.