Valhalla -- finding the primitives
Brian Goetz
brian.goetz at oracle.com
Tue Feb 18 19:01:50 UTC 2020
I think it's worth reflecting on how far we've come in Valhalla, both for
the specific designs in the VM and language, and the clarity of the
basic concepts.
In the early model (Q World), the idea was that we would declare a class
as either a value class or a "regular" class, and we would derive
various properties based on that:
- regular classes have identity, value classes do not
- regular classes are nullable, value classes are not
- regular classes are reference types, value classes are not
This model was derived from the current relationship between `int` and
`Integer`. But, to interoperate with dynamically typed code (such as
reflection) and erased generics, we needed reference types, so each
value class got a "box" type (denotable as LV at the VM level) which was
a reference type. Any interfaces declared on the value class were
superinterfaces of the box. There was one runtime class, and
`getClass()` returned that, regardless of whether invoked on a value or
a boxed reference.
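To make the `int` / `Integer` starting point concrete, here is a small
sketch in today's Java (not Valhalla syntax). The class and method names
are made up for illustration; only the `int` / `Integer` behavior comes
from the language as it exists now.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative class name; not part of any Valhalla design.
final class IntVersusInteger {

    // An Integer variable holds a reference, so it may be null.
    static boolean boxCanBeNull() {
        Integer boxed = null;   // legal: Integer is a reference type
        // int raw = null;      // would not compile: int is not a reference type
        return boxed == null;
    }

    // Erased generics only accept reference types, so an int has to be
    // boxed on the way into a List and unboxed on the way out.
    static int roundTripThroughGenericList(int value) {
        List<Integer> list = new ArrayList<>();
        list.add(value);        // boxing: int -> Integer
        return list.get(0);     // unboxing: Integer -> int
    }

    // Dynamically typed code sees the box: getClass() reports the box class.
    static Class<?> runtimeClassOfBox(int value) {
        Object boxed = value;
        return boxed.getClass();
    }

    public static void main(String[] args) {
        System.out.println(boxCanBeNull());
        System.out.println(roundTripThroughGenericList(42));
        System.out.println(runtimeClassOfBox(42));
    }
}
```

The Q World box played roughly the role that `Integer` plays here: it
was the reference type needed to reach reflection and erased generics.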
This worked, but there were many aspects which were either confusing or
unsatisfying. Value classes were neither reference types nor
primitives, so we had gone from a type system split cleanly in two
(which many people dislike) to one split uncleanly in three (for
example, values had the Object methods, but didn't derive from Object,
complicating code that was supposed to be generic across values and
references alike). Some chafed at the notion that value types could
never be nullable; others didn't like that the all-zero value was always
a member of the value set, whether or not it had semantic meaning. Some
reference types had significant identity, and others (boxes) didn't.
And we didn't have a clean story for migration.
In the second iteration (L World), we addressed (at the VM level) the
need to box in order to access reference-related functionality, by
making `QV` a subtype of `LObject`, rationalizing the subtyping
relationships between arrays, and replacing the box with a
null-adjunction type (LV). This reduced the pressure on migration
substantially, but we still hadn't addressed most of the user model
issues, including the tripartite nature of the type system, and we
created quite a few problems as a result (such as the relationship
between the two class mirrors for QV/LV).
For example, to address initialization safety (where the zero value is
outside the domain), we explored the notion of zero-default vs
null-default inline classes, which involved treating the all-zero value
as a null for some value classes but as a zero for others. But we kept
finding that we were having too many "flavors" of everything, because,
in hindsight, the various aspects were not yet cleanly factored down to
their primitives. In the end, it turned out we were conflating a number
of distinctions, and kept trying to use one as a proxy for another:
- nullable vs non-nullable
- pass-by-reference vs pass-by-value / flattened
- reference type vs value type
- identity-ful vs identity-free
For example, we wanted to call classes like String "reference types" and
classes like Point "value types", but when we got to types like Object
and interfaces, they had one foot in each camp. It turns out that, in
the "find the primitive" game, "reference type" wasn't the primitive.
Classes. The user declares _classes_ ("public class Foo { }"); we
derive _types_ from class declarations (Foo, Foo[], etc.). The primitive
that Valhalla introduces into class declaration is whether the instances
of the class _have identity or not_. Traditional classes are now
revealed to be "identity classes"; the new kind (identity-free) are
called "inline classes". (This might not be the final word on the
subject.)
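As a rough illustration of what "having identity" means for a
traditional class, the following sketch uses plain, current Java. The
`IdentityDemo` and `Point` names are hypothetical. Two instances with
the same state are equal under a state-based `equals`, yet `==` can
still tell them apart, because each `new` creates a distinct identity.

```java
// Illustrative names; this is ordinary Java, not inline-class syntax.
final class IdentityDemo {

    // An ordinary ("identity") class with state-based equality.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // Same state: equals() compares the fields.
    static boolean sameState() {
        return new Point(1, 2).equals(new Point(1, 2));
    }

    // Different identity: == compares the references, and each `new`
    // produces a distinct object.
    static boolean sameIdentity() {
        return new Point(1, 2) == new Point(1, 2);
    }

    public static void main(String[] args) {
        System.out.println(sameState());
        System.out.println(sameIdentity());
    }
}
```

An identity-free class, as described above, would give up exactly the
distinction that `sameIdentity()` observes.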
Types and values. In the type system we have now, some types contain
primitive values, and other types contain _references to objects_. What
messed us up for a while is that the top types -- Object and interfaces
-- can contain both. A big AHA of the recent iterations is that it
makes sense to talk about both _values of_ inline classes and
_references to_ those values. Reference type has (almost) nothing to do
with inline vs identity -- it has to do with whether the value set of
the type contains values, or references.
For an identity class C, we derive one type: C, which consists of
references to instances of C. For an inline class V, we derive two
types: `V.ref`, which is a reference type (and therefore nullable), and
contains references to the instances of V, and `V.val`, which is not a
reference type, and whose values are "raw" instances of V.
With this understanding, the nullity problem becomes a simpler one:
nullity is a property of _reference types_. So `V.ref` is nullable, and
`V.val` is not; we don't need a way to say "nullable value" or different
ways to interpret the default value. We derive flattening and calling
conventions in the same way; for reference types, we always store / pass
as-if-by reference, but for "val" types, we store / pass as-if-by value.
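Since `V.ref` and `V.val` are not expressible in today's Java, the
closest runnable analogy is again `Integer` (reference-like) versus
`int` (val-like). The sketch below is only that analogy, with made-up
class and method names. It shows that the default of a reference type
is null, the default of a val-like type is the all-zero value, and
converting a null reference to a value fails.

```java
// Illustrative analogy only: Integer stands in for V.ref, int for V.val.
final class RefValDefaults {

    // Elements of a val-like array start out as the all-zero value.
    static int defaultValLike() {
        int[] vals = new int[1];
        return vals[0];
    }

    // Elements of a reference array start out as null.
    static Integer defaultRefLike() {
        Integer[] refs = new Integer[1];
        return refs[0];
    }

    // Going from the nullable reference type to the value type cannot
    // succeed for null; today's unboxing throws NullPointerException.
    static boolean nullRefToValFails() {
        Integer ref = null;
        try {
            int val = ref;      // unboxing a null reference
            return val != val;  // not reached; keeps `val` used
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultValLike());
        System.out.println(defaultRefLike());
        System.out.println(nullRefToValFails());
    }
}
```

How the eventual `V.ref` to `V.val` conversion behaves is a design
matter for Valhalla; the exception here is simply what current Java
does for `Integer` and `int`.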
It is this refined understanding that has brought me back to the ref/val
notation _for the types_. "Inline" is a way of saying "identity-free"
when declaring classes, but it doesn't say anything (yet) about the
semantics of how we represent variables on the heap or pass them on the
stack. For this, we need an additional property of the type, and ref vs
val seems to ideally describe what we mean -- that the value set of the
type consists of either references or values, and the
representation/calling conventions behave as if we are storing/passing
references or values. (Having come to this clarity about the types, we
are free to pick a word other than "inline" if we think there is a
better way to say "identity-free", though I don't think going back to
"value" is necessarily right.)
With this distinction in place, some previously nasty problems (such as
nullity) become trivial (if you want "nullable values", use references),
and some previously impossible problems (such as unifying primitives
with values) become tractable.