Values and erased generics
Brian Goetz
brian.goetz at oracle.com
Fri Oct 5 15:15:45 UTC 2018
Here’s a summary of the story we came up with for erased generics over
values. It builds on the typing story outlined in John’s “Q-Types in
L-World” writeup.
Background
In MVT, there were separate carrier types (Q and L) for values and
references. The Q carriers were not nullable, and an explicit conversion
was required between L and Q types. This offered perfect nullity
information to the JVM, but little ability to abstract over both values
and references. This showed up in lots of places:
* Values themselves could not implement interfaces; only their
companion box type could.
* Values could not be operated on with |a*| instructions.
* Values could not be stored in |Object| variables.
Each of these conflicted with the desire for genericity (whether
specialized or erased). In Q-world, we couldn’t have erased generics
over values, because erased code could not operate on values for
multiple reasons (wrong carriers, wrong bytecodes). And looking ahead to
specialized generics, having separate bytecodes for references and
values increased the complexity of the specialization transform.
L-World started out as an experiment to validate the following hypothesis:
We can, without significant performance compromises, reuse the |L|
carrier and |a*| bytecodes for value types, and allow them to be
proper subtypes of their interfaces (and |Object|).
In LW1, we took a hybrid approach; nullability and flattenability became
properties of the variable (field, array element, stack slot, local
variable), rather than a property of the type itself. This means we
could be tolerant of nulls in local variables and stack slots, only
rejecting nulls when they hit the heap, and we then relied on the
translation strategy to insert null checks to prevent introduction of
nulls. This allowed value-oblivious code to act as a “conduit” for
values between value-aware code and a value-aware heap layout.
One of the conclusions of the LW1 experiment was that the JIT very much
wants better nullity information than this; without enhancing our
ability to prove that a given use of an L-type is null-free, we cannot
fully optimize calling conventions.
Erased generics in LW1
One of the motivations for L-World was that using L-carriers for values
would likely better interoperate with generics by providing a common
carrier and by allowing the |a*| bytecodes to operate uniformly on both
values and references. However, the assumption that a type variable |T|
is nullable interacts poorly here; given that values are proper L-types
in LW1, it seems tempting to allow users to parameterize erased generics
with values:
|List<Point> points = new ArrayList<Point>(); |
with the compiler translating |Point| as |LPoint|. Existing erased
generics would “just work” … mostly. However, there are several sharp
edges:
* There are some API points that deliberately use nulls as sentinels,
such as returning |null| from |Map::get| to signal that the key is
not in the map. This would then NPE in the compiler-inserted null
check when we tried to assign the result of |get(k)| to a
value-typed |V|.
* Some generic classes may accidentally try to convert the default
(null) value of an uninitialized |Object| field or |Object[]| array
element to a |T|, which would again NPE when it crossed the boundary
from erased generic code to value-aware code.
* If value arrays are subtypes of |Object[]|, and a |V[]| is passed to
code that expects an |Object[]|, an attempt to store a null in that
array would NPE.
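The first two hazards have a close analogue in today's Java, where unboxing plays the role of the compiler-inserted null check. The sketch below is only that analogy: it uses |Integer| and |int| in place of a nullable L-type and a value type, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Analogy only: Integer/int stand in for a nullable L-type and a
// non-nullable value type. All names here are hypothetical.
class SentinelNullDemo {

    // Map::get returns null for a missing key; assigning the result to an
    // int forces an unboxing, which fails much as an inserted null check would.
    static boolean missingKeyThrows() {
        Map<String, Integer> counts = new HashMap<>();
        try {
            int n = counts.get("absent"); // unboxing null throws NullPointerException
            return n != n;                // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }

    // An erased container backed by Object[] hands out the default (null)
    // content of a slot that was never written.
    @SuppressWarnings("unchecked")
    static <T> T firstSlot(Object[] backing) {
        return (T) backing[0];
    }

    static boolean uninitializedSlotThrows() {
        Object[] backing = new Object[4];
        try {
            int n = SentinelNullDemo.<Integer>firstSlot(backing); // null crosses the boundary here
            return n != n;                                        // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(missingKeyThrows());
        System.out.println(uninitializedSlotThrows());
    }
}
```

In both cases the erased code is fine on its own; the failure appears only where the null crosses into code that cannot represent it.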
The return of the Q
MVT had explicit L-types and Q-types; LW1 has only L-types, relying on
the |ValueTypes| attribute to determine whether a given L-type describes
a value or not.
In LW2, we will back off slightly from this unification, so as to
provide the VM with end-to-end information about the flow of values and
their potential nullity; for a given value class |V|, one can denote
both the non-nullable type |QV;| and the nullable type |LV;|, where |QV
<: LV|. The value set of |LV| is that of |QV|, plus |null|; both share
the |L| carrier. No conversion is needed from |QV| to |LV|; |checkcast|
and |instanceof| perform a null check when converting from |LV| to |QV|.
|Q*| fields and array elements will be flattenable; |L*| will not. (As a
side benefit, the |ValueTypes| attribute is no longer needed, as
descriptors fully capture their nullability constraints.)
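As a toy illustration of how much the descriptor alone would tell the VM under this scheme, the sketch below classifies class descriptors such as |QPoint;| and |LPoint;| by their leading letter. It is a model of the rules stated above, not VM code; the class and method names are hypothetical.

```java
// Toy model of the LW2 descriptor rules described in the text.
// Only plain class descriptors ("LFoo;" or "QFoo;") are handled.
class DescriptorModel {

    // L-descriptors admit null; Q-descriptors do not.
    static boolean isNullable(String descriptor) {
        return kind(descriptor) == 'L';
    }

    // Q* fields and array elements are flattenable; L* are not.
    static boolean isFlattenable(String descriptor) {
        return kind(descriptor) == 'Q';
    }

    // QV <: LV, so QV to LV needs no conversion; LV to QV needs a null check.
    static boolean needsNullCheck(String from, String to) {
        return kind(from) == 'L' && kind(to) == 'Q';
    }

    // Returns the leading 'L' or 'Q' after a minimal shape check.
    private static char kind(String descriptor) {
        if (descriptor.length() < 3 || !descriptor.endsWith(";")) {
            throw new IllegalArgumentException("not a class descriptor: " + descriptor);
        }
        char c = descriptor.charAt(0);
        if (c != 'L' && c != 'Q') {
            throw new IllegalArgumentException("not a class descriptor: " + descriptor);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(isFlattenable("QPoint;"));
        System.out.println(needsNullCheck("LPoint;", "QPoint;"));
    }
}
```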
This gives language compilers some options; we can translate uses of a
value type to either the |L| or |Q| variants. Of course, we don’t want
to blindly translate value types as L-types, as this would sacrifice the
main goal (flattenability), but we could use them where values meet
erased generics.
Meet the new box (not the same as the old box)
Essentially, the L-value types can be thought of as the “new boxes”,
serving the interop role that primitive boxes do. (Fortunately, they are
cheaper than primitive boxes; the boxing conversion is a no-op, and the
unboxing conversion is a null check.)
Just as the JVM wants to be able to separately denote “non-nullable
value” and “nullable value”, so does the language. In general, we want
for values to be non-nullable, but there are exceptions:
* When dealing with erased generic code, since an erased type variable
|T| is nullable;
* When dealing with legacy code involving a value-based class that is
migrated to a value type, since existing code may treat it as nullable.
So, let’s say that a value class |V| gives rise to two types: |V.Val|
and |V.Box|. The former translates to |QV|; the latter to |LV|. The
former is non-nullable; the latter is nullable. And there exists a
boxing/unboxing conversion between them, just like with |int| and
|Integer| — but in this case, the cost of the “boxing” conversion is
much lower.
Erased generics over boxes
Now, erased generics fall out for free: we just require clients to
generify over the box type. This is no different from how we deal with
primitives today — generify over the box. “Works like an int.”
|ArrayList<Integer> ints = new ArrayList<>();
ArrayList<Point.Box> points = new ArrayList<>(); |
Since |V.Box| is nullable, we have no problem with returning null from
|Map::get|.
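A rough model of this in today's Java, where an ordinary reference to a final class plays the role of |Point.Box| and the conversion to |Point.Val| is modeled as nothing more than a null check. The class |PointRef| and the helper |toVal| are stand-ins invented for this sketch; real value types would not need them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Stand-in for a value class. A reference to PointRef models Point.Box
// (nullable); today's Java has no Point.Val, so toVal only models the check.
final class PointRef {
    final int x;
    final int y;
    PointRef(int x, int y) { this.x = x; this.y = y; }
}

class BoxedGenericsDemo {

    // Models the Box-to-Val conversion: a null check and nothing else.
    static PointRef toVal(PointRef box) {
        return Objects.requireNonNull(box, "null cannot be converted to a value");
    }

    // Erased generic code is instantiated over the nullable box type.
    static Map<String, PointRef> sample() {
        Map<String, PointRef> m = new HashMap<>();
        m.put("origin", new PointRef(0, 0));
        return m;
    }

    public static void main(String[] args) {
        Map<String, PointRef> m = sample();
        PointRef absent = m.get("nowhere");       // null sentinel is legal for a box
        System.out.println(absent == null);
        PointRef origin = toVal(m.get("origin")); // check happens only at the boundary
        System.out.println(origin.x + "," + origin.y);
    }
}
```

The null check moves from the generic code to the point where the client asks for a non-nullable value, which is where the client can decide what a missing key means.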
Migration considerations
Nullability also plays into migration concerns. A baseline goal is that
migrating a value-based class to a value type, or migrating an erased
generic class to specialized, should be source and binary compatible.
That means, we don’t want to perturb the meaning of |Foo<V>| in clients
or subtypes when either |V| or |Foo| migrates.
For existing value-based classes, such as |LocalDate|, there are plenty
of existing locutions such as:
|LocalDate d = null;
if (d == null) { ... }
ArrayList<LocalDate> dates = ...
dates.add(null); |
If we want migration of |LocalDate| to a value type to be source
compatible, then this constrains us to translate |LocalDate| to
|LLocalDate| forever. This suggests that |LocalDate| should be an alias
for |LocalDate.Box|; otherwise the meaning of existing code would change.
On the other hand, for a newly written value type which was never a VBC,
we want the opposite. If |Point| is not an alias for |Point.Val|, users
will have to say |Point.Val| everywhere they want flattening, which is
cumbersome and easy to forget. And since flattening is the whole point
of value types to begin with, this seems like it would be letting the
migration tail wag the dog.
Taken together, this means we want some sort of contextual decision as
to whether to interpret |Foo| as |Foo.Box| or |Foo.Val|. This could be
based on the provenance of |Foo| (was it migrated from a VBC or not), or
could be some sort of aliased import (|import Foo as Foo.Val|).
The declaration-site approach seems preferable to me; it gives the
author of the class the choice of which face of their class to present
to the world. Classes for which migration compatibility is of primary
concern (e.g., |Optional|) get compatibility at the cost of biasing
towards boxing; those for which flattening is of primary concern (e.g.,
|Complex|) get flattening at the cost of compatibility. In this
approach, we put the pain on clients of migrated classes — they have to
take an extra step to get flattening. In the long run, there will likely
be more born-as-value classes than migrated-to-value classes, so this
seems the right place to put the pain.
Note that the |Box| syntactic convention scales nicely to type
variables; we can write specialized generic code like:
|<T> T.Box box(T t) { } |
Whatever syntactic convention we use (e.g., |T?|) would want to have
similar behavior. (Another consideration in the choice of denotation is
the number of potential type operators we may need. Our work in
specialized generics suggests there may be at least a few more coming.)
Primitives as values — a sketch
We would like a path to treating primitives and values uniformly,
especially as we get to specialized generics; we don’t want to have to deal
with the 1-slot vs 2-slot distinction when we specialize, nor do we want
to deal with using |iload| vs |aload|.
We can extend our lightweight boxing approach to allow us to heal the
primitive/value divide. For each of our primitive types, we hand-code
(or generate) a value class:
|value class IntWrapper { int x; } |
We introduce a bidirectional conversion between |int| and |IntWrapper|;
when the user goes to generify over |int|, we instead generify over
|IntWrapper|, and add appropriate conversions at the boundary (like the
casts we currently insert in erased generics.) We can then translate
|int.Box| as |LIntWrapper;|, and we can support erased generics over the
lighter value-boxes rather than the heavy legacy boxes.
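The shape of that translation can be approximated in today's Java with an ordinary final class standing in for the value class. In the sketch below, |IntPox|, |wrap|, |unwrap|, and |sum| are hypothetical names; in the real design the compiler would insert the conversions and the wrapper would be a value class.

```java
import java.util.ArrayList;
import java.util.List;

// Ordinary final class standing in for "value class IntWrapper { int x; }".
final class IntPox {
    final int x;
    IntPox(int x) { this.x = x; }
}

class PoxDemo {

    // The two directions of the int <-> wrapper conversion.
    static IntPox wrap(int i) { return new IntPox(i); }
    static int unwrap(IntPox w) { return w.x; }

    // A client that conceptually wants a list of int is generified over the
    // wrapper instead, with conversions inserted at the boundary.
    static int sum(int... values) {
        List<IntPox> list = new ArrayList<>();
        for (int v : values) {
            list.add(wrap(v));      // conversion on the way in
        }
        int total = 0;
        for (IntPox w : list) {
            total += unwrap(w);     // conversion on the way out
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3));
    }
}
```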
Unfortunately, now we have three types that perform the role of boxes:
the new value wrappers like |IntWrapper|, their lightweight box type
|IntWrapper.Box|, and the legacy heavy box |java.lang.Integer|. To keep
our boxes straight, we could call them:
* Box — the legacy heavy box (|java.lang.Integer| and friends)
* Lox — the new lightweight value boxes (L-types of value classes)
* Pox — the primitive wrapper value classes
So |X.Box| denotes the lox for values (and probably X itself for
reference classes), and the lox-of-a-pox for primitives (making it
total.) When we get to specialized generics and instantiate a generic
class with a primitive type, we silently wrap the primitive in its pox
on the way in. The pox is a value, so we’ve reduced
generics-over-primitives to generics-over-values. This is a huge
complexity reducer for the specializer, as it need not deal with the
fact that long and double take two slots, or with changing a* bytecodes
to the corresponding primitive bytecodes.
It is an open question how aggressively we can deprecate or denigrate
the legacy boxes (probably not much, but hope springs eternal.)
Open issues
There were a few issues we left for further study; more on these to follow.
*Array covariance*. There was some degree of discomfort in pushing array
covariance into |aaload| now. When we have specialized generics, we’ll
be able to handle this through interfaces; it seems a shame to
permanently weigh down intrinsic array access with potential
megamorphism. We’re going to try to avoid plunking for array covariance
now, and see how painful that is.
*Equality.* There was some discomfort with the user-model consequences
of disallowing |==| on values. It is likely that we’d translate |val==|
as a substitutability test; if we’re going to do that, it’s not obvious
whether we shouldn’t just lump this on |acmp|.
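This note does not define the substitutability test. One plausible reading, assumed here purely for illustration, is a field-by-field comparison in which primitives are compared by their bits and references by identity. The class |Labeled| and the method |substitutable| below are hypothetical.

```java
// Stand-in for a value class with one primitive and one reference field.
final class Labeled {
    final double weight;
    final Object tag;
    Labeled(double weight, Object tag) { this.weight = weight; this.tag = tag; }
}

class SubstitutabilityDemo {

    // Assumed definition (not taken from the text): doubles are compared by
    // raw bits, references by identity.
    static boolean substitutable(Labeled a, Labeled b) {
        if (a == b) return true;                  // same reference, or both null
        if (a == null || b == null) return false; // exactly one is null
        return Double.doubleToRawLongBits(a.weight) == Double.doubleToRawLongBits(b.weight)
            && a.tag == b.tag;
    }

    public static void main(String[] args) {
        Object tag = new Object();
        // Separately allocated instances with identical fields compare as the same value.
        System.out.println(substitutable(new Labeled(1.5, tag), new Labeled(1.5, tag)));
        // 0.0 and -0.0 have different bit patterns, so they are distinguishable.
        System.out.println(substitutable(new Labeled(0.0, tag), new Labeled(-0.0, tag)));
    }
}
```

Under this reading the comparison never consults object identity of the value itself, only the state it carries.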
*Locking.* The same generics-reuse arguments as we made for nullability
support also could be applied to locking on loxes. No one really likes
the idea of supporting locking here, but just as surprise NPEs were a
sharp edge, surprise IMSEs might be as well.
*Construction.* We have not yet outlined either the language-level
construction constraints or the translation of constructors to bytecode.