Values and erased generics

Fri Oct 5 15:15:45 UTC 2018

Here’s a summary of the story we came up with for erased generics over 
values. It builds on the typing story outlined in John’s “Q-Types in 
L-World” writeup.

    Background

In MVT, there were separate carrier types (Q and L) for values and 
references. The Q carriers were not nullable, and an explicit conversion 
was required between L and Q types. This offered perfect nullity 
information to the JVM, but little ability to abstract over both values 
and references. This showed up in lots of places:

  * Values themselves could not implement interfaces, only their
    companion box type could.
  * Could not operate on values with |a*| instructions.
  * Could not store values in |Object| variables.

Each of these conflicted with the desire for genericity (whether 
specialized or erased). In Q-world, we couldn’t have erased generics 
over values, because erased code could not operate on values for 
multiple reasons (wrong carriers, wrong bytecodes). And looking ahead to 
specialized generics, having separate bytecodes for references and 
values increased the complexity of the specialization transform.

L-World started out as an experiment to validate the following hypothesis:

    We can, without significant performance compromises, reuse the |L|
    carrier and |a*| bytecodes for value types, and allow them to be
    proper subtypes of their interfaces (and |Object|).

In LW1, we took a hybrid approach; nullability and flattenability became 
properties of the variable (field, array element, stack slot, local 
variable), rather than properties of the type itself. This meant we 
could be tolerant of nulls in local variables and stack slots, only 
rejecting nulls when they hit the heap, and we then relied on the 
translation strategy to insert null checks to prevent introduction of 
nulls. This allowed value-oblivious code to act as a “conduit” for 
values between value-aware code and a value-aware heap layout.

One of the conclusions of the LW1 experiment was that the JIT very much 
wants better nullity information than this; without enhancing our 
ability to prove that a given use of an L-type is null-free, we cannot 
fully optimize calling conventions.

        Erased generics in LW1

One of the motivations for L-World was that using L-carriers for values 
would likely better interoperate with generics by providing a common 
carrier and by allowing the |a*| bytecodes to operate uniformly on both 
values and references. However, the assumption that a type variable |T| 
is nullable interacts poorly here; given that values are proper L-types 
in LW1, it seems tempting to allow users to parameterize erased generics 
with values:

|List<Point> points = new ArrayList<Point>(); |

with the compiler translating |Point| as |LPoint|. Existing erased 
generics would “just work” … mostly. However, there are several sharp 
edges:

  * There are some API points that deliberately use nulls as sentinels,
    such as returning |null| from |Map::get| to signal that the key is
    not in the map. This would then NPE in the compiler-inserted null
    check when we tried to assign the result of |get(k)| to a
    value-typed |V|.
  * Some generic classes may accidentally try to convert the default
    (null) value of an uninitialized |Object| field or |Object[]| array
    element to a |T|, which would again NPE when it crossed the boundary
    from erased generic code to value-aware code.
  * If value arrays are subtypes of |Object[]|, and a |V[]| is passed to
    code that expects an |Object[]|, an attempt to store a null in that
    array would NPE.
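
The first two sharp edges can be modeled in today's Java. This is only a sketch under stated assumptions: |Point| is an ordinary final class standing in for a value class, and the hypothetical helper |toValue| stands in for the null check the compiler would insert at the boundary between erased and value-aware code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class SharpEdgesSketch {
    /** Hypothetical stand-in for a value class; an ordinary final class here. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** Models the compiler-inserted null check where erased code meets value-aware code. */
    static Point toValue(Object fromErasedCode) {
        return (Point) Objects.requireNonNull(fromErasedCode, "null is not a value");
    }

    /** Map::get signals "no such key" with null, which the null check then rejects. */
    static boolean missingKeyThrows() {
        Map<String, Point> points = new HashMap<>();
        points.put("origin", new Point(0, 0));
        try {
            toValue(points.get("missing"));
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** An untouched Object[] slot still holds its default null. */
    static boolean defaultElementThrows() {
        Object[] backing = new Object[4];
        try {
            toValue(backing[0]);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

In both cases the erased code is behaving as it always has; the failure only appears at the point where its result is handed to code that assumes a non-null value.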

        The return of the Q

MVT had explicit L-types and Q-types; LW1 has only L-types, relying on 
the |ValueTypes| attribute to determine whether a given L-type describes 
a value or not.

In LW2, we will back off slightly from this unification, so as to 
provide the VM with end-to-end information about the flow of values and 
their potential nullity; for a given value class |V|, one can denote 
both the non-nullable type |QV;| and the nullable type |LV;|, where |QV 
<: LV|. The value set of |LV| is that of |QV|, plus |null|; both share 
the |L| carrier. No conversion is needed from |QV| to |LV|; |checkcast| 
and |instanceof| perform a null check when converting from |LV| to |QV|. 
|Q*| fields and array elements will be flattenable; |L*| will not. (As a 
side benefit, the |ValueTypes| attribute is no longer needed, as 
descriptors fully capture their nullability constraints.)
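
As a rough model of the rules in this paragraph, the sketch below spells out the two descriptors for a value class and the two conversions between them. All helper names are hypothetical and nothing here is a real JVM API; the null check is written out with |Objects.requireNonNull| to stand in for what |checkcast| would do when going from |LV| to |QV|.

```java
import java.util.Objects;

final class CarrierSketch {
    /** Descriptor of the non-nullable view of a value class, e.g. "QPoint;". */
    static String qDescriptor(String className) { return "Q" + className + ";"; }

    /** Descriptor of the nullable view of the same class, e.g. "LPoint;". */
    static String lDescriptor(String className) { return "L" + className + ";"; }

    /** Q* field and array element descriptors are flattenable; L* are not. */
    static boolean flattenable(String descriptor) { return descriptor.startsWith("Q"); }

    /** QV to LV: QV is a subtype of LV and shares its carrier, so nothing happens. */
    static <V> V toNullable(V v) { return v; }

    /** LV to QV: models the null check performed on the narrowing conversion. */
    static <V> V toNonNullable(V v) {
        return Objects.requireNonNull(v, "null is not in the value set of QV");
    }
}
```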

This gives language compilers some options; we can translate uses of a 
value type to either the |L| or |Q| variants. Of course, we don’t want 
to blindly translate value types as L-types, as this would sacrifice the 
main goal (flattenability), but we could use them where values meet 
erased generics.

    Meet the new box (not the same as the old box)

Essentially, the L-value types can be thought of as the “new boxes”, 
serving the interop role that primitive boxes do. (Fortunately, they are 
cheaper than primitive boxes; the boxing conversion is a no-op, and the 
unboxing conversion is a null check.)

Just as the JVM wants to be able to separately denote “non-nullable 
value” and “nullable value”, so does the language. In general, we want 
for values to be non-nullable, but there are exceptions:

  * When dealing with erased generic code, since an erased type variable
    |T| is nullable;
  * When dealing with legacy code involving a value-based class that is
    migrated to a value type, since existing code may treat it as nullable.

        Erased generics over boxes

Now, erased generics fall out for free: we just require clients to 
generify over the box type. This is no different from how we deal with 
primitives today — generify over the box. “Works like an int.”

|ArrayList<Integer> ints = new ArrayList<>();
ArrayList<Point.Box> points = new ArrayList<>(); |

Since |V.Box| is nullable, we have no problem with returning null from 
|Map::get|.
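
The "works like an int" analogy can be run today with |Integer|, which plays the role that |V.Box| would play for a value class |V|. The class and method names below are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxLookupSketch {
    /** Returns the count for key, or fallback when the map has no mapping. */
    static int countOrDefault(Map<String, Integer> counts, String key, int fallback) {
        Integer boxed = counts.get(key);            // null is a legal Integer: "not present"
        return (boxed == null) ? fallback : boxed;  // unbox only after the null test
    }

    /** Unboxes blindly, so a missing key surfaces as a NullPointerException. */
    static int countUnchecked(Map<String, Integer> counts, String key) {
        return counts.get(key);                     // auto-unboxing a null throws
    }
}
```

The null sentinel travels safely as long as the client stays in the box type; the failure moves to the explicit unboxing step, where the client can see and guard it.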

        Migration considerations

Nullability also plays into migration concerns. A baseline goal is that 
migrating a value-based class to a value type, or migrating an erased 
generic class to specialized, should be source and binary compatible. 
That means we don’t want to perturb the meaning of |Foo<V>| in clients 
or subtypes when either |V| or |Foo| migrates.

For existing value-based classes, such as |LocalDate|, there are plenty 
of existing locutions such as:

|LocalDate d = null;
if (d == null) { ... }
ArrayList<LocalDate> dates = ...
dates.add(null); |

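These keep working after migration only if |LocalDate| continues to 
denote the nullable type, |LocalDate.Box|.

On the other hand, for a newly written value type that was never a 
value-based class (VBC), we want the opposite. If |Point| is not an 
alias for |Point.Val|, users 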
will have to say |Point.Val| everywhere they want flattening, which is 
cumbersome and easy to forget. And since flattening is the whole point 
of value types to begin with, this seems like it would be letting the 
migration tail wag the dog.

Taken together, this means we want some sort of contextual decision as 
to whether to interpret |Foo| as |Foo.Box| or |Foo.Val|. This could be 
based on the provenance of |Foo| (was it migrated from a VBC or not), or 
could be some sort of aliased import (|import Foo as Foo.Val|).

The declaration-site approach seems preferable to me; it gives the 
author of the class the choice of which face of their class to present 
to the world. Classes for which migration compatibility is of primary 
concern (e.g., |Optional|) get compatibility at the cost of biasing 
towards boxing; those for which flattening is of primary concern (e.g., 
|Complex|) get flattening at the cost of compatibility. In this 
approach, we put the pain on clients of migrated classes — they have to 
take an extra step to get flattening. In the long run, there will likely 
be more born-as-value classes than migrated-to-value classes, so this 
seems the right place to put the pain.

Note that the |Box| syntactic convention scales nicely to type 
variables; we can write specialized generic code like:

|<T> T.Box box(T t) { } |

Whatever syntactic convention we use (e.g., |T?|) would want to have 
similar behavior. (Another consideration in the choice of denotation is 
the number of potential type operators we may need. Our work in 
specialized generics suggests there may be at least a few more coming.)

        Primitives as values — a sketch

We would like a path to treating primitives and values uniformly, 
especially as we get to specialized generics; we don’t want to have to deal 
with the 1-slot vs 2-slot distinction when we specialize, nor do we want 
to deal with using |iload| vs |aload|.

We can extend our lightweight boxing approach to allow us to heal the 
primitive/value divide. For each of our primitive types, we hand-code 
(or generate) a value class:

|value class IntWrapper { int x; } |

That leaves three kinds of box to keep apart:

  * Box — the legacy heavy box (|java.lang.Integer| and friends)
  * Lox — the new lightweight value boxes (L-types of value classes)
  * Pox — the primitive wrapper value classes

So |X.Box| denotes the lox for values (and probably |X| itself for 
reference classes), and the lox-of-a-pox for primitives (making it 
total). When we get to specialized generics and instantiate a 
generic class with a primitive type, we silently wrap the primitive 
values in their pox on the way in; the pox is a value, so we’ve reduced 
generics-over-primitives to generics-over-values. This is a huge 
complexity reducer for the specializer, as it need not deal with the 
fact that |long| and |double| take two slots, or with changing |a*| 
bytecodes to the corresponding primitive bytecodes.
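
A minimal sketch of that reduction in today's Java, assuming an ordinary final class as a stand-in for the |IntWrapper| value class. The method names |last| and |lastOf| are hypothetical; the explicit loop models the wrapping that would happen silently on the way in.

```java
import java.util.ArrayList;
import java.util.List;

final class PoxSketch {
    /** Stand-in for the primitive wrapper value class; an ordinary final class here. */
    static final class IntWrapper {
        final int x;
        IntWrapper(int x) { this.x = x; }
    }

    /** Generic code that only ever handles references; it never sees an int. */
    static <T> T last(List<T> items) {
        return items.get(items.size() - 1);
    }

    /** Wraps each int on the way in and unwraps the result on the way out. */
    static int lastOf(int... values) {
        List<IntWrapper> wrapped = new ArrayList<>();
        for (int v : values) {
            wrapped.add(new IntWrapper(v));
        }
        return last(wrapped).x;   // throws on empty input; not handled in this sketch
    }
}
```

The generic method is written once against |T| and needs no primitive-specific variant, which is the property the specializer would rely on.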

It is an open question how aggressively we can deprecate or denigrate 
the legacy boxes (probably not much, but hope springs eternal).

    Open issues

There were a few issues we left for further study; more on these to follow.

*Array covariance*. There was some degree of discomfort in pushing array 
covariance into |aaload| now. When we have specialized generics, we’ll 
be able to handle this through interfaces; it seems a shame to 
permanently weigh down intrinsic array access with potential 
megamorphism. We’re going to try to avoid plunking for array covariance 
now, and see how painful that is.

*Equality.* There was some discomfort with the user-model consequences 
of disallowing |==| on values. It is likely that we’d translate |val==| 
as a substitutability test; if we’re going to do that, it’s not obvious 
that we shouldn’t just lump this onto |acmp|.
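
To make "substitutability test" concrete, here is one possible reading, sketched with ordinary final classes standing in for value classes. The rule used (same fields, primitives compared by value, reference fields compared by identity, nested values compared recursively) is an assumption for illustration, not a settled definition.

```java
final class SubstitutabilitySketch {
    /** Stand-in for a value class with primitive fields only. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** Stand-in for a value class holding a nested value and a reference. */
    static final class Labeled {
        final Point at;
        final String label;
        Labeled(Point at, String label) { this.at = at; this.label = label; }
    }

    /** Primitive fields are compared by value. */
    static boolean substitutable(Point a, Point b) {
        return a.x == b.x && a.y == b.y;
    }

    /** Nested values are compared recursively; reference fields by identity. */
    static boolean substitutable(Labeled a, Labeled b) {
        return substitutable(a.at, b.at) && a.label == b.label;
    }
}
```

Under this reading two separately constructed |Point(1, 2)| instances are substitutable even though today's |==| on the stand-in classes would report them as different objects.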

*Locking.* The same generics-reuse arguments we made for nullability 
support could also be applied to locking on loxes. No one really likes 
the idea of supporting locking here, but just as surprise NPEs were a 
sharp edge, surprise IMSEs (|IllegalMonitorStateException|) might be as well.

*Construction.* We have not yet outlined either the language-level 
construction constraints or the translation of constructors to bytecode.