[LW100] Specialized generics -- translation and binary compatibility issues
Brian Goetz
brian.goetz at oracle.com
Wed Oct 17 19:38:44 UTC 2018
Number 2 of 100 in a series of “What we learned in Phase I of Project
Valhalla.” This one focuses on the challenges of evolving a class to be
any-generic, while interacting with existing erased code. No solutions
here, just recaps of problems and challenges.
Let’s imagine a class today:
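|interface Boxy<T> {
    T get();
    void set(T t);
}

class Foo<T> implements Boxy<T> {
    public T t;
    public T[] tArray;

    public Foo(T t) { set(t); }

    public static <T> Foo<T> of(T t) { return new Foo<>(t); }

    // Interface methods are implicitly public, so the implementations must be too.
    public T get() { return t; }

    @SuppressWarnings("unchecked")
    public void set(T t) {
        this.t = t;
        this.tArray = (T[]) new Object[] { t }; // unchecked: really an Object[]
    }
}|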
and client code
|Foo<String> fs = new Foo<>("boo");
println(fs.t);
println(fs.tArray);
println(fs.get());
Foo<?> wc = fs;
if (wc instanceof Foo) { ... }|
When we compile this code, we’ll encounter |LFoo;| or
|Constant_class[Foo]| or just plain |Foo| in the following contexts:
* Foo extends Bar
* instanceof/checkcast Foo
* new Foo
* anewarray Foo[]
* getfield Foo.t:Object
* invokevirtual Foo.get():Object
* Method descriptors of |Foo::of|
We translate raw |Foo|, |Foo<String>|, and |Foo<?>| all the same way
today — |LFoo|.
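Because all three spellings translate to the same |LFoo|, they also share one runtime class. A small reflection check on today's JVM (the minimal |Foo| here is a stand-in, not the full class above):

```java
// Minimal generic class standing in for Foo (illustrative only).
class Foo<T> {
    T t;
    Foo(T t) { this.t = t; }
}

class SharedClassDemo {
    // Parameterized, wildcard-typed, and raw references all see one runtime class.
    static boolean allShareOneClass() {
        Foo<String> fs = new Foo<>("boo");
        Foo<Integer> fi = new Foo<>(42);
        Foo<?> wc = fs;
        @SuppressWarnings("rawtypes")
        Foo raw = fi;
        Class<?> a = fs.getClass();
        Class<?> b = fi.getClass();
        Class<?> c = wc.getClass();
        Class<?> d = raw.getClass();
        return a == Foo.class && b == Foo.class && c == Foo.class && d == Foo.class;
    }

    public static void main(String[] args) {
        System.out.println(allShareOneClass()); // true
    }
}
```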
Tentative simplification: reference instantiations are always erased
The specialization transform takes a template class and a set of type
parameters and produces a specialized class. This can cause member (and
supertype) signatures to change; for example, if we have
|T get() |
which erases to
|Object get() |
when we specialize with T=int, we’ll have
|int get() |
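The erased half of this transform is observable with reflection today: the method the JVM links against returns |Object|, while the source-level |T| survives only as generic-signature metadata. The specialized |int get()| form has no equivalent in current Java, so this sketch only covers the erased species.

```java
import java.lang.reflect.Method;

// Same shape as the Boxy interface above.
interface Boxy<T> {
    T get();
    void set(T t);
}

class ErasureDemo {
    private static Method lookupGet() {
        try {
            return Boxy.class.getMethod("get");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The erased return type from the method descriptor: ()Ljava/lang/Object;
    static Class<?> erasedReturnType() {
        return lookupGet().getReturnType();
    }

    // The source-level type, kept only in the Signature attribute.
    static String genericReturnType() {
        return lookupGet().getGenericReturnType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(erasedReturnType());  // class java.lang.Object
        System.out.println(genericReturnType()); // T
    }
}
```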
In theory, there’s nothing to stop us from specializing |List<T>| with
T=String. However, in the earlier exploration, we settled on the
tentative simplification of always erasing reference instantiations, and
only specializing value instantiations. This is a tradeoff; we’re still
throwing away potentially useful type information (erasure haters will
be disappointed), in exchange for much greater sharing, and avoiding
some compatibility issues (existing generic code is rife with tricks
like “casting through wildcards” to coerce a |Foo<A>| to |Foo<B>|, which
only works as long as we erase; dirty tricks like this are often
necessary as there are some things that are hard to express in the
generic type system, even though the programmer knows them.)
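For readers who haven't met it, "casting through wildcards" looks like the sketch below (method names are illustrative). javac accepts it with an unchecked warning, and nothing is checked at runtime, precisely because |List<String>| and |List<Object>| erase to the same class.

```java
import java.util.ArrayList;
import java.util.List;

class CastThroughWildcard {
    // List<A> -> List<?> -> List<B>: legal only as an unchecked cast,
    // and free at runtime because both static types erase to List.
    @SuppressWarnings("unchecked")
    static <A, B> List<B> coerce(List<A> list) {
        return (List<B>) (List<?>) list;
    }

    static boolean coercionIsFree() {
        List<String> strings = new ArrayList<>(List.of("boo"));
        List<Object> objects = coerce(strings); // no runtime check happens here
        return objects == (Object) strings;     // same object, different static type
    }

    public static void main(String[] args) {
        System.out.println(coercionIsFree()); // true
    }
}
```

If |List<String>| and |List<Object>| were distinct specialized runtime types, this cast would have to fail or be checked, which is the compatibility cost that always erasing reference instantiations avoids.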
Ignoring multiple type parameters for the moment, when |Foo| becomes
specializable, our model is that it will have an /erased/ species — call
it |Foo<erased>|. (If you ask it what its type parameters are, it will
say “erased”. That is, we reify the fact that it is erased…) While
migrating from erased to specialized generics requires source changes
and recompilation at the generic class declaration, it should not
require any changes or recompilation for clients. That means that legacy
client classfiles that talk about |Foo| must be considered to be talking
about |Foo<erased>|. (Hierarchies can be specialized from the top down,
so it is OK to specialize |Bar| before |Foo|, but not the other way around.)
While the generic specialization machinery will have no problem with
specializing to L-types, I think it’s a simplification worth holding on
to: we treat all L-type parameters as “erased” for purposes of
specialization.
Additional simplification: let’s not worry about primitives
In Burlington, we concluded that as long as there’s a Pox class for each
primitive, we can convert primitives to/from poxes through source
compiler transforms, and not worry about specializing over primitives.
Instead, when the user wants to specialize |List<int>|, we specialize
for |int|’s pox. Except for those pesky arrays … more on that later.
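A minimal sketch of what such a source-level transform might produce, assuming a hypothetical |IntPox| type. In Valhalla the pox would be a value class; a record is used here only so the sketch runs on a current JDK (Java 16+). The explicit |box|/|unbox| calls mark where the compiler would insert conversions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pox for int; a stand-in for a value class.
record IntPox(int value) {
    static IntPox box(int i) { return new IntPox(i); }
    int unbox() { return value; }
}

class PoxDemo {
    // Source code would say List<int>; the translation uses List<IntPox>
    // and inserts conversions at each use site.
    static int sum(int[] values) {
        List<IntPox> list = new ArrayList<>();
        for (int v : values) {
            list.add(IntPox.box(v));   // inserted conversion: int -> pox
        }
        int total = 0;
        for (IntPox p : list) {
            total += p.unbox();        // inserted conversion: pox -> int
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // 6
    }
}
```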
Assumption: wild means wild
On the other hand, one of the non-simplifying assumptions we want to
make is that a wildcard type — |Foo<?>| — should describe any
instantiation of |Foo|, even when the wildcard-using code doesn’t know
about specialization. (Same with raw usages of |Foo|.) For example, if
the user has written a method:
|void takeFoo(Foo<?> anyFoo) { anyFoo.m(); }|
in legacy (erased) code, we should be able to call |takeFoo()| with both
erased and specialized instances of |Foo|. As we’ll see, this
complicates member access, and really complicates arrays.
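In today's erased Java this requirement holds trivially, since every |Foo<X>| is the same class. The sketch below (with a made-up |m()| body) shows the legacy client behavior that has to keep working once some of those arguments are specialized species:

```java
class Foo<T> {
    private final T t;
    Foo(T t) { this.t = t; }
    // Some member a legacy wildcard-using client might call.
    String m() { return "m() on " + t; }
}

class WildcardClient {
    // Legacy, erased client code: knows nothing about specialization.
    static String takeFoo(Foo<?> anyFoo) {
        return anyFoo.m();
    }

    public static void main(String[] args) {
        System.out.println(takeFoo(new Foo<>("boo"))); // m() on boo
        System.out.println(takeFoo(new Foo<>(42)));    // m() on 42
    }
}
```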
We will find utterances like
|invokevirtual Foo.get()Object
getfield Foo.m:Object|
in legacy code; we want these to work against any specialization of |Foo|.
In the case where the instance is erased, things obviously have a decent
chance of lining up properly, as the erased members will not have been
specialized away. If our receiver is a specialized |Foo|, it gets
harder, as the member signatures will have changed due to specialization.
Starting in Model 2, we handled this with bridge methods; for each
specialized method, we also had an erased bridge. This is possible
because there’s an easy coercion from |QPoint| to |LObject|. (There are
other ways to get there besides bridges.)
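javac already emits bridges for generic overrides today, which gives a useful analogy (the direction differs: here the bridge exists because the implementation is more specific than the erased interface method, rather than because a species was specialized). A reflection check using a hypothetical |StringBox| class:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

interface Boxy<T> {
    T get();
}

// javac compiles String get() plus a synthetic bridge Object get()
// that delegates to it, so erased callers still link.
class StringBox implements Boxy<String> {
    public String get() { return "boo"; }
}

class BridgeDemo {
    // Describes each declared method as "[bridge ]ReturnType name".
    static List<String> describeMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : StringBox.class.getDeclaredMethods()) {
            out.add((m.isBridge() ? "bridge " : "")
                    + m.getReturnType().getSimpleName() + " " + m.getName());
        }
        out.sort(null); // natural order, for stable output
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMethods()); // [String get, bridge Object get]
    }
}
```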
Where this completely runs out of gas is in field access; there’s no
such thing as a “bridge field”. So legacy code that does |getfield
Foo.t:Object| will fall over at link time, since the type of field |t|
in a specialized |Foo| might not be |Object|.
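The failure can be mimicked with |java.lang.invoke|: |MethodHandles.Lookup.findGetter| resolves a field by name and type, much as |getfield| resolution does. |FooOfInt| is a hypothetical hand-written stand-in for a |Foo| specialized to |int|; a real species would come from the VM.

```java
import java.lang.invoke.MethodHandles;

// Hypothetical stand-in for Foo specialized to int: t is an int, not an Object.
class FooOfInt {
    public int t = 42;
}

class FieldLinkDemo {
    // Mimics linking `getfield FooOfInt.t:<fieldType>`; resolution matches
    // name AND type, so an Object-typed lookup finds nothing to bind to.
    static boolean canLink(Class<?> fieldType) {
        try {
            MethodHandles.lookup().findGetter(FooOfInt.class, "t", fieldType);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canLink(int.class));    // true
        System.out.println(canLink(Object.class)); // false, like legacy getfield ...:Object
    }
}
```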
Another place this falls short is when a signature has |T[]| in it. Even
with bridge methods, without either array covariance (this is what I
meant when I said it might come back) or a willingness to copy, a legacy
client that invokes a method that returns a |T[]| will invoke it
expecting an |Object[]|, but without array covariance, a |Point.Val[]|
is not an |Object[]|. (Note that relatively few methods actually expose
|T[]| parameters, so it’s possible there are other dodges here.)
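Today's JVM shows the same split with primitive arrays: reference arrays are covariant, but an |int[]| (standing in here for a flattened |Point.Val[]|) is not an |Object[]|. A sketch that also shows the copying alternative:

```java
class ArrayCovarianceDemo {
    // Reference arrays are covariant: a String[] is an Object[].
    static boolean stringArrayIsObjectArray() {
        Object o = new String[] { "boo" };
        return o instanceof Object[];
    }

    // A flat array of non-references is not, so a caller expecting Object[] breaks.
    static boolean intArrayIsObjectArray() {
        Object o = new int[] { 1, 2 };
        return o instanceof Object[];
    }

    // The "willingness to copy" option: box each element into a fresh Object[].
    static Object[] copyToObjectArray(int[] a) {
        Object[] result = new Object[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i]; // autoboxes to Integer
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stringArrayIsObjectArray()); // true
        System.out.println(intArrayIsObjectArray());    // false
        System.out.println(copyToObjectArray(new int[] { 1, 2 }).length); // 2
    }
}
```

Note that the copy is detached: writes through the |Object[]| never reach the original array, which is why copying is a semantic change and not just a performance cost.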
Wildcards
One of the central challenges of pushing specialization into the VM is
how we’re going to handle wildcards. Given a generic class |Foo|, the
wildcard type |Foo<?>| is a supertype of any instantiation |Foo<X>| of
|Foo|. The wildcard type also erases to |LFoo|.
In Model 2, we modeled wildcards as interfaces, with lots and lots of
bridges, but this still fell short in a number of ways: no support for
non-public methods or for fields, and we had to deal with fields by
hoisting them into virtual bridges on the interface.
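A hypothetical Java rendering of that approach, with the wildcard modeled as an interface and the field |t| hoisted into an accessor. The names |FooWildcard|, |FooErased|, |FooOfInt|, and |t$get| are invented for illustration and are not the Model 2 classfile names.

```java
// The wildcard Foo<?> as an interface; the field t becomes an accessor.
interface FooWildcard {
    Object t$get(); // stands in for `getfield Foo.t:Object`
}

// Erased species: the field is already an Object.
class FooErased implements FooWildcard {
    Object t;
    FooErased(Object t) { this.t = t; }
    public Object t$get() { return t; }
}

// Species specialized to int: the field is an int, so the accessor boxes.
class FooOfInt implements FooWildcard {
    int t;
    FooOfInt(int t) { this.t = t; }
    public Object t$get() { return t; } // int -> Integer
}

class WildcardInterfaceDemo {
    // A wildcard-typed client reads t through the interface, never the field.
    static Object readT(FooWildcard anyFoo) {
        return anyFoo.t$get();
    }

    public static void main(String[] args) {
        System.out.println(readT(new FooErased("boo"))); // boo
        System.out.println(readT(new FooOfInt(42)));     // 42
    }
}
```

Because interface methods are public, a non-public member of |Foo| has no natural home on |FooWildcard|, which is one of the gaps listed above.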
Note that wildcard subtyping matters to the verifier, not just to the
bytecodes that access members; the verifier must know that any
specialization of |Foo| is a subtype of the wildcard |LFoo|.
But what does |LFoo| mean?
Careful readers will notice that we’ve been playing fast and loose with
the meaning of |Foo|; sometimes it means the class, sometimes the
wildcard, and sometimes the erased species.
The best intuition we’ve been able to come up with is:
* There are /classes/ and /crasses/.
* A crass describes a single runtime type; it has a layout, methods,
constructors, etc.
* A (template) class describes a family of runtime types.
* A (template) class is like an abstract type; it has members and
subtypes, but can’t be instantiated directly.
* All the crasses derived from a class are subtypes of the class.
* For purposes of instantiation, we interpret |new Foo| as creating an
instance of the erased species, and play a similar game with |<init>|
methods.
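A toy model of this vocabulary, with all names hypothetical. It captures only the three rules in the list (crasses are runtime types, every crass derived from a class is a subtype of it, and |new Foo| yields the erased species) and ignores ordinary superclass relationships such as |Foo extends Bar|.

```java
// A template class names a family of runtime types; it is never instantiated.
record TemplateClass(String name) {}

// A crass is one runtime type derived from a template class, e.g. Foo<erased>.
record Crass(TemplateClass template, String typeArg) {
    static final String ERASED = "erased";

    // All the crasses derived from a class are subtypes of that class.
    boolean isSubtypeOf(TemplateClass c) {
        return template.equals(c);
    }
}

class CrassModel {
    // `new Foo` in legacy code is read as creating the erased species.
    static Crass newInstanceOf(TemplateClass c) {
        return new Crass(c, Crass.ERASED);
    }

    public static void main(String[] args) {
        TemplateClass foo = new TemplateClass("Foo");
        Crass erased = newInstanceOf(foo);
        Crass ofPoint = new Crass(foo, "Point");
        System.out.println(erased.isSubtypeOf(foo));  // true
        System.out.println(ofPoint.isSubtypeOf(foo)); // true
        System.out.println(erased.equals(ofPoint));   // false: distinct runtime types
    }
}
```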
Model 3 classfile extensions
In Model 3, we extended the constant pool with some new entries:
*TypeVar[n, erasure].* This is a use of a type variable, identified by
its index /n/. (There was a table-of-contents attribute listing all the
type variables declared in a generic class or method, including those
declared in enclosing generic classes or methods.) Since the erasure of
a type variable is not merely a property of the type variable, but in
fact a property of how it is used, each use of a type variable carries
around its own erasure. For a field whose type is |T|, the |NameAndType|
points not to |Object|, but to |TypeVar[0, Object]|.
When specializing a type variable to |erased|, any uses of that type
variable are replaced with the erasure in the |TypeVar| entry.
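A minimal sketch of that rule, assuming (for this sketch only) that the erasure is stored as a field descriptor and that a binding is either |erased| or a Q-descriptor in the notation used above:

```java
import java.util.List;

class TypeVarDemo {
    static final String ERASED = "erased";

    // Hypothetical model of the entry TypeVar[n, erasure].
    record TypeVar(int index, String erasure) {
        // bindings.get(n) is ERASED or the descriptor type variable n maps to.
        String specialize(List<String> bindings) {
            String binding = bindings.get(index);
            return binding.equals(ERASED) ? erasure : binding;
        }
    }

    // The field `T t`: its NameAndType points at TypeVar[0, Object].
    static String fieldDescriptorOfT(String bindingForT) {
        TypeVar t = new TypeVar(0, "Ljava/lang/Object;");
        return t.specialize(List.of(bindingForT));
    }

    public static void main(String[] args) {
        System.out.println(fieldDescriptorOfT(ERASED));    // Ljava/lang/Object;
        System.out.println(fieldDescriptorOfT("QPoint;")); // QPoint;
    }
}
```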
*MethodType[D,T…].* This is largely a syntactic mechanism that lets us
represent method descriptors with holes (it also had the benefit of
compressing the constant pool somewhat). The parameter |D| was a method
type descriptor, except that in addition to the existing types, one
could specify |#| to indicate a hole; the |T...| parameters are CP
indexes to other types (which could be UTF8 strings, or |TypeVar|, or
the other type CP entries listed below.)
For example, a method
|int size(T t) |
would have a signature
|#1 = TypeVar[0, Object]
#2 = MethodType[(#)I, #1]|
When specializing a |MethodType|, its parameters are recursively
specialized, and then the resulting strings concatenated.
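The hole-filling step for the |size| example can be sketched as follows. The helper names are hypothetical, and the |TypeVar| is specialized inline (erased maps to its |Object| erasure, written as a field descriptor) to keep the block self-contained.

```java
import java.util.List;

class MethodTypeDemo {
    // Hypothetical model of MethodType[D, T...]: each '#' in D is a hole,
    // filled in order by already-specialized argument descriptors.
    static String fillHoles(String descriptor, List<String> args) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : descriptor.toCharArray()) {
            if (c == '#') {
                out.append(args.get(next++));
            } else {
                out.append(c);
            }
        }
        if (next != args.size()) {
            throw new IllegalArgumentException("hole/argument count mismatch");
        }
        return out.toString();
    }

    // int size(T t): #1 = TypeVar[0, Object], #2 = MethodType[(#)I, #1].
    static String specializeSize(String bindingForT) {
        String arg = bindingForT.equals("erased") ? "Ljava/lang/Object;" : bindingForT;
        return fillHoles("(#)I", List.of(arg));
    }

    public static void main(String[] args) {
        System.out.println(specializeSize("erased"));  // (Ljava/lang/Object;)I
        System.out.println(specializeSize("QPoint;")); // (QPoint;)I
    }
}
```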
*ParamType[C,T…].* This represents a parameterized type, where |C| is a
class name, and |T...| are the type parameters. So |List<int>| would be
represented as |ParamType[List,I]|, and |List<T>| would be represented
as |ParamType[List,TypeVar[0,e]]|.
When specializing a |ParamType|, its parameters are recursively
specialized, and then the resulting instantiation is computed.
*ArrayType[T,rank].* This represents an array with element type |T| and the given rank.
The type parameters of a |ParamType|, |ArrayType|, or |MethodType| can
themselves be a |TypeVar|, |ParamType|, or |ArrayType|, as well as a UTF8.
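Taken together, recursive specialization over these entries can be sketched as a small interpreter. This is a toy model under stated assumptions: erasures are stored as field descriptors; reference arguments collapse to the erased species, following the tentative simplification that reference instantiations are always erased; and the |LList<...>;| spelling for a specialized species is a placeholder invented here, since the real encoding is left open.

```java
import java.util.List;
import java.util.stream.Collectors;

class TemplateTypes {
    static final String ERASED = "erased";

    // A template type; bindings.get(n) is ERASED or the descriptor for type var n.
    interface Type {
        String specialize(List<String> bindings);
    }

    // UTF8: an already-concrete descriptor.
    record Utf8(String descriptor) implements Type {
        public String specialize(List<String> bindings) { return descriptor; }
    }

    // TypeVar[n, erasure]: an erased binding falls back to this use's erasure.
    record TypeVar(int index, String erasure) implements Type {
        public String specialize(List<String> bindings) {
            String binding = bindings.get(index);
            return binding.equals(ERASED) ? erasure : binding;
        }
    }

    // ArrayType[T, rank]: one '[' per dimension, then the specialized element type.
    record ArrayType(Type element, int rank) implements Type {
        public String specialize(List<String> bindings) {
            return "[".repeat(rank) + element.specialize(bindings);
        }
    }

    // ParamType[C, T...]: arguments specialized recursively; reference
    // descriptors (L... or [...) count as erased, others (Q..., I) are kept.
    record ParamType(String className, List<Type> args) implements Type {
        public String specialize(List<String> bindings) {
            List<String> specialized = args.stream()
                    .map(a -> a.specialize(bindings))
                    .map(d -> d.startsWith("L") || d.startsWith("[") ? ERASED : d)
                    .collect(Collectors.toList());
            if (specialized.stream().allMatch(ERASED::equals)) {
                return "L" + className + ";"; // the erased species
            }
            // Placeholder spelling for a specialized species (not a real encoding).
            return "L" + className + "<" + String.join(",", specialized) + ">;";
        }
    }

    static final Type LIST_OF_T =
            new ParamType("List", List.of(new TypeVar(0, "Ljava/lang/Object;")));
    static final Type ARRAY_OF_T = new ArrayType(new TypeVar(0, "Ljava/lang/Object;"), 1);

    public static void main(String[] args) {
        System.out.println(LIST_OF_T.specialize(List.of(ERASED)));     // LList;
        System.out.println(LIST_OF_T.specialize(List.of("QPoint;")));  // LList<QPoint;>;
        System.out.println(ARRAY_OF_T.specialize(List.of(ERASED)));    // [Ljava/lang/Object;
        System.out.println(ARRAY_OF_T.specialize(List.of("QPoint;"))); // [QPoint;
    }
}
```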
We found that as a template language, these types allowed exactly the
sort of expressiveness needed, and specialized efficiently down to
concrete descriptors (though in the M3 prototype, we had concrete
descriptors of the form |List$0=I| to describe |List<int>|; obviously we
don’t want that here.) But these designs captured all the complexity we
needed (especially that of erasure), and allowed a mechanical
translation into Java 8 classfiles.
More information about the valhalla-spec-experts mailing list