Wildcards -- Models 4 and 5
Brian Goetz
brian.goetz at oracle.com
Fri May 20 18:33:00 UTC 2016
In the 4/20 mail “Wildcards and raw types: story so far”, we outlined
our explorations for fitting wildcard types into the first several
prototypes. The summary was:
* Model 1: no wildcards at all
* Model 2: A pale implementation of wildcards, with lots of problems
  that stem from trying to fake wildcards via interfaces
* Model 3: basically the same as Model 2, except members are accessed
  via indy (which mitigated some of the problems but not all)
The conclusion was: compiler-driven translation tricks are not going
to cut it (as we suspected all along). We’ve since explored two
other models (call them 4 and 5) which explore a range of options
for VM support for wildcards. The below is a preliminary analysis of
these options.
Reflection, classes, and runtime types
While it may not be immediately obvious that this subject is deeply
connected to reflection, consider a typical implementation of |equals()|:
    class Box<T> {
        T t;

        public boolean equals(Object o) {
            if (!(o instanceof Box))
                return false;
            Box other = (Box) o;
            return (t == null) ? other.t == null : t.equals(other.t);
        }
    }
Some implementations use raw types (|Box|) for the |instanceof| and cast
target; others use wildcards (|Box<?>|). While the latter is
recommended, both are widely used. In any case, as
observed in the last mail, were we to interpret |Box| or |Box<?>| as
only including erased boxes, then this code would silently break.
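Under today's erased generics the raw and wildcard forms behave identically at runtime, which is why both are so common in the wild; a minimal sketch (class and method names are illustrative):

```java
// Sketch: under today's erased generics, the raw and wildcard
// instanceof tests behave identically, since all parameterizations
// of Box share a single runtime type.
class Box<T> {
    T t;
    Box(T t) { this.t = t; }
}

public class InstanceofDemo {
    public static void main(String[] args) {
        Object o = new Box<String>("hi");
        System.out.println(o instanceof Box);     // raw type test: true
        System.out.println(o instanceof Box<?>);  // wildcard test: true
    }
}
```

If |Box<?>| came to mean only specialized (non-erased) boxes, the second test would start failing on erased instances, which is exactly the silent breakage described above.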
The term “class” is horribly overloaded, used to describe the source
class (|class Foo { ... }|), the binary classfile, the runtime type
derived from the classfile, and the reflective mirror for that runtime
type. In the past these existed in 1:1 correspondence, but no more — a
single source class now gives rise to a number of runtime types. Having
poor terminology causes confusion, so let’s refine these terms:
* /class/ refers to a source-level class declaration
* /classfile/ refers to the binary classfile
* /template/ refers to the runtime representation of a classfile
* /runtime type/ refers to a primitive, value, class, or interface
type managed by the VM
So historically, all objects had a class, which equally described the
source class, the classfile, and the runtime type. Going forward, the
class and the runtime type of an object are distinct concepts. So an
|ArrayList<int>| has a /class/ of |ArrayList|, but a /runtime type/ of
|ArrayList<int>|. Our code name for runtime type is /crass/ (obviously a
better name is needed, but we’ll paint that bikeshed later.)
This allows us to untangle a question that’s been bugging us: what
should |Object.getClass()| return on an |ArrayList<int>|? If we return
|ArrayList|, then we can’t distinguish between an erased and a
specialized object (bad); if we return |ArrayList<int>|, then existing
code that depends on |(x.getClass() == List.class)| may break (bad).
The answer is, of course, that there are two questions the user can ask
an object: what is your /class/, and what is your /crass/, and they need
to be detangled. The existing method |getClass()| will continue to
return the class mirror; a new method (|getCrass()|) will return a
runtime type mirror of some form for the runtime type. Similarly, a
class literal will evaluate to a class, and some other form of literal /
reflective lookup will be needed for crass.
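The kind of class-identity check that legacy code relies on can be seen in a sketch of today's behavior (standard Java, no Valhalla features):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of today's behavior: every parameterization of ArrayList
// shares one runtime class, so class-identity checks succeed.
public class GetClassDemo {
    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        System.out.println(ints.getClass() == strings.getClass());  // true
        System.out.println(ints.getClass() == ArrayList.class);     // true
    }
}
```

Under the class/crass split, |getClass()| would keep returning |ArrayList.class| for both, while the hypothetical |getCrass()| would distinguish |ArrayList<int>| from an erased |ArrayList|.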
The reflective features built into the language (|instanceof|, casting,
class literals, |getClass()|) are mostly tilted towards classes, not
types. (Some exceptions: you can use a wildcard type in an |instanceof|,
and you can do unchecked static casts to generic types, which are
erased.) We need to extend these to deal in both classes /and/ crasses.
For |getClass()| and literals, there’s an obvious path: have two forms.
For casting, we are mostly there (except for the treatment of raw types
for any-generic classes — which we need to work out separately.) For
instanceof, it seems a forced move that |instanceof Foo| is interpreted
as “an instance of any runtime type projected from class Foo”, but we
would also want to apply it to any reifiable type.
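The current tilt toward classes shows up in a small example in today's Java: a wildcard type is allowed in |instanceof|, a fully parameterized type is not, and a static cast to a generic type is permitted but unchecked:

```java
import java.util.ArrayList;
import java.util.List;

public class ClassVsTypeDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Object o = new ArrayList<Integer>();
        // A wildcard type is reifiable, so it may appear in instanceof:
        System.out.println(o instanceof List<?>);   // true
        // By contrast, `o instanceof List<String>` would not compile.
        // A static cast to a generic type is allowed but unchecked;
        // erasure means nothing is verified at runtime:
        List<String> strings = (List<String>) o;
        System.out.println(strings.isEmpty());      // true
    }
}
```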
Wildcard types
In Model 3, we express a parameterized type with a |ParamType| constant,
which names a template class and a set of type parameters, which include
both valid runtime types as well as the special type parameter token
|erased|. One natural way to express a wildcard type is to introduce a
new special type parameter token, |wild|, so we’d translate |Foo<any>|
as |ParamType[Foo,wild]|.
In order for wildcard types to work seamlessly, the minimum
functionality we’d need from the VM is to manage subtyping (which is
used by the VM for |instanceof|, |checkcast|, verification, array store
checks, and array covariance.) The wildcard must be seen to be a “top”
type for all parameterizations:
    ParamType[Foo,T] <: ParamType[Foo,wild]    // for all valid T
And, wildcard parameterizations must be seen to be subtypes of their
wildcard-parameterized supertypes. If we have
    class Foo<any T> extends Bar<T> implements I<T> { ... }
    class Moo<any T> extends Goo { }
then we expect
    ParamType[Foo,wild] <: ParamType[Bar,wild]
    ParamType[Foo,wild] <: ParamType[I,wild]
    ParamType[Moo,wild] <: Goo
Wildcards must also support method invocation and field access to the
members that are in the intersection of the members of all
parameterizations (these are the total members (those not restricted to
particular instantiations) whose member descriptors do not contain any
type variables.) We can continue to implement member access via
invokedynamic (as we do in Model 3); alternately, the VM can support
|invoke*| bytecodes on wildcard receivers.
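Today's erased wildcards already exhibit this "intersection of members" behavior in the language: on a |List<?>| receiver, only members whose use does not require knowing the element type are invocable. A sketch:

```java
import java.util.List;

// Sketch: on a List<?> receiver, only members whose descriptors do
// not depend on the unknown element type are usable.
public class WildcardMembersDemo {
    static int describe(List<?> l) {
        // size() is available on every parameterization of List.
        // l.add(...) would not compile here (except with null),
        // because the element type T appears in add's signature.
        return l.size();
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of(1, 2, 3)));  // 3
    }
}
```

The VM-level wildcard would need the analogous rule: only total members whose descriptors contain no type variables are accessible through |ParamType[Foo,wild]|.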
We can apply these wildcard behaviors to any of the wildcard models
(i.e., retrofit them onto Model 2/3.)
Partial wildcards
With multiple type variables, the rules for wildcards generalize
cleanly, but the number of wildcard types that are a supertype of any
given parameterized type grows exponentially in the number of type
variables. We are considering adopting the simplification of erasing all
partial wildcards in the source type system to a total wildcard in the
runtime type system (the costs of this are: some additional boxing on
access paths where boxing might not be necessary, and unchecked casts
when casting a broader wildcard to a narrower one.)
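The analogous cost is already visible in today's source type system: widening a partial wildcard to the total wildcard is implicit, while narrowing back to a partial wildcard is an unchecked cast. A sketch:

```java
import java.util.Map;

public class PartialWildcardDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Integer> m = Map.of("a", 1);
        // Widening a partial wildcard to the total wildcard is implicit:
        Map<?, ?> total = m;
        // Narrowing back down to a partial wildcard is an unchecked
        // cast -- the cost the proposed simplification would carry over
        // into the runtime type system:
        Map<String, ?> partial = (Map<String, ?>) total;
        System.out.println(partial.size());  // 1
    }
}
```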
Model 4
A constraint we are under is: existing binaries translate the types
|Foo| (raw type), |Foo<String>| (erased parameterization), and |Foo<?>|
all as |LFoo;| (or its equivalent, |CONSTANT_Class[Foo]|); since
existing code treats this as meaning an erased class, the natural path
would be to continue to interpret |LFoo;| as an erased class.
Model 4 asks the question: “can we reinterpret legacy |LFoo;| in
classfiles, and |Foo<?>| in source files, as |any Foo|?” (restoring the
interpretation of |Foo<?>| to be more in line with user intuition.)
Not surprisingly, the cost of reinterpreting the binaries is extensive.
Many bytecodes would have to be reinterpreted, including |new|,
|{get,put}field|, |invoke*|, to make up the difference between the
legacy meaning of these constructs and the desired new meaning. Worse,
while boxing provides us a means to have a common representation of
signatures involving |T| (T’s bound), in order to get to a common
representation for signatures involving |T[]|, we’d need to either (a)
make |int[]| a subtype of |Object[]| or (b) have a “boxing conversion”
from |int[]| to |Object[]| (which would be a proxy box; the data would
still live in the original |int[]|.) Both are intrusive into the
|aaload| and |aastore| bytecodes and still are not anomaly-free.
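For context, today's rules already draw this line sharply: reference arrays are covariant (with a per-store |aastore| check at runtime), while |int[]| sits outside the |Object[]| hierarchy entirely. A sketch in current Java:

```java
// Sketch of today's array rules: reference arrays are covariant and
// checked per-store by aastore; primitive arrays are not reference
// arrays at all.
public class ArrayCovarianceDemo {
    public static void main(String[] args) {
        Object[] boxes = new Integer[] { 1, 2, 3 };  // Integer[] <: Object[]
        // Object[] oops = new int[] { 1, 2, 3 };    // does not compile
        try {
            boxes[0] = "not an Integer";  // aastore check fails at runtime
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
    }
}
```

Either Model 4 option would perturb exactly these checks: (a) makes the commented-out line legal, and (b) inserts proxy boxes into paths that |aaload|/|aastore| currently handle uniformly.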
So, overall, while this seems possible, the implementation cost is very
high, and all of it is paid for the sake of migration; the resulting VM
distortions would remain as legacy constraints long after the old code
has been migrated.
Model 5
Model 5 asks the simpler question: can we continue to interpret |LFoo;|
as erased in legacy classfiles, but upgrade to treating |Foo<?>| as is
expected in source code? This entails changing the compilation
translation of |Foo<?>| from “erased foo” to |ParamType[Foo,wild]|.
This is far less intrusive into the bytecode behavior — legacy code
would continue to mean what it did at compile time. It does require some
migration support for handling the fact that field and method
descriptors have changed (but this is a problem we’re already working on
for managing the migration of reference classes to value classes.) There
are also some possible source incompatibilities in the face of separate
compilation (to be quantified separately).
Model 5 allows users to keep their |Foo<?>| and have it mean what they
think it should mean. So we don’t need to introduce a confusing
|Foo<any>| wildcard, but we will need a way of saying “erased Foo”,
which might be |Foo<? extends Object>| or might be something more
compact like |Foo<erased>|.
Comparison
Comparing the three models for wildcards (2, 4, 5):
* Model 2 defines the source construct |Foo<?>| to permanently mean
|Foo<erased ref>|, even when |Foo| is anyfied, and introduces a new
wildcard |Foo<any>| — but maintains source and binary compatibility.
* Model 4 lets us keep |Foo<?>|, and retroactively redefines bytecode
behavior — so an old binary can still interoperate with a reified
generic instance, and will think a |Foo<int>| is really a
|Foo<Integer>|.
* Model 5 redefines the /source/ meaning of |Foo<?>| to be what users
expect, but because we don’t reinterpret old binaries, allows some
source incompatibility during migration.
I think this pretty much explores the solution space. Our choices are:
break the user model of what |Foo<?>| means, take a probably prohibitive
hit to distort the VM to apply new semantics to old bytecode, or accept
some limited source incompatibility under separate compilation but
rescue the source form that users want.
In my opinion, the Model 5 direction offers the best balance of costs
and benefits — while there is some short-term migration pain (confined
to relatively limited cases, and mitigable with compiler help), in
the long run, it gets us to the world we want without permanently
burdening either the language (creating confusion between |Foo<?>| and
|Foo<any>|) or the VM implementation.
In all these cases, we still haven’t defined the semantics of /raw
types/. Raw types existed for migration between pre-generic and generic
code; we still have that migration problem, plus the new migration
problems of generic to any-generic, and of pre-generic to any-generic.
So in any case, we’re going to need to define suitable semantics for raw
types corresponding to any-generic classes.
More information about the valhalla-spec-observers
mailing list