Model 3 classfile design document

Wed Feb 17 17:09:03 UTC 2016

Having discussed the classfile representation and sketched out some 
plausibility arguments about how the VM can efficiently manage 
specialization, let's step back and look at what this means for the 
language (both Java and other languages).

Type -> Class mapping.  With erased generics, all parameterizations 
Foo<T> map to a single class Foo.  In Model 3, the classfile for Foo is 
essentially a template; we can request parameterizations of Foo via the 
ParamType constant (the Class constant Class[Foo] becomes retconned to 
mean ParamType[Foo, erased].)
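
To make the "single class" point concrete, here is a minimal sketch of 
today's erased behavior in stock Java.  It does not use ParamType or any 
Model 3 feature; the class and method names are made up for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows the erased mapping in today's Java, where every
// parameterization of ArrayList shares one runtime class.
class ErasedMappingDemo {
    // Returns true if a List<String> and a List<Integer> report the same
    // runtime class, which is what erasure implies.
    static boolean sharesRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sharesRuntimeClass()); // prints "true"
    }
}
```

Under Model 3, the analogous comparison for two specialized 
parameterizations (say Foo<int> and Foo<long>) would be expected to come 
out the other way, since each is a distinct class in the prototype.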

Reflection.  In the current prototype, Foo<int> and Foo<String> are 
distinct classes; each will respond with distinct .getClass() results.  
We don't yet have a means to express that Foo<int> and Foo<String> are 
different "species" of Foo; instead, each gets its own class mirror. 
Reflective operations like Foo<int>.class.getName() currently yield ugly 
results.  Lots of open questions here.

Reification. The question on everyone's mind will be: are we "finally 
getting reified generics"?  And the answer is: sort of.  (This question 
also comes with a lot of baggage; there are a lot of people who assume 
that erasure is somehow "smelly" and therefore bad, and so of course 
reification must be better. But erasure is a pragmatic compromise, and 
the alternative is not always better.  Let's try and leave the baggage 
at the door for now.)

To add to the confusion, not everyone means the same thing by "reified 
generics".  To some, reification means "types are checked at runtime"; 
to others, it may merely mean "types are reflectively available at 
runtime."  Even within the first category, there's a range of what sort 
of type checking we might mean, since the VM type system may not be 
exactly the same type system as the language-level type system -- and 
for good reason.  (What if we ask for a reified ArrayList<? extends 
List<? extends Foo> & Serializable>?  Do we get runtime subtype checking 
for wildcards and intersections every time we try to put something in 
this List?  Would we even want that?  Are we sure such checks are 
decidable?)

In Model 3, specialization is clearly a form of reification; when we 
specialize ArrayList to E=int, the backing store is an int[], and 
therefore we get all the type checking that entails.  We can clearly 
layer additional support for reflectively exposing the bindings of type 
parameters in a number of ways.

The Model 3 classfile design explicitly admits both reified and erased 
generics at the VM level, by allowing a concrete type descriptor *or* 
the 'erased' token as a type parameter to a ParameterizedType.  (Note 
that 'erased' is not a type, it is merely an allowed type 
parameterization -- similar to wildcards in the Java language.)  
There is nothing in the classfile design that encodes the rule 
"reference parameterizations are erased"; that's the choice of the 
language compiler.  In this way, we can consider any non-erased 
parameterization to be reified; a ParamType[ArrayList, LString] will 
throw ArrayStoreException at runtime if you try to cram something other 
than a String into it.
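
Model 3 is not something you can run on a stock JVM, but arrays already 
behave the way a reified parameterization would, so they make a 
reasonable stand-in.  The sketch below (names are illustrative, and the 
array is only an analogy for ParamType[ArrayList, LString]) contrasts a 
reified store check with what an erased List<String> lets through today.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: arrays stand in for a reified parameterization.
class ReifiedStoreDemo {
    // A String[] knows its component type at runtime, so storing an Integer
    // through an Object[] view fails with ArrayStoreException.
    static boolean reifiedStoreRejected() {
        Object[] slots = new String[1];
        try {
            slots[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // An erased List<String> has no runtime knowledge of String, so the same
    // store through a raw view succeeds silently (heap pollution).
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean erasedStoreAccepted() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(Integer.valueOf(42));
        return strings.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(reifiedStoreRejected()); // prints "true"
        System.out.println(erasedStoreAccepted());  // prints "true"
    }
}
```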

So, does that mean generics are reified?  Sort of...  For multiple 
reasons (including, but not limited to, compatibility), the current 
plan is for the Java language to continue to use erasure for reference 
parameterizations of generics.  But other languages are free to use full 
reification where it suits them (and if their Java interop requirements 
let them.)  If someone uses reflection to reflect over a List<String> 
and asks for its type parameter, it will come back as "erased" 
(reflection has to support this answer anyway, if only for compatibility 
with legacy code.)
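
For comparison, here is roughly what reflection can tell you about a 
List<String> instance today.  This is a sketch using only the existing 
java.lang.reflect API; how an "erased" answer would actually be surfaced 
in Model 3 is not settled, and the helper name below is hypothetical.

```java
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: today's reflection sees the declared type variable,
// not the binding chosen at the use site.
class ErasedReflectionDemo {
    // Returns the name of the first declared type variable of the
    // instance's runtime class, or a marker if the class is not generic.
    static String firstTypeParameterName(Object instance) {
        TypeVariable<?>[] params = instance.getClass().getTypeParameters();
        return params.length == 0 ? "(not generic)" : params[0].getName();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        // The String binding is gone; only the declaration-site name remains.
        System.out.println(firstTypeParameterName(strings)); // prints "E"
    }
}
```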

So the punchline is, at the Java language level, generics are erased 
*and* reified; generics over references are erased (as they are today) 
and generics over values are reified.  I suspect people will be about as 
jarred by this as they were by erasure in the first place; I expect 
we'll get some degree of "You idiots, you ran 99 yards only to fumble 
the ball on the 1 yard line."  But looking past this (which is mostly 
the above-mentioned baggage), the model seems sound enough; existing 
reference generics work as they always have, and new value generics work 
"better" (in that there are additional things you can do with them.)

In fact, it gives us a chance to be more honest about erasure, because 
"erased" can appear as a first-class member of the programming model.  I 
believe many of the complaints about erasure stem from the fact that it 
is inevitably a surprise when you first discover it.

On 1/22/2016 11:52 AM, Brian Goetz wrote:
> Please find a document here:
>
> http://cr.openjdk.java.net/~briangoetz/valhalla/eg-attachments/model3-01.html 
>
>
> that describes our current thinking for evolving the classfile format 
> to clearly and efficiently represent parametric polymorphism.  The 
> early concepts of this approach were outlined in my talk at JVMLS last 
> year; this represents a refinement of those ideas, and a reasonable 
> "stake in the ground" description of what seems the most sensible way 
> to balance preserving parametric information in the classfile without 
> imposing excessive runtime costs for loading specializations.