Model 3 classfile design document

Tue Feb 2 13:46:00 UTC 2016

This is not a small question!  (Actually, depending on the interpretation of ParamType[List, String], it’s one of two questions; I’ll answer them both.)  

What does ParameterizedType[LFoo, String] mean?  Could be one of three things.

1.  Specialize Foo with T=String; this produces a fully reified Foo<String>.  
2.  Recognize that String is a ref type, and produce an erased Foo<String>.  
3.  Recognize that String is a ref type, and produce an erased Foo<String>, but with metadata that allows the types to be recovered through reflection.  

If your interpretation is #1 (which is our interpretation), then your question is: Why not “just” do reified generics?  

Alternately, your question might be: why not do #2 or #3, and retain the type information for longer, to expand the range of implementation choices.  I’ll answer this one first.  We’d like to minimize the intrusion of Java’s generic type system on the JVM.  Rules like “these types are erased, but these types are reified” are choices that should be left to the language compiler.  Just because Java decides to erase, doesn’t mean Kotlin should be required to; you should have the choice.  And this simplifies the VM implementation too — the language compiler asks for erasure or reification, and the VM responds accordingly.  (I don’t think this is your question, and I suspect you agree with all this.)

Another thing you could be asking is: why does the VM need to know about “erased” at all?  And the reason here is fairly simple (if unfortunate); erasure is noncompositional enough that the compiler cannot simply erase early and ask the VM to propagate and substitute thereafter; doing so would lead to incompatible translations, and we take it as a requirement that we be compatible with existing uses of reference generics.  It took us a long time to come to such a simple model for how to capture erasure!  

Which brings us to the question that I think is your real question: given that we now *can* reify generics over reference types, why wouldn’t we always do so?  There are many reasons, including compatibility, expressibility, and footprint.  

Compatibility.  If we “just” reified List<String>, then existing code would be neither source- nor binary-compatible.  (When .NET switched to reified generics, you had to migrate from the old libraries to the new reified libraries.)  That’s a non-starter for us; existing uses of generic classes (both clients and subclasses) should be source and binary compatible after the classes are anyfied.  (Additionally, plenty of code makes assumptions about the result of reflective operations like .getClass() on generics, and those assumptions could break if we reified all reference parameterizations.)  That means that reference instantiations need to continue to be erased.  
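
To make the .getClass() point concrete, here is a minimal sketch in today's Java.  The class and method names (ErasedClassDemo, sameRuntimeClass) are made up for illustration and are not part of any proposal; the sketch only shows the kind of assumption existing code can rely on under erasure.

```java
import java.util.ArrayList;
import java.util.List;

class ErasedClassDemo {
    // Hypothetical helper: true when both lists are instances of the same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Under erasure, both parameterizations share one runtime class.
        System.out.println(sameRuntimeClass(strings, ints)); // prints "true"
        System.out.println(strings.getClass().getName());    // prints "java.util.ArrayList"
    }
}
```

Code that compares or caches on getClass() in this way could see different answers if each reference parameterization were given its own runtime class.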

Some may have a hard time with this conclusion.  If you dig at this unease, I think the most likely explanation is the assumption that “well, reified generics are just better!”  But this isn’t true — both erasure and reification have pros and cons.  Erasure was not a “mistake” to be fixed by reification; it is a compromise, and I think a highly pragmatic one.  

(Some may ask, “Could we make reification an option, say at the use site, e.g., ‘new List<reified String>’?”  We could, but I suspect that having a mix of reified List<String> and erased List<String> coexisting in the same heap would be an endless source of bugs and corner cases.)

Expressibility.  Our preference for erasure is not simply based on compatibility.  Real-world generic code is full of “dirty tricks” that involve casting through raw; sometimes this is just sloppiness or lack of expertise with generics, but sometimes this is the only practical way to achieve the desired result without incurring massive copying costs.  Truly reifying generics would mean that all this code would break and have to be rewritten.  
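
As one illustration of such a trick, here is a minimal sketch, assuming a caller that only ever reads from the resulting list.  The names (RawCastDemo, asObjectList) are hypothetical.  The cast through the raw type List is accepted by the compiler (with an unchecked warning, suppressed here) and costs nothing at run time, because both views are the same erased object.

```java
import java.util.ArrayList;
import java.util.List;

class RawCastDemo {
    // Hypothetical helper: views a List<String> as a List<Object> without copying.
    // This is only safe if the caller never adds non-String elements through the result.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Object> asObjectList(List<String> strings) {
        return (List<Object>) (List) strings;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("a");

        List<Object> objects = asObjectList(strings);

        // Same object on the heap: no elements were copied.
        System.out.println(objects == (Object) strings); // prints "true"
        System.out.println(objects.get(0));              // prints "a"
    }
}
```

If List<String> and List<Object> were distinct reified types, a cast like this would presumably have to fail, and the alternative would be to copy the elements into a new list.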

Footprint.  Erasure means that we can share a single class to represent all instantiations of a type: Map<String, String>, Map<Foot, Shoe>, etc.  Having separate types for each of these would involve more class loading, more class metadata, etc.  Yes, there are techniques for minimizing this (.NET reifies a parameterization token but erases at code-gen time), but there is some cost.  

My point is simply that reification is far, far from free, and erasure is not simply a mistake or a hack to be undone at the first opportunity.  

So, to answer your direct question: the Java compiler chooses to represent List<String> as ParamType[List, erased], rather than ParamType[List, String], for the above reasons, but Kotlin could make the opposite choice (at some Java interop cost).  

On Feb 1, 2016, at 5:33 AM, Andrey Breslav <andrey.breslav at jetbrains.com> wrote:

> A question about these examples:
> R(Foo<raw>) = Class["Foo"] or ParameterizedType['L', "Foo", "_"]
> R(Foo<String>) = Class["Foo"] or ParameterizedType['L', "Foo", "_"]
> R(Foo<int[]>) = ParameterizedType['L', "Foo", ArrayType[1, "I"]]
> Apparently, we want to preserve the information about int[], while we don't care about String. Why? Isn't int[] just a class, like String?
> 
> On Fri, Jan 22, 2016 at 7:53 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> Please find a document here:
> 
> http://cr.openjdk.java.net/~briangoetz/valhalla/eg-attachments/model3-01.html
> 
> that describes our current thinking for evolving the classfile format to
> clearly and efficiently represent parametric polymorphism.  The early
> concepts of this approach were outlined in my talk at JVMLS last year;
> this represents a refinement of those ideas, and a reasonable "stake in
> the ground" description of what seems the most sensible way to balance
> preserving parametric information in the classfile without imposing
> excessive runtime costs for loading specializations.
> 
> We're working on an updated compiler prototype which people will be able
> to play with soon (along with a formal model.)
> 
> Please ask questions!
> 
> Some things this document does not address yet:
>   - How we deal with types implicit in the bytecodes (aload vs iload)
> and how they get specialized;
>   - How we represent restricted methods in the classfile;
>   - How we represent the wildcard type Foo<any>
> 
> 
> -- 
> Andrey Breslav
> Project Lead of Kotlin
> JetBrains
> http://kotlinlang.org/
> The Drive to Develop
