From karen.kinnear at oracle.com Fri Apr 8 12:07:51 2016
From: karen.kinnear at oracle.com (Karen Kinnear)
Date: Fri, 8 Apr 2016 08:07:51 -0400
Subject: Conditional members
In-Reply-To: <56FADD1A.5070508@oracle.com>
References: <56FADD1A.5070508@oracle.com>
Message-ID: <5BDC8D49-2B18-447E-B6B5-BDA44F1F83DA@oracle.com>

Brian,

Request from the VM - it would make implementations seriously easier to make consistent if we can ensure that the JVMS does not contain the term "arbitrary".

So yes, please, I would recommend that the static compiler filter out such situations of duplicate equally applicable members and it would be an error if there were no clear more specific candidate member.

thanks,
Karen

> On Mar 29, 2016, at 3:52 PM, Brian Goetz wrote:
>
> Yet another in a series of disconnected, bottom-up (starting at the VM) memos laying the groundwork for the enhanced generics model.
>
> Basic Problem
> =============
>
> It may be desirable, for purposes of expressiveness or migration compatibility, to declare class members that are only members of a specific subset of parameterizations of a generic class. Examples include:
>
> - Reference-specific API assumptions. In our analysis of the Collection classes, we identified various methods that fail to make the jump to any-generics for various reasons. These include methods like Collection.toArray(), whose signature makes no sense for primitive parameterizations, or Map.get(), which uses `null` (not in the domain of primitives) to indicate "not present." We can't take these methods away from reference instantiations, but we don't want to propagate them into primitive instantiations.
>
> - Better implementations enabled by known type parameters. Generic classes will provide generic implementations, but sometimes better implementations are possible when concrete types are known.
>   In this case, an implementation would provide a generic implementation and zero or more implementations that are restricted to more specific implementations.
>
> - Functionality available only on specific implementations. For example, List could have a sum() method even though sum() does not make sense on all instantiations. (This is the declaration-site version of what C# enables at the use site with extension methods -- allowing methods to be injected into types, rather than classes.)
>
> We've not yet spent a lot of time identifying the proper way to surface this in the language. For methods, one possibility is to use receiver parameters (added in Java SE 8) to qualify the receiver type:
>
>     int sum(List this) { ... }
>
> This gets the point across clearly enough (and is analogous to how C# does extension methods), but has several drawbacks: it doesn't scale to fields, nor does it scale well to a conditional-membership model that is anything other than "I am a member of parameterization X". (Where this might fall down, for example, would be when we want members declared as "I am *not* a member of parameterization X".)
>
> Note that in the second motivating example, there will be two members with the same name and signature; we want one to take precedence over the other.
>
> We call these "conditional" or "restricted" members.
>
> Classfile Strawman
> ==================
>
> Here's a strawman of how we might represent this at the VM level.
>
> We define a new attribute, `Where`, which can be applied to instance fields, instance methods, and constructors:
>
>     Where {
>         u2 name_index;
>         u4 length;
>         u2 restrictionDomain; // refers to a ParamType constant
>     }
>
> The restriction domain indicates the parameterization to which this member is restricted; in the absence of a Where attribute, it is assumed to be ThisClass.
>
> When loading a parameterization of a generic class, we perform an applicability check for each member as we encounter it; in the model outlined here, this is a straight subtyping check of the current parameterization against the restriction domain.
>
> It is possible there could be duplicate applicable methods; this arises when we have a specialization-specific "override", as in:
>
>     class Foo {
>         // total method m(T)
>         void m(T t) { }
>
>         // Specialization of m(T) for T=int
>         void m(Foo this, int i) { ... }
>     }
>
> When we find a duplicate applicable member, we perform a "more specific" check comparing the restriction domains; in this case, the second method has a restriction domain of Foo, which is more specific than the (implicit) Foo restriction domain of the generic method, so we prefer the second member.
>
> This procedure is strictly linear; as each member is read from the classfile, we can make a quick determination as to whether to keep or discard it; if we keep it, we might replace it later with a more specific one as we find it. Modulo cases where there are multiple applicable overloads that are equally specific, it is also deterministic; whether we find the generic version of m() or its specialization first, we'll end up with the same set of members.
>
> If there are duplicate applicable members in a classfile where neither's restriction domain is more specific than the other's, then the VM is permitted to make an arbitrary choice (as they are both applicable and equally specific.)
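As an illustration only (not part of the original memo), the linear keep/discard/replace procedure described above might be sketched in Java as follows. Everything here is an assumption made for the sketch: the `Member` and `MemberSelection` classes are hypothetical, a restriction domain is modelled as a simple map from type variable to required binding rather than a ParamType constant, and "more specific" is approximated as "requires a strict superset of the other's bindings". When two applicable members are equally specific or incomparable, the sketch keeps the first one seen, which is just one of the choices the strawman permits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a member plus the bindings its Where attribute requires.
final class Member {
    final String signature;           // name plus descriptor, e.g. "m(T)"
    final Map<String, String> where;  // type variable -> required binding; empty = total member
    final String label;               // used only for printing

    Member(String signature, Map<String, String> where, String label) {
        this.signature = signature;
        this.where = where;
        this.label = label;
    }
}

final class MemberSelection {
    // Applicability: the current parameterization satisfies every required binding.
    static boolean applicable(Member m, Map<String, String> bindings) {
        return m.where.entrySet().stream()
                .allMatch(e -> e.getValue().equals(bindings.get(e.getKey())));
    }

    // "More specific" (simplified): a requires everything b requires, and strictly more.
    static boolean moreSpecific(Member a, Member b) {
        return a.where.size() > b.where.size()
                && a.where.entrySet().containsAll(b.where.entrySet());
    }

    // One linear pass: each member is kept, discarded, or replaces an earlier one.
    static Map<String, Member> select(List<Member> members, Map<String, String> bindings) {
        Map<String, Member> kept = new LinkedHashMap<>();
        for (Member m : members) {
            if (!applicable(m, bindings)) {
                continue; // not a member of this parameterization
            }
            Member prev = kept.get(m.signature);
            if (prev == null || moreSpecific(m, prev)) {
                kept.put(m.signature, m);
            }
            // Otherwise prev is more specific, or the two are equally specific or
            // incomparable; this sketch keeps the first one seen in that case.
        }
        return kept;
    }

    public static void main(String[] args) {
        Member generic = new Member("m(T)", Map.of(), "generic m");
        Member forInt = new Member("m(T)", Map.of("T", "int"), "m specialized for T=int");
        Map<String, String> intBindings = Map.of("T", "int");

        // Either classfile order selects the specialization for T=int.
        System.out.println(select(List.of(generic, forInt), intBindings).get("m(T)").label);
        System.out.println(select(List.of(forInt, generic), intBindings).get("m(T)").label);
        // For any other parameterization only the generic member is applicable.
        System.out.println(select(List.of(generic, forInt), Map.of("T", "long")).get("m(T)").label);
    }
}
```

The sketch needs no state beyond the table of members kept so far, which is the property the memo calls "strictly linear". Rejecting equally specific duplicates instead would require remembering unresolved conflicts until the whole classfile has been read.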
> The static compiler can work to filter out such situations, if desired, such as imposing a "meet rule"; if we had:
>
>     void foo(Foo this)
>     void foo(Foo this)
>
> a meet rule would require the additional overload
>
>     void foo(Foo this)
>

From brian.goetz at oracle.com Fri Apr 8 23:20:30 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 8 Apr 2016 19:20:30 -0400
Subject: Conditional members
In-Reply-To: <201604082055.u38Ktp07011333@d03av05.boulder.ibm.com>
References: <201604082055.u38Ktp07011333@d03av05.boulder.ibm.com>
Message-ID: <57083CBE.4080909@oracle.com>

On 4/8/2016 4:55 PM, Bjorn B Vardal wrote:
>
> Applicability check
>
> > When loading a parameterization of a generic class, we perform an applicability check for each member as we encounter it; in the model outlined here, this is a straight subtyping check of the current parameterization against the restriction domain.
>
> In order to support the subtyping check, the applicability check should happen in the specializer, and not when loading the specialization. Both the type information and the class hierarchy are more easily accessible at that point.

Agree that this is a specialization-time decision. But I'm not sure what "the specializer" is; in our prototype, we have a class that takes a byte[] and a set of bindings and produces a new byte[] which we call the specializer, but that's just one (bad) implementation strategy. So I'm not sure that "a specializer" is a necessary part of the story, but yes, this is a decision made at specialization time.

> > If there are duplicate applicable members in a classfile where neither's restriction domain is more specific than the other's, then the VM is permitted to make an arbitrary choice.
>
> Seconding Karen's comment, we'd like to avoid "arbitrary" choices, as both users and JVM implementers need to know how to get consistent behaviour.
> Unspecified behaviour may change, and it may also have corner cases that are treated differently by different JVM implementations.
>
> Would it be better to reject a specialization where there are multiple maximally specific applicable members, or to reject templates that would allow such scenarios?

We plan to reject these at compile time in any case. The only question is, if some other compiler produces a classfile where there are multiple applicable specializations, do we want to reject it on principle? It's more work to reject the classfile than to make an arbitrary choice. Consider this classfile:

    Where[T=String] void m() {}
    Where[U=String] void m() {}
    Where[T=String, U=String] void m() {}

This classfile is valid. But we don't know that until we've read all the way to the bottom; when we hit the second m(), we would have to record "crap, if I don't see an m() that is better than both the first one and second one by the end, I'll have to bail." That means accumulating state as we read the classfile that then has to be validated at the end. Whereas, if we do the purely local thing, we avoid this check.

It's your call, but I didn't want to specify something that had a cost to prevent something mostly harmless that happens rarely.

> Reflection
>
> We need to specify what the reflection behaviour will be for conditional members, as it may depend on how each JVM implementation decides to represent species internally. The current reflection behaviour is not well specified, and adding conditional members may add more inconsistencies.

Yep....

> JVMTI / class redefinition / class retransformation
>
> This applies to conditional members specifically, and also to specializations in general.
>
> What happens when a generic class is redefined? Will the whole specialization nest require redefinition, or will the redefinition be limited to the redefined specialization? What about changes to a generic class (template)?
> What happens if the restriction domain of a conditional member changes?

We need to be careful with the terminology, which is hard because we don't have good terminology yet. We have "source classes" that are compiled into "class files", each of which may define more than one "runtime type", which can be reflected over with "runtime mirrors". All of these are called "classes" :(

Since there's no artifact for a specialization, I don't think we will support redefinition for a specialization; we'd support redefinition for a class FILE. And I think the logical consequence is that we then have to redefine all extant specializations of that classFILE, since the change could potentially affect all specializations.

> *Any-interface*
>
> Will only non-conditional methods be in the any-interface? Or will conditional methods have a default implementation (e.g. throw UnsupportedOperationException)?

Yes; the any-interface represents only total members.

> Motivation
>
> I think the API migration concern is compelling. But to handle that, it's sufficient to be able to restrict members to the all-erased specialization (or else require them to be total). This mechanism could be very simple, and the resulting API differences seem to be well justified by the compatibility requirements.

If that were only the case .... :(

Yes, the most egregious examples will be when a method is simply unsuitable for non-reference parameters, and indeed a simpler "where all-erased" criterion would fit the bill here, and it's a pretty compelling place to want to stop.

Here's another migration concern. We want to migrate Streams such that IntStream and friends can be deprecated. Pipelines like:

    List<String> strings = ...
    strings.stream().map(String::length)

can now result in a Stream<int> rather than a Stream<Integer> (yay!), but in order to retire IntStream, there are some methods, like sum/min/max, that are pretty hard to let go of.
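For context, here is a minimal sketch of how this looks in Java as it stands today, without the enhanced generics under discussion. The class name `StreamSumToday` is made up for the example. The same pipeline yields a boxed `Stream<Integer>`, and `sum()` is only reachable by switching to `IntStream` with `mapToInt`; the boxed form has to fall back on `reduce`.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class StreamSumToday {
    // Primitive specialization: sum() lives on IntStream, not on Stream<T>.
    static int sumViaIntStream(List<String> strings) {
        IntStream lengths = strings.stream().mapToInt(String::length);
        return lengths.sum();
    }

    // Boxed stream: no sum(), so the caller spells out the reduction.
    static int sumViaBoxedStream(List<String> strings) {
        Stream<Integer> lengths = strings.stream().map(String::length);
        return lengths.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "bb", "ccc");
        System.out.println(sumViaIntStream(strings));
        System.out.println(sumViaBoxedStream(strings));
    }
}
```

Both methods compute the same total. The difference is that `sum()` exists only on the hand-written primitive specialization, which is the kind of member a conditional `sum()` on a specialized Stream would have to replace before `IntStream` could be retired.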

Ideally I'd rather slice along a more abstract dimension (e.g., "where T extends Arithmable"), but that's a whole new bag of problems that I'd like to not couple to this one.

Let's say that we agree that the conditionality selectors should be "as simple as possible" and we're going to work within the target use cases to determine exactly what that means.... The current proposal is pretty simple, in that it is a pure subtyping test, is amenable to a meet-rule, and doesn't include not-combinators. But I agree it's not the only place we could land, and we're open to exploring this further.

> In general I like the idea of a facility that allows for method implementations to be specialized for known types. It can help to get performance in cases where otherwise some abstraction would get in the way by forcing us to treat things uniformly. And the spirit of such specialization is that it should be (at least mostly) transparent, so users shouldn't usually need to think about how the implementation is selected in this case.
>
> However, at the Java language level, conditional members have a significant limitation here. Erasure means that it's only possible to specialize for primitive types. There's no way to specialize for String, for example.
>
> Then there is type-specific functionality such as List.sum(). This doesn't strike me as something that belongs in List, any more than these do:
>
> - List.append()
> - List>.append()
> - List>.compose()
>
> But due to erasure, these wouldn't be expressible. This kind of API extension is limited to primitive types. (Later it could be done for value types more generally, but I don't think it would be good to allow users to special-case their own APIs for user-defined value types, but not for T=String.)

Actually, this isn't true (but your general argument about "Is this the language feature we're looking for" is still entirely valid.)
We can express this *up to erasure*, just as we can with everything else. If we have, at the source level, a member conditioned on an erased parameterization:

    void append(T t) { ... }

this is fine. We erase String to 'erased' in the classfile (so it becomes "where erased T"), but the compiler can enforce that it is only invoked when T=String, just as we do with:

    T m(T t) { .... }

In the classfile, the arg and return types are Object, but the compiler will reject the call if T != String.

Where this runs out of gas is overloads that are erasure-equivalent:

    void append(T t) { ... }
    void append(T t) { ... }

which the compiler will reject with the familiar, if frustrating, "can't overload these methods, they have the same erasure" error.

So I think we can (if we want) extend this treatment to reference types, up to where erasure gets in the way.

> We would get the fluent style of call "(...).sum()", but I don't think adding methods to List is the right way to get that, especially if it will only work for primitive types, and if it means that users need to think about sometimes methods of List more often than necessary.

I would, in fact, be disinclined to add such methods to List. But Stream -- which is about *computation*, not *data*, seems a different case. In fact, in addition to the terrible performance that boxed streams have, telling Java developers that they should sum a stream with

    ...reduce(0, (x,y) -> x+y)

was likely, we felt, to engender this response (warning, NSFW):

    http://s.mlkshk-cdn.com/r/FTAF

Still open to better ideas, though.

From kevinb at google.com Mon Apr 11 20:53:03 2016
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 11 Apr 2016 13:53:03 -0700
Subject: Value types questions & comments
Message-ID:

Hi all,

Okay, this is a bit overdue, but I'm finally digging into "state of the values" again. Not even going to dig into the specialization stuff yet in this email.

My perspective on this: Since Java has had only two "kinds of types" for 20+ years -- and since the tension between those two is already a major source of confusion and bugs for intermediate programmers -- adding a third kind now is a *Very Big Deal*. It needs to be as simple as possible to understand this new kind, as nothing but a "natural" hybrid of the other two. The fewer asterisks we need to put on that simple model, the better.

I've gathered that it's like a reference type in that it's a named, user-defined type that can have fields (of any kind), methods and constructors (getting eq/hc/ts for free, kind of like enum classes get valueOf for free, I assume), and can implement interfaces.

But I think it resembles a primitive in pretty much every other way?

- No identity (so mutability isn't even a question)
- Can't be null
- Can't have subtype or supertype (exception: as above, value types can implement interfaces)
- Does not extend Object, so synchronization/wait/notify not possible
- Not heap-allocated (locals on the stack, fields and array values inlined)
- Can be boxed to an Object ... although *boxing works differently

So first off: have I got all that right?

Next I just have a laundry list of random questions.

Conceptual question: is a user-defined value type a "class"? A "yes" and a "no" answer both seem defensible, and of course we have to choose one and defend it. And notably, whichever way we decide it, users are going to have to rethink their preconceived notions of what a "class" is no matter what. (This gets back to my statement that what we're doing here is a Very Big Deal. These are bedrock concepts we're tampering with.)

On the one hand, classes are things that have fields and methods, so yes, a value type is a class. On the other hand, one expects classes to have "instances"/"objects", pointed to by references, which these don't.
Also, you expect to be able to call getClass() and get something useful back (that knows what methods are present, what interfaces are implemented) and that doesn't seem possible in the general case here (but could maybe(?) be faked in cases where the static type of the value is known to the compiler).

It's nice that a value type can implement interfaces. But I get confused when I try to think through the implications of this. I get that when referring to it as the interface type, boxing *may* occur. I'd expect eq/hc/ts on the box to pass through to the value itself (two different boxes of equal values are equal). But... maybe most of my confusion is just stemming from getClass() again. What would it return? Could the returned Class possibly have all the metadata a user might expect? I think not?

Re: "Large groups of component values should usually be modeled as plain classes", I'd VERY much like to avoid putting that responsibility onto the user if at all possible. Is there a reason why the VM can't simply decide "this is past my threshold, so I'm gonna box it instead of putting it all on the stack" and not make the developer worry about it?

Re: "Cloning a value type does not strictly make sense," well, *technically* when a value includes fields of Cloneable reference types, you might want a deep-clone of that. However I lean toward thinking this is too weird to bother supporting. Users should really be dissuaded from including references to *mutable* types in their value definition in the first place.

Which actually raises another issue. If my value type has no way to include an int[] *unmodifiably*, that would be extremely sad, right?

Going to stop there for now!

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From brian.goetz at oracle.com Mon Apr 11 23:13:56 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 11 Apr 2016 19:13:56 -0400
Subject: Value types questions & comments
In-Reply-To:
References:
Message-ID: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>

Thanks for pulling these together. Some quick answers inline. Don't believe me -- challenge the answers.

> My perspective on this: Since Java has had only two "kinds of types" for 20+ years -- and since the tension between those two is already a major source of confusion and bugs for intermediate programmers -- adding a third kind now is a Very Big Deal. It needs to be as simple as possible to understand this new kind, as nothing but a "natural" hybrid of the other two. The fewer asterisks we need to put on that simple model, the better.

Total agreement. We view values as generalizations of primitives, where primitives are "values with legacy baggage", and hopefully as little baggage as possible. So hopefully in the end we still have two things, references and values (with some values "more equal than others" for historical reasons.) The biggest baggage is probably surrounding the bespoke box types that we're probably stuck with.

> I've gathered that it's like a reference type in that it's a named, user-defined type that can have fields (of any kind), methods and constructors (getting eq/hc/ts for free, kind of like enum classes get valueOf for free, I assume), and can implement interfaces.

Right.

> But I think it resembles a primitive in pretty much every other way?
> - No identity (so mutability isn't even a question)
> - Can't be null
> - Can't have subtype or supertype (exception: as above, value types can implement interfaces)
> - Does not extend Object, so synchronization/wait/notify not possible
> - Not heap-allocated (locals on the stack, fields and array values inlined)
> - Can be boxed to an Object ... although *boxing works differently
> So first off: have I got all that right?

Yes.
And, some of these asterisks can be erased. For example, I see no reason why `int` can't implement Comparable or Serializable (though seeing 1.compareTo(2) might make some developers' heads explode, so we might dial back on how much we close up this gap -- TBD.) As you say, the biggest asterisk is how we handle boxing. The box types for values will be derived from the class file and have nice clean properties, whereas the box types for primitives will likely remain some sort of bespoke bag of smelly stuff.

We might even take this further -- by actually describing `int` with a source file (public native class int implements Comparable { ... }) which might try and smooth out some of the differences, but I wouldn't hold out a lot of hope for this being super successful. Mostly this is just moving the magic around, but it's possible this will seem less overall magic to some.

Another asterisk: the semantics of operators are predefined on primitives, and not at all on values. It's possible we can close up this gap too, but I've been deliberately avoiding opening this Pandora's Can Of Worms, strictly for scope-management reasons. (But given that one motivating example for values is alternate numerics, calls for operator overloading won't be far behind.)

Though, bottom line, I think users will be able to recognize that the primitives are special cases of this new value thingie. They behave so similarly, they have all the same restrictions, they can be used in all the same places.

> Conceptual question: is a user-defined value type a "class"? A "yes" and a "no" answer both seem defensible, and of course we have to choose one and defend it. And notably, whichever way we decide it, users are going to have to rethink their preconceived notions of what a "class" is no matter what. (This gets back to my statement that what we're doing here is a Very Big Deal. These are bedrock concepts we're tampering with.)

Yes, there's gonna be some adjustment of mental models required.
(Additionally, enhanced generics also put a lot of pressure on the deeply overloaded word "class", since we will have multiple runtime parameterizations of a given generic "class".) A class is used to describe a source file, a binary file, a runtime type, something you load, a type mirror .... Early in Java's lifetime, these entities were in strict 1:1 correspondence, but no more.

We have classes at the source level -- this will probably expand to include value types. We have class files -- this will probably similarly expand. I don't think these will be controversial. But I think we need to call the runtime entities something else -- like TYPE and TYPE MIRROR. The meaning of "class" is already too overloaded. Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn't seem impossible.

> On the one hand, classes are things that have fields and methods, so yes, a value type is a class. On the other hand, one expects classes to have "instances"/"objects", pointed to by references, which these don't. Also, you expect to be able to call getClass() and get something useful back (that knows what methods are present, what interfaces are implemented) and that doesn't seem possible in the general case here (but could maybe(?) be faked in cases where the static type of the value is known to the compiler).

I think this one isn't so bad. Java has TYPES today, reference types and primitive types. Instances of reference types are object references, and instances of primitive types are values. So the notion of types whose members are not references is not new.

Because value types are not polymorphic, there's no case where you have a value when you don't know its type by the time the bytecode moving it / describing it is executed. This means that the "general case" here doesn't exist.
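To make the "Java has TYPES today" point concrete, here is a small sketch using only current Java; the class name `MirrorsToday` is invented for the example. A primitive type already has a type mirror even though its instances are not objects. Calling getClass() on such a value is only possible after boxing, at which point the mirror returned is that of the box type.

```java
final class MirrorsToday {
    // An int has no getClass(); boxing it first yields the mirror of the box type.
    static Class<?> mirrorOfBoxed(int value) {
        Object boxed = value; // boxing conversion: int -> Integer
        return boxed.getClass();
    }

    public static void main(String[] args) {
        // The primitive type has its own mirror, distinct from its box's mirror.
        System.out.println(int.class.isPrimitive());
        System.out.println(Integer.class.isPrimitive());
        System.out.println(int.class == Integer.TYPE);
        System.out.println(mirrorOfBoxed(42) == Integer.class);
    }
}
```

This only demonstrates today's behaviour for primitives. What reflection over user-defined values would return is exactly the open question in this thread.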

In any case, there needs to be reflection over values, but it's not clear whether it has to be spelled ".getClass()", nor is it clear that what is returned must be a java.lang.Class. (But, because all values can be boxed, we may be able to get away with just returning the type mirror for the box type from .getClass() on values, and calling it good -- (almost) everything in reflection is boxed anyway.)

The Scala type system has some magic types that help capture these differences (AnyRef and AnyVal are the roots for reference and value types, respectively, and both extend Any). Not clear that we want to copy this, but I think we put things in reasonable context here (especially if we're willing to tolerate expressions like 1.toString() and such -- then every type has members, some types are value types, some are reference types, no big mystery.)

> It's nice that a value type can implement interfaces. But I get confused when I try to think through the implications of this. I get that when referring to it as the interface type, boxing may occur. I'd expect eq/hc/ts on the box to pass through to the value itself (two different boxes of equal values are equal). But... maybe most of my confusion is just stemming from getClass() again. What would it return? Could the returned Class possibly have all the metadata a user might expect? I think not?

The story here is actually pretty straightforward. There's a slight mental gymnastic you have to do to generalize your notion of "implements interface." Hitherto, "C implements I" meant two things together:

 - C has all the methods that I has;
 - C is a subtype of I

With values in the mix, we have to slightly redefine this in a backward compatible way. For each type T, there exists a reference type Ref[T], where Ref[T] <: Object. For all reference types R, Ref[R] = R. For all value types V, Ref[V] = V's box type.

Now, we redefine "C implements I"
as follows:

 - C has all the methods that I has;
 - Ref[C] is a subtype of I

Note that this fully describes reality, we just didn't know that there were types for which Ref[T] was not just T.

If I have a Decimal, where Decimal implements Comparable:

    Decimal d = ..., e = ...
    if (d.compareTo(e) < 0) { ... }

Since I know the receiver is a Decimal, I'll generate the bytecode:

    vload_1 // push d
    vload_2 // push e
    invokedirect Decimal.compareTo(Decimal):I

I don't need to go through the interface, so no boxing. On the other hand, if I convert it:

    Comparable c = d

then I will box d to Decimal's box (which implements Comparable). Similarly, if I do

    Object o = d

I will also box.

So boxing happens when you assign a value either to Object or to an interface. Otherwise I can invoke the methods directly, not unlike how the compiler selects invokevirtual over invokeinterface when it has sharp enough static types.

> Re: "Large groups of component values should usually be modeled as plain classes", I'd VERY much like to avoid putting that responsibility onto the user if at all possible. Is there a reason why the VM can't simply decide "this is past my threshold, so I'm gonna box it instead of putting it all on the stack" and not make the developer worry about it?

The VM will definitely do this, based on some internal, machine-dependent threshold. However, because the semantics of values is different from references, the user should still pick the right tool, or might face performance consequences.

If I have an XY point, and I want to vary the X component and not the Y, I might well be happy to do:

    Point p = ...
    Point q = p.withX(0)

With small values, this will all stay in registers, it'll be fast, everything will be fine. But if I have boxed my values onto the heap, then to mutate (unless I can prove non-aliasing) I will have to allocate a new object for the new Point.
If I plan to do something "mutation happy", I'm probably better off with a real object that supports mutation, even though I can model it as a value.

> Re: "Cloning a value type does not strictly make sense," well, technically when a value includes fields of Cloneable reference types, you might want a deep-clone of that. However I lean toward thinking this is too weird to bother supporting. Users should really be dissuaded from including references to mutable types in their value definition in the first place.

Agree on clone() -- best to let it rot, it's halfway there already.

But here's a value type I could imagine writing all the time:

    value class Cursor {
        private T[] array;
        private int offset;
    }

I can use a Cursor as a garbage-free Iterator. But it refers into mutable objects (here, an array.) (But note that the fields are private.) I think the logical definition of equality here is "do they point at the same array, and at the same location."

> Which actually raises another issue. If my value type has no way to include an int[] unmodifiably, that would be extremely sad, right?

Array mutability is a persistent thorn in our side. We're investigating an idea called /frozen arrays/ which will allow an array to be marked readonly (and, if not aliased, efficiently so.) Which we'd love to backlit onto varargs....

> Going to stop there for now!

Keep 'em coming!

From kevinb at google.com Tue Apr 12 20:07:59 2016
From: kevinb at google.com (Kevin Bourrillion)
Date: Tue, 12 Apr 2016 13:07:59 -0700
Subject: Value types questions & comments
In-Reply-To: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID:

On Mon, Apr 11, 2016 at 4:13 PM, Brian Goetz wrote:

> Thanks for pulling these together. Some quick answers inline. Don't believe me -- challenge the answers.
>
> > My perspective on this: Since Java has had only two "kinds of types" for 20+ years -- and since the tension between those two is already a major source of confusion and bugs for intermediate programmers -- adding a third kind now is a *Very Big Deal*. It needs to be as simple as possible to understand this new kind, as nothing but a "natural" hybrid of the other two. The fewer asterisks we need to put on that simple model, the better.
>
> Total agreement. We view values as generalizations of primitives, where primitives are "values with legacy baggage", and hopefully as little baggage as possible. So hopefully in the end we still have two things, references and values (with some values "more equal than others" for historical reasons.) The biggest baggage is probably surrounding the bespoke box types that we're probably stuck with.

Ok, there is one difference in how we are conceptualizing this, and the difference does concern me.

I'm convinced that ending up at a place where it still feels like there are only two kinds of types is not attainable, and we should not be under the impression that that's what we're going to accomplish here. There WILL be three kinds of types. Developers will have to learn all three. Value types will *not* be just generalized primitives; however, as we both agree, we want to make the two as similar as we can.

Just the fact that they are named, user-defined aggregate types is enough to make them different from primitives. The fact that they box differently is of even greater concern, since a large share of the existing confusion between the two kinds we already have centers around boxing.

> > But I think it resembles a primitive in pretty much every other way?
> >
> > - No identity (so mutability isn't even a question)
> > - Can't be null
> > - Can't have subtype or supertype (exception: as above, value types can implement interfaces)
> > - Does not extend Object, so synchronization/wait/notify not possible
> > - Not heap-allocated (locals on the stack, fields and array values inlined)
> > - Can be boxed to an Object ... although *boxing works differently
> >
> > So first off: have I got all that right?
>
> Yes. And, some of these asterisks can be erased. For example, I see no reason why `int` can't implement Comparable or Serializable (though seeing 1.compareTo(2) might make some developers' heads explode, so we might dial back on how much we close up this gap -- TBD.) As you say, the biggest asterisk is how we handle boxing. The box types for values will be derived from the class file and have nice clean properties, whereas the box types for primitives will likely remain some sort of bespoke bag of smelly stuff.

I would assume we're not actually changing anything about primitive boxing, here...? FWIW, `int` implementing Comparable seems like it would neither help nor harm the situation.

> We might even take this further -- by actually describing `int` with a source file (public native class int implements Comparable { ... }) which might try and smooth out some of the differences, but I wouldn't hold out a lot of hope for this being super successful. Mostly this is just moving the magic around, but it's possible this will seem less overall magic to some.

Yeah, I also don't see that really helping; best to leave primitives completely alone.

> Another asterisk: the semantics of operators are predefined on primitives, and not at all on values. It's possible we can close up this gap too, but I've been deliberately avoiding opening this Pandora's Can Of Worms, strictly for scope-management reasons.
> (But given that one motivating example for values is alternate numerics, calls for operator overloading won't be far behind.)

Cool, keep holding out against that...

> Though, bottom line, I think users will be able to recognize that the primitives are special cases of this new value thingie. They behave so similarly, they have all the same restrictions, they can be used in all the same places.

My rewording of this is that "bottom line, we hope users will be able to recognize that the new hybrid type is really very very similar to primitives (and primitives continue to be what they always were), but resemble reference types instead in a few obviously useful ways, and there are really only two or three asterisks they'll have to watch out for."

> Conceptual question: is a user-defined value type a "class"? A "yes" and a "no" answer both seem defensible, and of course we have to choose one and defend it. And notably, whichever way we decide it, users are going to have to rethink their preconceived notions of what a "class" is no matter what. (This gets back to my statement that what we're doing here is a Very Big Deal. These are bedrock concepts we're tampering with.)
>
> Yes, there's gonna be some adjustment of mental models required. (Additionally, enhanced generics also put a lot of pressure on the deeply overloaded word "class", since we will have multiple runtime parameterizations of a given generic "class".) A class is used to describe a source file, a binary file, a runtime type, something you load, a type mirror .... Early in Java's lifetime, these entities were in strict 1:1 correspondence, but no more.
>
> We have classes at the source level -- this will probably expand to include value types. We have class files -- this will probably similarly expand. I don't think these will be controversial. But I think we need to call the runtime entities something else -- like TYPE and TYPE MIRROR. The meaning of "class"
> is already too overloaded. Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn't seem impossible.

"Is a class from the source/bytecode perspective, isn't a class from the runtime perspective" is worth shooting for, but it seems difficult to even get it down to something that simple. I mean, at runtime this is still a thing that gets loaded and initialized by a class loader, yes? I fear we will never find a clean way to address this.

> On the one hand, classes are things that have fields and methods, so yes, a value type is a class. On the other hand, one expects classes to have "instances"/"objects", pointed to by references, which these don't. Also, you expect to be able to call getClass() and get something useful back (that knows what methods are present, what interfaces are implemented) and that doesn't seem possible in the general case here (but could maybe(?) be faked in cases where the static type of the value is known to the compiler).
>
> I think this one isn't so bad. Java has TYPES today, reference types and primitive types. Instances of reference types are object references, and instances of primitive types are values. So the notion of types whose members are not references is not new.

I think the fact that we are now talking about user-defined named types with fields, methods, constructors, and implemented interfaces makes this something very different.

> Now, we redefine "C implements I" as follows:
> - C has all the methods that I has;
> - Ref[C] is a subtype of I

Ah, I think this helps some. Maybe. So a layperson explanation is: When writing a value type, you can declare interfaces, but you are actually declaring which interfaces the *boxed* form of the value will implement, not the value itself. But then if all you do is myValue.myInterfaceMethod(), it will just skip boxing behind the scenes. Something like that?

> I will also box.
> So boxing happens when you assign a value either to Object or to an interface. Otherwise I can invoke the methods directly, not unlike how the compiler selects invokevirtual over invokeinterface when it has sharp enough static types.
>
> Re: "Large groups of component values should usually be modeled as plain classes", I'd VERY much like to avoid putting that responsibility onto the user if at all possible. Is there a reason why the VM can't simply decide "this is past my threshold, so I'm gonna box it instead of putting it all on the stack" and not make the developer worry about it?
>
> The VM will definitely do this, based on some internal, machine-dependent threshold. However, because the semantics of values are different from those of references, the user should still pick the right tool, or might face performance consequences. If I have an XY point, and I want to vary the X component and not the Y, I might well be happy to do:
>
>     Point p = ...
>     Point q = p.withX(0)
>
> With small values, this will all stay in registers, it'll be fast, everything will be fine. But if I have boxed my values onto the heap, then on mutation (unless I can prove non-aliasing) I will have to allocate a new object for the new Point. If I plan to do something "mutation happy", I'm probably better off with a real object that supports mutation, even though I can model it as a value.

Okay, I think I'm good here. I am fine with developers in "mutation happy" situations deciding to avoid value types for "bigger" things (yet give themselves a pass for smaller things). The non-mutating cases are where I'd hope that a vague size threshold doesn't have to come into it, so if you say the VM will make elective boxing decisions, that's great, I think.

> Re: "Cloning a value type does not strictly make sense," well, *technically* when a value includes fields of Cloneable reference types, you might want a deep-clone of that.
> However I lean toward thinking this is too weird to bother supporting. Users should really be dissuaded from including references to *mutable* types in their value definition in the first place.
>
> Agree on clone() -- best to let it rot, it's halfway there already. But here's a value type I could imagine writing all the time:
>
>     value class Cursor {
>         private T[] array;
>         private int offset;
>     }
>
> I can use a Cursor as a garbage-free Iterator. But it refers into mutable objects (here, an array.) (But note that the fields are private.) I think the logical definition of equality here is "do they point at the same array, and at the same location."

Using a value type for something that *isn't a value* raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.

Gratuitous aside about language syntax even though it is not actually important right now: since we write "enum Foo" not "enum class Foo", I would be quite surprised if we used "value class" here, since between the two only enums are the ones that are real classes in every sense of the word.

From brian.goetz at oracle.com Tue Apr 12 20:51:25 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 12 Apr 2016 16:51:25 -0400
Subject: Value types questions & comments
In-Reply-To:
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID:

> I would assume we're not actually changing anything about primitive boxing, here...?

So, this is rife with tradeoffs....

The legacy boxes are inferior to the new boxes, for a number of reasons. The association between QComplex; and LComplex; is mechanical and simple, whereas the association between int and Integer is ad-hoc and complex.
And since the new boxes are new, they can be defined from the get-go to have relaxed identity semantics, enabling optimizations and defending against possible bugs (e.g., they could throw when synchronized upon.) Whereas it's valid now to synchronize on a j.l.Integer, and existing code does this (shame, shame), meaning that we can't necessarily take liberties with the identity of the box for optimization purposes.

So it would be great if we could get away with having new mechanically generated primitive box classes, and deprecate Integer, but I have deep doubts we'll be able to get away with that. So, probably right that we're stuck with primitive boxing mostly as is.

> We have classes at the source level -- this will probably expand to include value types. We have class files -- this will probably similarly expand. I don't think these will be controversial. But I think we need to call the runtime entities something else -- like TYPE and TYPE MIRROR. The meaning of "class" is already too overloaded. Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn't seem impossible.
>
> "Is a class from the source/bytecode perspective, isn't a class from the runtime perspective" is worth shooting for, but it seems difficult to even get it down to something that simple. I mean, at runtime this is still a thing that gets loaded and initialized by a class loader, yes? I fear we will never find a clean way to address this.

One terminology we've been experimenting with is having "class" and "species" (think back to middle school: kingdom, phylum, class, order, family, genus, species.) List is a class; List and List are species of List. Similarly, the boxed projection and the value projection of Complex are both species of class Complex.

Not clear whether this is the right terminology, but it gives users a way to keep thinking that List is a class, while recognizing that the beasts List and List are at the same time both of class List and also of different species.

> I think the fact that we are now talking about user-defined named types with fields, methods, constructors, and implemented interfaces makes this something very different.

So, how about:

 - Java has always had values
 - Primitives are the BUILT-IN values
 - Java now gets USER-DEFINED values in addition to USER-DEFINED classes
 - USER-DEFINED values and classes can have fields, methods, constructors, and implement interfaces

Does this stacking make it sound less radical? I agree that there's a real pedagogical challenge here, but I think it can be made to seem like less of a hurdle.

> Now, we redefine "C implements I" as follows:
> - C has all the methods that I has;
> - Ref[C] is a subtype of I
>
> Ah, I think this helps some. Maybe. So a layperson explanation is: When writing a value type, you can declare interfaces, but you are actually declaring which interfaces the boxed form of the value will implement, not the value itself. But then if all you do is myValue.myInterfaceMethod(), it will just skip boxing behind the scenes. Something like that?

That's exactly how it works, yes. And, you could put the "skip boxing behind the scenes" part in a smaller font, since that's just an optimization (and, even when you explicitly box and then access a box member, there's some chance that the box will still be elided due to escape analysis.)

> Using a value type for something that isn't a value raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.

This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value -- albeit a very special kind.
They're immutable, like other values. Almost all their state is encapsulated (they can be compared by identity, that's it). They can only be constructed by privileged factories (we call these constructors.) But, ultimately, they behave like values -- they are passed by value, they have no identity of their own.

For the Cursor class, the natural definition of equals *is* the componentwise one -- two cursors are the same if they refer into the same source at the same position. But yes, there are cases where we'd want to hand-override equals (which is allowed) to do a deeper comparison (generally when our components are value-like references, like strings or dates or big decimals.)

Another place where references to mutable objects will show up in values is if we use values as a substrate for multiple return / tuples. Here, the value is just an ad-hoc container for multiple values -- and object references are entirely reasonable to use in this context.

> Gratuitous aside about language syntax even though it is not actually important right now: since we write "enum Foo" not "enum class Foo", I would be quite surprised if we used "value class" here, since between the two only enums are the ones that are real classes in every sense of the word.

Sure, this is one of our tools for helping frame the correct mental model. If we decide that the terminology falls out as "classes are the entities that have fields, methods, and constructors", then "value class" reinforces that. But we could go other ways too.

From ali.ebrahimi1781 at gmail.com Tue Apr 12 21:48:31 2016
From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi)
Date: Wed, 13 Apr 2016 02:18:31 +0430
Subject: Value types questions & comments
In-Reply-To:
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID:

Hi

On Wed, Apr 13, 2016 at 1:21 AM, Brian Goetz wrote:

>> I would assume we're not actually changing anything about primitive boxing, here...?
>
> So, this is rife with tradeoffs....
>
> The legacy boxes are inferior to the new boxes, for a number of reasons. The association between QComplex; and LComplex; is mechanical and simple, whereas the association between int and Integer is ad-hoc and complex. And since the new boxes are new, they can be defined from the get-go to have relaxed identity semantics, enabling optimizations and defending against possible bugs (e.g., they could throw when synchronized upon.) Whereas it's valid now to synchronize on a j.l.Integer, and existing code does this (shame, shame), meaning that we can't necessarily take liberties with the identity of the box for optimization purposes.
>
> So it would be great if we could get away with having new mechanically generated primitive box classes, and deprecate Integer, but I have deep doubts we'll be able to get away with that. So, probably right that we're stuck with primitive boxing mostly as is.
>
>> We have classes at the source level -- this will probably expand to include value types. We have class files -- this will probably similarly expand. I don't think these will be controversial. But I think we need to call the runtime entities something else -- like TYPE and TYPE MIRROR. The meaning of "class" is already too overloaded. Again, though, the game here is to frame the old reality as a lower-dimensional projection of the new reality, and this doesn't seem impossible.
>>
>> "Is a class from the source/bytecode perspective, isn't a class from the runtime perspective" is worth shooting for, but it seems difficult to even get it down to something that simple. I mean, at runtime this is still a thing that gets loaded and initialized by a class loader, yes? I fear we will never find a clean way to address this.
>
> One terminology we've been experimenting with is having "class" and "species" (think back to middle school: kingdom, phylum, class, order, family, genus, species.)
> List is a class; List and List are species of List. Similarly, the boxed projection and the value projection of Complex are both species of class Complex.
>
> Not clear whether this is the right terminology, but it gives users a way to keep thinking that List is a class, while recognizing that the beasts List and List are at the same time both of class List and also of different species.

Assuming we are able to reduce specialization to *Constant Pool* specialization, can we say that a class List would have multiple *Run-Time Constant Pools* in the Valhalla JVM? So, can we say List and List are at the same time both of class List and have different *Constant Pools*? This is so ideal!

--
Best Regards,
Ali Ebrahimi

From kevinb at google.com Thu Apr 14 22:18:51 2016
From: kevinb at google.com (Kevin Bourrillion)
Date: Thu, 14 Apr 2016 15:18:51 -0700
Subject: Value types questions & comments
In-Reply-To:
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID:

On Tue, Apr 12, 2016 at 1:51 PM, Brian Goetz wrote:

> One terminology we've been experimenting with is having "class" and "species" (think back to middle school: kingdom, phylum, class, order, family, genus, species.) List is a class; List and List are species of List. Similarly, the boxed projection and the value projection of Complex are both species of class Complex.
>
> Not clear whether this is the right terminology, but it gives users a way to keep thinking that List is a class, while recognizing that the beasts List and List are at the same time both of class List and also of different species.

Just saying: the need to introduce "species" is an example of how this proposal makes Java fundamentally more complex and confusing, in a way that will affect everyone.

> So, how about:
> - Java has always had values
> - Primitives are the BUILT-IN values
> - Java now gets USER-DEFINED values in addition to USER-DEFINED classes
> - USER-DEFINED values and classes can have fields, methods, constructors, and implement interfaces
>
> Does this stacking make it sound less radical?

I still think this loses compared to simply admitting that there are now three kinds of types. I can tell that we *want* that to not be true, but we will not achieve that.

> Using a value type for something that *isn't a value* raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.
>
> This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value -- albeit a very special kind. They're immutable, like other values. Almost all their state is encapsulated (they can be compared by identity, that's it). They can only be constructed by privileged factories (we call these constructors.) But, ultimately, they behave like values -- they are passed by value, they have no identity of their own.

Well, FYI, this confuses and worries me.

> For the Cursor class, the natural definition of equals *is* the componentwise one -- two cursors are the same if they refer into the same source at the same position. But yes, there are cases where we'd want to hand-override equals (which is allowed) to do a deeper comparison (generally when our components are value-like references, like strings or dates or big decimals.)

And again, I think this orientation is very wrong. Deep content-based equality is what most users will expect, and will want most of the time. If we don't do that, I think we may be fairly accused of thinking too much about the kind of code *people like us* write, not the kind of code that most Java users write.

This actually gets back to a much broader point.
This whole project is motivated by performance. However, it would be very sad if it does not solve a second problem at the same time, because it can. Even Java developers who are content with the performance foibles of their existing "value-based classes" are constantly irritated by the burden and bug-prone nature of writing/maintaining such classes. At Google we have resigned ourselves to *code-generating* these things, which we are currently doing 16,500 times and climbing. Few of these resemble your stated use cases (cursors, tuples, numerics, etc.). They are just the everyday data that our applications pass around. imho, it really should be an explicit goal of this project to make hacks like that obsolete. A lot of people would cheer that feature even if these still compiled to regular old classes!

> Another place where references to mutable objects will show up in values is if we use values as a substrate for multiple return / tuples. Here, the value is just an ad-hoc container for multiple values -- and object references are entirely reasonable to use in this context.
>
> Gratuitous aside about language syntax even though it is not actually important right now: since we write "enum Foo" not "enum class Foo", I would be quite surprised if we used "value class" here, since between the two only enums are the ones that are real classes in every sense of the word.
>
> Sure, this is one of our tools for helping frame the correct mental model. If we decide that the terminology falls out as "classes are the entities that have fields, methods, and constructors", then "value class" reinforces that. But we could go other ways too.

Just saying, it would be odd to use a rationale for "value class" that equally well argues for "enum class". This is a tangent though.

--
Kevin Bourrillion | Java Librarian | Google, Inc.
| kevinb at google.com

From rschmitt at pobox.com Fri Apr 15 15:47:09 2016
From: rschmitt at pobox.com (Ryan Schmitt)
Date: Fri, 15 Apr 2016 08:47:09 -0700
Subject: Value types questions & comments
In-Reply-To:
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID:

>> This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value -- albeit a very special kind. They're immutable, like other values. Almost all their state is encapsulated (they can be compared by identity, that's it). They can only be constructed by privileged factories (we call these constructors.) But, ultimately, they behave like values -- they are passed by value, they have no identity of their own.
>
> Well, FYI, this confuses and worries me.

This isn't a new concept. Anyone who has mutated a `final ArrayList` is familiar with the concept of an immutable reference to mutable data. I think the fundamental problem here, inherited from Java 1.0, is that there is no language-level mechanism that can be used to distinguish immutable objects like String from mutable objects like ArrayList, and so one of two things has to happen in order to have useful default equals() and hashCode() methods on value types:

(1) The language designers need to special-case certain legacy reference types like String and Option in order to achieve the "deep equality" behavior you expect to see by default
(2) The language designers need to retcon a mechanism for indicating such types

From forax at univ-mlv.fr Mon Apr 18 17:33:55 2016
From: forax at univ-mlv.fr (Remi Forax)
Date: Mon, 18 Apr 2016 19:33:55 +0200 (CEST)
Subject: Value types questions & comments
In-Reply-To:
References: <2127B453-99E7-4879-9097-F46BE46DB0C3@oracle.com>
Message-ID: <884151669.1044003.1461000835108.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Kevin Bourrillion"
> To: "Brian Goetz"
> Cc: valhalla-spec-experts at openjdk.java.net
> Sent: Friday, April 15, 2016
00:18:51
> Subject: Re: Value types questions & comments
>
> On Tue, Apr 12, 2016 at 1:51 PM, Brian Goetz wrote:
>
>> One terminology we've been experimenting with is having "class" and "species" (think back to middle school: kingdom, phylum, class, order, family, genus, species.) List is a class; List and List are species of List. Similarly, the boxed projection and the value projection of Complex are both species of class Complex.
>>
>> Not clear whether this is the right terminology, but it gives users a way to keep thinking that List is a class, while recognizing that the beasts List and List are at the same time both of class List and also of different species.
>
> Just saying: the need to introduce "species" is an example of how this proposal makes Java fundamentally more complex and confusing, in a way that will affect everyone.

I'm trying to understand your point: you mean that even if specialization implies reification of generics (for primitives and value types) at runtime, you don't want that 'detail' to surface in the language/API, i.e. to have no way to ask at runtime for the species of an instance. That's an interesting idea.

BTW, I find 'species' to be a horrible name because it always sounds like a plural in English, no?

>> So, how about:
>> - Java has always had values
>> - Primitives are the BUILT-IN values
>> - Java now gets USER-DEFINED values in addition to USER-DEFINED classes
>> - USER-DEFINED values and classes can have fields, methods, constructors, and implement interfaces
>>
>> Does this stacking make it sound less radical?
>
> I still think this loses compared to simply admitting that there are now three kinds of types. I can tell that we *want* that to not be true, but we will not achieve that.

I agree with Kevin about the 3 kinds of types. At least until there is a way to make primitives behave like value types (have methods, implement interfaces, etc.).

>> Using a value type for something that *isn't a value* raises alarm bells for me. At the minimum I would expect this user to have to implement eq/hc by hand, because the default behavior users want 99% of the time is (deep) content-based equality.
>>
>> This may be the reality-distortion field speaking, but in my view a reference *is* a kind of value -- albeit a very special kind. They're immutable, like other values. Almost all their state is encapsulated (they can be compared by identity, that's it). They can only be constructed by privileged factories (we call these constructors.) But, ultimately, they behave like values -- they are passed by value, they have no identity of their own.
>
> Well, FYI, this confuses and worries me.
>
>> For the Cursor class, the natural definition of equals *is* the componentwise one -- two cursors are the same if they refer into the same source at the same position. But yes, there are cases where we'd want to hand-override equals (which is allowed) to do a deeper comparison (generally when our components are value-like references, like strings or dates or big decimals.)
>
> And again, I think this orientation is very wrong. Deep content-based equality is what most users will expect, and will want most of the time. If we don't do that, I think we may be fairly accused of thinking too much about the kind of code *people like us* write, not the kind of code that most Java users write.
>
> This actually gets back to a much broader point. This whole project is motivated by performance. However, it would be very sad if it does not solve a second problem at the same time, because it can. Even Java developers who are content with the performance foibles of their existing "value-based classes" are constantly irritated by the burden and bug-prone nature of writing/maintaining such classes.
> At Google we have resigned ourselves to *code-generating* these things, which we are currently doing 16,500 times and climbing. Few of these resemble your stated use cases (cursors, tuples, numerics, etc.). They are just the everyday data that our applications pass around. imho, it really should be an explicit goal of this project to make hacks like that obsolete. A lot of people would cheer that feature even if these still compiled to regular old classes!

The main issue I see with deep equals is that you may have a cycle, because a value type can contain a reference to itself. So for me it's a separate feature. Maybe this feature can be introduced at the same time as value types.

> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com

Rémi

From brian.goetz at oracle.com Wed Apr 20 21:55:16 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 20 Apr 2016 17:55:16 -0400
Subject: Wildcards and raw types: story so far
Message-ID: <5717FAC4.6050108@oracle.com>

Valhalla's treatment of wildcards and raw types has taken a somewhat circuitous path. Here's a brief history. This is mostly at the level of compiler translation and classfile representation; the surface syntax and language type system are only briefly touched on.

Notation:

 - R(T) = D indicates that the compiler represents a compile-time type T using the runtime type descriptor D
 - Class[C] represents a CONSTANT_CLASS constant pool entry (or its descriptor equivalent, "LC;")
 - Foo -- the language-level type "raw Foo", written this way for clarity.

For example, prior to Valhalla, R(Foo) = R(Foo) = R(Foo) = Class[Foo].

Model 1
-------

Model 1 had no support for wildcards at all. The argument was that List and List were totally different types, and mapped to totally different runtime classes. This approach is not absurd on its surface; C++ and C# do this.
In this model, the existing wildcard and raw types were frozen at their current meaning: Foo is interpreted as Foo (as it always has been), so we could continue to use wildcards / raw types in combination with erasure, but didn't extend them beyond that point.

This approach had one significant advantage: it was possible to build a specialization prototype on the VM we actually had -- which was no small thing. But the disadvantages soon became obvious:

 - It was a poor match for existing generic code, which is full of sloppy "cast through raw" to get around limitations (sometimes of the code itself, sometimes of the type system.) In particular, attempting to port Collections was pretty much a failure.
 - It was unpopular. Despite wildcards being one of the Most Hated things about Java, apparently the only thing hated more was threatening to take away the wildcards.
 - It was confusing. People expect Foo to mean "any instantiation of Foo", but that's not what it meant.

For binary compatibility, we were tied to maintaining the same R-mapping for existing types (Foo, Foo, Foo), but we can use a different mapping for new types. For specialized types, we used (for simplicity of prototyping) a name-mangling scheme, where R(Foo) = Class[Foo${0=I}].

Model 2
-------

Model 2 built on the Model 1 translation approach, but added support for some new wildcards. The existing types Foo and Foo remained frozen at their current meaning; a new wildcard Foo was added.

This approach simulated the wildcard type Foo with an interface. So for a class Foo, in addition to generating the classfile Foo.class, it also generated an interface Foo$any.class, with R(Foo) = Class[Foo$any].

Wildcards exist as a top type for all possible parameterizations of a generic type, so for a class Foo extends Bar, we need:

    Foo <: Foo for all T
    Foo <: Bar

The classfiles generated by the compiler reflected these relationships (mostly). Each member of Foo had a corresponding member in Foo$any.
For methods, we took the "anyrasure" of the method signature, where:

    anyrasure(T) = erasure(T)
    anyrasure(Foo) = Foo
    anyrasure(T[]) = Arrayish

Arrayish is a new type that is injected as a supertype of existing array types. (This was not implemented with Model 2; just a plan on paper at the time.) For fields, Foo$any acquired getters (and for non-final fields, setters) whose signatures were similarly transformed through anyrasure. So, for a class:

    class Foo extends Bar {
        T t;
        T a() { ... }
        Foo b() { ... }
        T[] c() { ... }
    }

we would generate:

    interface Foo$any extends Bar$any {
        Object get$t();
        void set$t(Object o);
        Object a();
        Foo$any b();
        Arrayish$any c();
    }

    class Foo implements Foo$any {
        T t;
        bridge Object get$t() { return maybeBox(t); }
        bridge void set$t(Object o) { t = maybeUnbox(o); }

        T a() { ... }
        bridge Object a() { return maybeBox(a()); }

        Foo b() { ... }
        bridge Foo$any b() { return (Foo$any) b(); }

        T[] c() { ... }
        bridge Arrayish$any c() { return (Arrayish$any) c(); }
    }

The bridge methods implement the corresponding members in Foo$any.

This approach worked well enough that we could anyfy Collections and Streams acceptably well. In the happy cases, this worked well:

 - Subtyping relationships in the language are properly mirrored as subclassing relationships in the JVM, so that checkcast, instanceof, reflection, and verification "just work".
 - Methods without avars in their signature require no boxing when invoked through a wildcard/raw receiver (though still pay the itable overhead.)

However, representing wildcards as interfaces had a number of drawbacks when we get to the less happy cases:

Impersonation. There's nothing to stop someone from just implementing Foo$any, thereby impersonating some instantiation of Foo without necessarily obeying Foo's invariants. (This is even worse if Foo is a final class.)

Nonpublic members. Interface methods are public; classes can have methods of any accessibility.
(Private members are even worse than protected/public as private members are not inherited; modeling them as interface members could create strange shadowing artifacts. Similarly, virtualizing fields risks certain shadowing anomalies.)

Non-any superclasses. In the following case:

    class Bar { }
    class Foo extends Bar { }

We can lift the members of Bar onto Foo$any, but we won't be able to model the subtyping relationship that Foo <: Bar for all T. This flows into array subtyping; the user will reasonably expect that Foo[] <: Bar[], but we don't have a way to model this.

Multiple avars. If a class has multiple avars, then there are theoretically O(2^n) wildcard types and each method could require O(2^n) bridges. This has a big footprint cost, as well as burdening startup with code that will be rarely used. (The Model 2 prototype erased all partial wildcards to a total wildcard, reducing the overhead back to a constant, at the cost of some potentially unnecessary boxing.)

Each of these has a potential answer, but the abstraction is starting to get pretty leaky.

Model 3
-------

Model 3 is a complete overhaul of the translation story (replacing the ad-hoc and highly complex and brittle Model 1 story with constant pool forms that allow us to express parameterization, including erasure), but only an incremental improvement to the wildcard story.

Specifically, it improves member access; rather than lifting the methods and field accessors onto methods of Foo$any, we instead access them through indy. (We can do this because there is no existing code that uses Foo<any>.) This means the wildcard interfaces are super-simple (no members), the bridge method explosion is eliminated, and some of the interface-imposed restrictions (notably, those having to do with accessibility) can be handled more directly.

Still, the resulting story is unsatisfying. We still need additional VM help (impersonation, non-any-generic superclasses, etc.)
And we still haven't addressed the mismatch of having separate meanings for Foo<?> and Foo<any>. Not only is this confusing, but when we fold this in with instanceof/cast, we get something really bad...

The obvious interpretation of "x instanceof Foo" is "x instanceof Foo<?>". (Because the next thing the user is going to do is cast to Foo.) But this means that the following code will break when you anyfy the declaration and don't adjust the implementation:

    class Box<T> {
        T t;

        public boolean equals(Object o) {
            if (!(o instanceof Box))
                return false;
            Box other = (Box) o;
            return (t == null) ? other.t == null : t.equals(other.t);
        }
    }

If we add an "any" in front of T and do nothing else, then the equals method will silently fail for (say) Box<int>. This is terrible. So, we are (still) not there yet.

Looking Ahead
-------------

Our conclusion (which we mostly suspected from the beginning, but the experiments have borne out the details) is: simulating wildcard types with the classfile tools we have today will yield a decidedly dissatisfying simulation. If we want to support wildcards, we'll need some VM help. And we need to make some progress towards bringing non-erased instantiations into the raw type / wildcard fold.

This is not unreasonable; if we're adding parametric polymorphism to the VM, and we want a type system that supports wildcard/raw types, the VM should understand these types too -- all the defects above come from trying to "fake out" the VM. The success of Model 3 was about not faking out the VM, but providing a means of discussing parameterized types within the VM type system -- success for wildcards will come from this vector as well.

We have some new ideas. Stay tuned.