From brian.goetz at oracle.com  Wed Jun  1 18:56:28 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 1 Jun 2016 14:56:28 -0400
Subject: species static prototype
In-Reply-To: <20160601185214.A5BB813603C@b03ledav002.gho.boulder.ibm.com>
References: <5748B465.3060308@oracle.com>
 <20160601185214.A5BB813603C@b03ledav002.gho.boulder.ibm.com>
Message-ID: <6874e5dd-2966-33ce-caf5-4bb8afd76234@oracle.com>

On 6/1/2016 2:52 PM, Bjorn B Vardal wrote:
> Will the users be able to write their own species initializer?
>
>     class Foo {
>         __species {
>             ...
>         }
>     }
>

I would assume so; even if we don't support a __species { } block, the
user can still contribute to the species initialization with field
initializers:

    __species int x = 3;

So I see no reason to not adopt symmetry with static here.

> Your access bridge solution using species methods looks fine, but are
> we not solving that with nest mates?

We now have two credible solutions. Before we had species-static,
nestmates were basically a forced move; now it's an optional move.

> I'm also wondering whether the following are typos, or if I
> misunderstood them:
>
>   * TestResolution.m_I() was not meant to be decorated with '__species'
>   * TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the
>     correct modifiers, or should not be error cases.
>   * TestTypeVar.m_I() was not meant to be decorated with '__species'
>

I'll let Maurizio answer these.

From maurizio.cimadamore at oracle.com  Wed Jun  1 20:19:22 2016
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 1 Jun 2016 21:19:22 +0100
Subject: species static prototype
In-Reply-To: <20160601185214.A5BB813603C@b03ledav002.gho.boulder.ibm.com>
References: <5748B465.3060308@oracle.com>
 <20160601185214.A5BB813603C@b03ledav002.gho.boulder.ibm.com>
Message-ID: <574F434A.4020303@oracle.com>

On 01/06/16 19:52, Bjorn B Vardal wrote:
> Will the users be able to write their own species initializer?
>
>     class Foo {
>         __species {
>             ...
>         }
>     }
>
Hi Bjorn,
Yep - that is supported.
> Your access bridge solution using species methods looks fine, but are
> we not solving that with nest mates?
> I'm also wondering whether the following are typos, or if I
> misunderstood them:
>
>   * TestResolution.m_I() was not meant to be decorated with '__species'
>
Right - that's a typo, the 'species' modifier was meant to be omitted
(i.e. it's an instance method)
>
>   * TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the
>     correct modifiers, or should not be error cases.
>
Yeah - missing static and species there - in general members with _S are
meant to be static, those with _SS are meant to be 'species'
>
>   * TestTypeVar.m_I() was not meant to be decorated with '__species'
>
Yep - same as above

Sorry for the typos!
Maurizio

> --
> Bjørn Vårdal
> IBM Runtimes
>
> ----- Original message -----
> From: Maurizio Cimadamore
> Sent by: "valhalla-spec-experts"
> To: valhalla-spec-experts at openjdk.java.net
> Cc:
> Subject: species static prototype
> Date: Fri, May 27, 2016 4:56 PM
>
> Hi,
> over the last few days I've been busy putting together a prototype
> [1, 2] of javac/runtime support for species static. I guess it
> could be considered a prototype implementation of the approach
> that Bjorn has described as "Repurpose existing statics" [4] in
> his nice writeup. Here's what I have learned during the experience.
>
> Parser
> ====
>
> The prototype uses a no fuss approach where '__species' is the
> modifier to denote species static stuff (of course a better syntax
> will have to be picked at some point, but that's not the goal of
> the current exercise). This means you can write:
>
>     class Foo {
>         String i;              //instance field
>         static String s;       //static field
>         __species String ss;   //species static field
>     }
>
> This is obviously good enough for the time being.
>
> A complication with parsing occurs when accessing species members;
> in fact, species members can be accessed via their fully qualified
> type (including all required type-arguments, if necessary).
>
>     Foo.ss;
>     Foo.ss;
>
> The above are all valid species access expressions. Now, adding
> this kind of support in the parser is always tricky - as we have
> to battle with ambiguities which might pop up. Luckily, this
> pattern is similar enough to the one we use for method references
> - i.e.:
>
>     Foo::ss
>
> Which the compiler already had to special case; so I ended up
> slightly generalizing what we did in JDK 8 method reference
> parsing, and I got something working reasonably quickly. But this
> could be an area where coming up with a clean spec might be tricky
> (as the impl uses abundant lookahead to disambiguate this one).
>
> Resolution
> ======
>
> The basic idea is to divide the world into three static levels,
> whose properties are summarized in the table below:
>
>                 enclosing type   enclosing instance
>     instance    yes              yes
>     species     yes              no
>     static      no               no
>
> So, in terms of who can access what, it follows that if we
> consider 'instance' to be the highest static level and 'static' to
> be the lowest, then it's ok for a member with static level S1 to
> access another member of static level S2 provided that S1 >= S2.
> Or, with a table:
>
>     from/to     instance   species   static
>     instance    yes        yes       yes
>     species     no         yes       yes
>     static      no         no        yes
>
> So, let's look at a concrete example:
>
>     class TestResolution {
>         static void m_S() {
>             m_S();   //ok
>             m_SS();  //error
>             m_I();   //error
>         }
>
>         __species void m_SS() {
>             m_S();   //ok
>             m_SS();  //ok
>             m_I();   //error
>         }
>
>         __species void m_I() {
>             m_S();   //ok
>             m_SS();  //ok
>             m_I();   //ok
>         }
>     }
>
> A crucial property, of course, is that species static members can
> refer to any type variables in the enclosing context:
>
>     class TestTypeVar<X> {
>         static void m_S() {
>             X x; //error
>         }
>
>         __species void m_SS() {
>             X x; //ok
>         }
>
>         __species void m_I() {
>             X x; //ok
>         }
>     }
>
> Nesting
> =====
>
> Another concept that needs generalization is that of allowed
> nesting; consider the following program:
>
>     class TestNesting1 {
>         class MemberInner {
>             static String s_S; //error
>             String s_I;        //ok
>         }
>
>         static class StaticInner {
>             static String s_S; //ok
>             String s_I;        //ok
>         }
>     }
>
> That is, the compiler will only allow you to declare static
> members in toplevel classes or in static nested classes (which,
> after all, act as toplevel classes). Now that we are adding a new
> static level to the picture, how are the nesting rules affected?
>
> Looking at the table above, if we consider 'instance' to be the
> highest static level and 'static' to be the lowest, then it's ok
> for a member with static level S1 to declare a member of static
> level S2 provided that S1 <= S2. Again, we can look at this in a
> tabular fashion:
>
>     declaring/declared   instance   species   static
>     instance             yes        no        no
>     species              yes        yes       no
>     static               yes        yes       yes
>
> This also seems like a nice generalization of the current rules.
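
The access and nesting tables quoted above both reduce to a comparison on
one ordering of the three static levels. The following is a minimal
sketch of that reading, not code from the prototype; the names
StaticLevels, Level, canAccess and canDeclare are hypothetical and exist
only for illustration.

```java
public class StaticLevels {
    // Hypothetical model: declared from the lowest static level to the highest,
    // so that Enum.compareTo follows the ordering used in the quoted tables.
    enum Level { STATIC, SPECIES, INSTANCE }

    // Access rule: a member at level 'from' may access a member at level 'to'
    // provided that from >= to.
    static boolean canAccess(Level from, Level to) {
        return from.compareTo(to) >= 0;
    }

    // Nesting rule: a class at level 'declaring' may declare a member at level
    // 'declared' provided that declaring <= declared.
    static boolean canDeclare(Level declaring, Level declared) {
        return declaring.compareTo(declared) <= 0;
    }

    public static void main(String[] args) {
        // Print both tables row by row for comparison with the mail.
        for (Level row : Level.values()) {
            for (Level col : Level.values()) {
                System.out.printf("%-8s -> %-8s access=%-5b declare=%b%n",
                        row, col, canAccess(row, col), canDeclare(row, col));
            }
        }
    }
}
```

Running main prints every (row, column) pair so the result can be
checked against the two tables by hand.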
> The rationale behind these rules is basically to guarantee some
> invariants during member lookup; let's say that we are in a nested
> class with static level S1 - then, by the rule above, it follows
> that any member nested in this class will be able to access
> another member with static level S1 declared in this class or in
> any lexically enclosing class.
>
> A full example of nesting rules is given below:
>
>     class TestNesting2 {
>         class MemberInner {
>             static String s_S;       //error
>             __species String s_SS;   //error
>             String s_I;              //ok
>         }
>
>         __species class StaticInner {
>             static String s_S;       //error
>             __species String s_SS;   //ok
>             String s_I;              //ok
>         }
>
>         static class StaticInner {
>             static String s_S;       //ok
>             __species String s_SS;   //ok
>             String s_I;              //ok
>         }
>     }
>
> Unchecked access
> ===========
>
> Because of an unfortunate interplay between species and erasure,
> code using species members is potentially unsound (the example
> below is a variation of Peter's example [3] in this very mailing
> list):
>
>     public class Foo<T> {
>         __species T cache;
>     }
>
>     Foo<String>.cache = "Hello";
>     Integer i = Foo<Integer>.cache; //whoops
>
> To prevent cases like these, the compiler implements a check which
> looks at the qualifier of a species access; if such qualifier
> (either explicit, or implicit) cannot be proven to be reifiable,
> an unchecked warning is issued.
>
> Note that it is possible to restrict such warnings only to cases
> where the signature of the accessed species static member changes
> under erasure. E.g. in the above example, accessing 'cache' is
> unchecked, because the type of 'cache' contains type-variables;
> but if another species static field was accessed whose type did
> not depend on type-variables, then the access should be considered
> sound.
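
The unsoundness described above can be approximated in today's Java,
which has no __species modifier. The sketch below is an analogy under
that assumption: a single erased storage location is shared by every
parameterization, so a value written at one type can be read back at
another. The names ErasedCacheDemo, slot, put, get and pollutes are
hypothetical.

```java
public class ErasedCacheDemo {
    // One storage location shared by all parameterizations, which is what an
    // erased species field would amount to.
    private static Object slot;

    static <T> void put(T value) {
        slot = value;
    }

    @SuppressWarnings("unchecked")
    static <T> T get() {
        return (T) slot; // unchecked cast: nothing is verified here at run time
    }

    // Returns true if reading the slot at the wrong type fails at the use site.
    static boolean pollutes() {
        ErasedCacheDemo.<String>put("Hello");
        try {
            // The compiler inserts a checkcast to Integer on this assignment.
            Integer i = ErasedCacheDemo.<Integer>get();
            return i == null; // only reached if the slot was empty
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("read at wrong type fails: " + pollutes());
    }
}
```

The failure shows up at the assignment rather than inside get(), which
is the kind of situation an unchecked warning is meant to flag at
compile time.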
>
>
> Species initializers
> ===========
>
> In our model we have three static levels - but we have
> initialization artifacts for only two of those; we need to fix that:
>
>     instance
>     species
>     static
>
> That is, a new species initializer method is added to a class
> containing one or more species variables with an initializer. This
> method is used to hoist the initialization code for all the species
> variables.
>
> Forward references
> ============
>
> Rules for detecting forward references have to be extended
> accordingly. A forward reference occurs whenever there's an
> attempt to reference a variable from a position P, where the
> variable declaration occurs in a position P' > P. Currently, the
> rules for forward references allow an instance variable to
> forward-reference a static variable - as shown below:
>
>     class TestForwardRef {
>         String s = s_S;
>         static String s_S = "Hello!";
>     }
>
> The rationale behind this is that, by the time we see the instance
> initializer for 's' we would have already executed the code for
> initializing 's_S' (as initialization will occur in different
> methods, <init> and <clinit> respectively; see section above).
> With the new static level, the forward reference rules have to be
> redefined according to the table below:
>
>     from/to     instance      species       static
>     instance    forward ref   ok            ok
>     species     illegal       forward ref   ok
>     static      illegal       illegal       forward ref
>
> In other words, it's ok to forward reference a variable whose
> static level is lower than that available where the reference
> occurs. An example is given below:
>
>     class TestForwardRef2 {
>         String s1_I = s_S;    //ok
>         String s2_I = s_SS;   //ok
>
>         String s1_S = s_S;    //error!
>
>         String s1_SS = s_S;   //ok
>         String s2_SS = s_SS;  //error!
>
>         static String s_S = "Hello!";
>         __species String s_SS = "Hello Species!";
>     }
>
> This is an extension of the above principle: since instance
> variables are initialized in <init>, they can reference variables
> initialized in <clinit> or in the species initializer.
> If a variable is initialized in the species initializer it can
> similarly safely reference a variable initialized in <clinit>.
> Another way to think of this is that a forward reference error only
> occurs if the static level of the referenced symbol is the same as
> the static level where the reference occurs. All other cases are
> either illegal (i.e. because it's an attempt to go from a lower
> static level to a higher one) or valid (because it can be guaranteed
> that the code initializing the referenced variable has already been
> executed).
>
> Code generation
> ==========
>
> Javac currently emits invokestatic/getstatic/putstatic for both
> legacy static and species static access. javac will use the
> 'owner' field of a CONSTANT_MethodRef, CONSTANT_FieldRef constants
> to point to the sharp type of the species access (through a
> constant pool type entry). Static access will always see an erased
> owner.
>
> Consider this example:
>
>     class TestGen {
>         __species void m_SS() { }
>         static void m_S() { }
>
>         public static void main(String args) {
>             TestGen.m_SS();
>             TestGen.m_SS();
>             TestGen.m_S();
>             TestGen.m_S();
>         }
>     }
>
> The generated code in the 'main' method is reported below:
>
>     0: invokestatic  #11   // Method TestGen<_>.m_SS:()V
>     3: invokestatic  #15   // Method TestGen.m_SS:()V
>     6: invokestatic  #18   // Method TestGen<_>.m_S:()V
>     9: invokestatic  #18   // Method TestGen<_>.m_S:()V
>
> As can be seen, species static access can cause a sharper type
> to end up in the 'owner' field of the member reference info; on
> the other hand, a static access always leads to an erased 'owner'.
>
> Another detail worth mentioning is how __species is represented in
> the bytecode. Given the current lack of flag bits I've opted to
> use the last remaining bit 0x8000 - this is in fact the last
> unused bit that can be shared across class, field and method
> descriptors.
> Actually, this bit has already been used to encode
> the ACC_MANDATED flag in the MethodParameters attribute (as of JDK
> 8) - but since there's no other usage of that flag configuration
> outside MethodParameters it would seem safe to recycle it. Of
> course more compact approaches are also possible, but they would
> lead to different flag configurations for species static fields,
> methods and classes.
>
> Specialization
> =========
>
> Specializing species access is relatively straightforward:
>
>   * both instance and species static members are copied in the
>     specialization
>   * static members are only copied in the erased specialization (and
>     skipped otherwise)
>   * ACC_SPECIES classes become regular classes when specialized
>   * ACC_SPECIES methods/fields become static methods/fields in the
>     specialization
>   * the species initializer becomes the new <clinit> in the
>     specialization (and is omitted if the specialization is the
>     erased specialization)
>
> The last bullet requires some extra care when handling the
> 'erased' specialization; consider the following example:
>
>     class TestSpec {
>         static String s_S = "HelloStatic";
>         __species String s_SS = "HelloSpecies";
>     }
>
> This class will end up with the following two synthetic methods:
>
>     static void <clinit>();
>       descriptor: ()V
>       flags: ACC_STATIC
>       Code:
>         stack=1, locals=0, args_size=0
>            0: ldc           #8    // String HelloStatic
>            2: putstatic     #14   // Field s_S:Ljava/lang/String;
>            5: ldc           #16   // String HelloSpecies
>            7: putstatic     #19   // Field s_SS:Ljava/lang/String;
>           10: return
>
>     species void ();
>       descriptor: ()V
>       flags: ACC_SPECIES
>       Code:
>         stack=1, locals=1, args_size=1
>            0: ldc           #16   // String HelloSpecies
>            2: putstatic     #19   // Field s_SS:Ljava/lang/String;
>            5: return
>
> As can be seen, the <clinit> method contains initialization
> code for both static and species static fields! To understand why
> this is so, let's consider how the specialized bits might be
> derived from the template class following the rules above.
> Let's consider a specialization like TestSpec: in this case, we
> need to drop <clinit> (it's a static method and TestSpec is
> not an erased specialization), and we also need to rename the
> species initializer as <clinit> in the new specialization. All is
> fine - the specialization will contain the relevant code required to
> initialize its species static fields.
>
> Let's now turn to the erased specialization TestSpec<_> - this
> specialization receives both static and species static members.
> Now, if we were to follow the same rules for initializers, we'd
> end up with two different initializer methods - both <clinit> and
> the species initializer. We could ask the specializer to merge them
> somehow, but that would be tricky and expensive. Instead, we simply
> (i) drop the species initializer from the erased specialization and
> (ii) retain <clinit>. Of course this means that <clinit> must also
> contain initialization code for species static members.
>
> Bonus point: Generic methods
> ===================
>
> As pointed out by Brian, if we have species static classes we can
> translate static and species static specializable generic methods
> quite effectively. Consider this example:
>
>     class TestGenMethods {
>         static void m(X x) { ... }
>
>         void test() {
>             m(42);
>         }
>     }
>
> without species static, this would translate to:
>
>     class TestGenMethods {
>         static class TestGenMethods$m {
>             void m(X z) { ... }
>         }
>
>         /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }
>
>         void test() {
>             new TestGenMethods$m().m(42); // this is really done
>                                           // inside the BSM
>         }
>     }
>
> Note how the bridge (called by legacy code) will need to spin a
> new instance of the synthetic class and then call a method on it.
> The bootstrap used to dispatch static generic specializable calls
> also needs to do a very similar operation. But what if we turned
> the translated generic method into a species static method?
>
>     class TestGenMethods {
>         class TestGenMethods$m {
>             __species void m(X z) { ...
>             }
>         }
>
>         /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }
>
>         void test() {
>             TestGenMethods$m.m(42); // this is really done inside
>                                     // the BSM
>         }
>     }
>
> With species static, we can now access the method w/o needing any
> extra instance. This leads to simplification in both the bridging
> strategy and the bootstrap implementation. We can apply a similar
> simplification for dispatch of specializable species static calls
> - the only difference is that the synthetic holder class has also
> to be marked as species static (since it could access type-vars
> from the enclosing context).
>
> Bonus point: Access bridges
> =================
>
> Access bridges are a constant pain in the current translation
> strategy; such bridges are generated by the compiler to grant
> access to otherwise inaccessible members. Example:
>
>     class Outer {
>         private void m() { }
>
>         class Inner {
>             void test() {
>                 m();
>             }
>         }
>     }
>
> This code will be translated as follows:
>
>     class Outer {
>
>         /* synthetic */ static access$m(Outer o) { o.m(); }
>
>         private void m() { }
>
>         class Inner {
>             /*synthetic*/ Outer this$0;
>
>             void test() {
>                 access$m(this$0);
>             }
>         }
>     }
>
> That is, access to private members is translated with an access to
> an accessor bridge, which then performs access from the right
> location. Note that the accessor bridge is static (because
> otherwise it would be possible to maliciously override it to grant
> access to otherwise inaccessible members); since it's static,
> usual rules apply, so it cannot refer to type-variables, it cannot
> be specialized, etc. This means that there are cases with
> specialization where existing access bridges are not enough to
> guarantee access - if the access happens to cross specialization
> boundaries (i.e. accessing m() from an Outer.Inner).
>
> Again, species static comes to the rescue:
>
>     class Outer {
>
>         /* synthetic */ __species access$m(Outer o) { o.m(); }
>
>         private void m() { }
>
>         class Inner {
>             /*synthetic*/ Outer this$0;
>
>             void test() {
>                 Outer.access$m(this$0);
>             }
>         }
>     }
>
> Since the accessor bridge is now species static, it means it can
> now mention type variables (such as X); and it also means that
> when the bridge is accessed (from Inner), the qualifier type
> (Outer) is guaranteed to remain sharp from the source code to
> the bytecode - which means that when this code gets
> specialized, all references to X will be dealt with accordingly
> (and the right accessor bridge will be accessed).
>
> Parting thoughts
> ==========
>
> On many levels, species statics seem to be the missing ingredient
> for implementing many of the tricks of our translation strategy,
> as well as to make it easier to express common idioms (i.e.
> type-dependent caches) in user code.
>
> Adding support for species static has proven to be harder than
> originally thought. This is mainly because the current world is
> split in two static levels: static and instance. When something is
> not static it's implicitly assumed to be instance, and vice versa.
> If we add a third static level to the picture, a lot of the
> existing code just doesn't work anymore, or has to be validated to
> check as to whether 'static' means 'legacy static' or 'species
> static' (or both).
>
> I started the implementation by treating static, species static
> and instance as completely separate static levels - with different
> internal flags, etc. but I soon realized that, while clean, this
> approach was invalidating too much of the existing implementation.
> More specifically, all the code snippets checking for static would
> now have to be updated to check for static OR species static
> (overriding vs. hiding, access to 'this', access to 'super',
> generic bridges, ...).
> On the other hand, the places where the
> semantics of species static vs. static was different were quite
> limited:
>
>   * membership/type substitution: a species static behaves like an
>     instance member; the type variables of the owner are replaced into
>     the member signature.
>   * resolution: we need to implement the correct access rules as
>     shown in the tables above.
>   * code generation: an invokestatic involving a species static gets
>     a sharp qualifier type
>
> This quickly led to the realization that it was instead easier to
> just treat 'species static' as a special case of 'static' - and
> then to add finer grained logic whenever we really needed the
> distinction. This led to a considerably easier patch, and I think
> that a similar consideration will hold for the JLS.
>
> [1] -
> http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
> [2] -
> http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
> [3] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
> [4] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html
>
> Maurizio
>
>

From brian.goetz at oracle.com  Wed Jun  1 21:44:57 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 1 Jun 2016 17:44:57 -0400
Subject: Compatibility goals
In-Reply-To: <20160601205928.3756D124044@b01ledav002.gho.pok.ibm.com>
References: <07de2a60-6b4e-b597-1fca-2f3af30fc7f0@oracle.com>
 <20160601205928.3756D124044@b01ledav002.gho.pok.ibm.com>
Message-ID: <90608fa1-89e6-d907-9c97-83b3a070e394@oracle.com>

> /> Alpha-renaming a type variable (to a non-shadowed name) should be
> binary and source compatible./
> The name is only used internally in the generic class in the
> GenericClass attribute, and recompiling with different names will
> therefore not affect users of the generic class.

Right.
(Like method parameter names, the name is not part of the API, and
exists only to improve the readability of the implementation code.)

> /> Reordering or removing type variables is not compatible. (These
> first two together match the story for method argument lists; you can
> rename method arguments, but not reorder or remove them.)/
> Other classes will refer to the generic class using ParamTypes in
> their CPs. ParamType provides the type parameters in the order that
> the generic type specified at compilation time. Reordering and
> recompiling will therefore invalidate all ParamTypes referring to the
> modified generic type.

Right. Once you've published Foo, clients or subclasses may have Foo in
their source files and ParamType[Foo, A, B] in their binaries, which
they expect to retain their meaning. Dropping or reordering parameters
would render these clients / subclasses broken.

> /> Anyfying an existing erased type variable should be binary and
> source compatible./
> All ParamTypes referring to a ref-generic type variable will be
> providing a reference type (erased) as the type parameter (or no
> parameters?). As references are a subset of any, anyfying the type
> variable does not invalidate existing ParamTypes.
> I have one question here: What happens if I refer to Foo (not any
> T) using ParamType[Foo, String]? Is it valid because String is a
> reference type, or invalid because Foo is not specializable?

There are two migration situations here:

 - Migrating a totally erased generic class to any-generic (Foo to Foo)
 - Migrating a partially anyfied class (Foo to Foo)

For the former, there will be no ParamType entries, all references to
Foo will be LFoo; / Constant_Class[Foo]. For the latter, there will be
ParamType entries that specify 'erased' in the appropriate position. In
either case, these remain valid parameterizations after the migration.

To your question: I would say this is invalid, because Foo is not
specializable / lacks a GenericClass attribute.

> /> Adding a new type variable at the end of the argument list should be
> binary compatible (though not source compatible.) Adding a new type
> variable other than at the end is not compatible./
> The last point already said that we have to support missing type
> parameters, and this point is really just an extension of that. If a
> type parameter is not provided, the type variable is assumed to be erased.

Right. Also, this one interacts with the story for inner classes, and
influences the decision about how to represent enclosing class type
parameters in ParamType (do we have a chain of ParamType, as proposed by
the M3 doc, or do we lift all type parameters to the innermost class?)
The chain approach seems to reduce the impact of generifying an
enclosing class (as per the next item.)

> /> Generifying an enclosing scope (evolving |Outer.Inner| to
> |Outer.Inner|) should be binary compatible./
> At first glance, this might look like anyfying an existing erased type
> or generifying a non-generic class. However, the complicating factor
> is that the added type variable will also be added to the scope of the
> enclosed class, and the question becomes whether we can handle this.
> An enclosed class must be compiled with its enclosing class, so the
> GenericClass attribute will be updated correctly. The type parameters
> to Inner and Outer are provided separately, and any missing type
> parameter will still be treated as erased.

Right. Also, with the chain-of-enclosing-descriptors approach, it is
fairly easy for a ParamType[parent=Outer, Inner, U] to recover from new
parameters being added to Outer, whereas if we simply lifted the Outer
parameters onto Inner, we'd have a difficult time reconstructing the
actual parameterization.

> /> Changing type variable bounds is not binary compatible./
> Type variables are erased to their bound, i.e. not necessarily
> j.l.Object. Any descriptor that contained a type variable will
> therefore contain the bound after compilation.
> Changing the bound
> invalidates the descriptors in existing method refs, and is therefore
> binary incompatible. Also, this is not a new constraint, as it already
> applies to erased generics.

Right.

From forax at univ-mlv.fr  Thu Jun  2 10:21:30 2016
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 2 Jun 2016 12:21:30 +0200 (CEST)
Subject: Wildcards -- Models 4 and 5
In-Reply-To: <2421796e-cb8e-6241-9f80-5bb761673f2e@oracle.com>
References: <2421796e-cb8e-6241-9f80-5bb761673f2e@oracle.com>
Message-ID: <602551257.1734108.1464862890869.JavaMail.zimbra@u-pem.fr>

There is another model (model 6): in order to support species, we need
a way to represent them at runtime, so a static species can be stored
in a location which is neither along with the instance fields nor along
with the static fields.

Currently, for the VM, an instance is represented like this:

  header  ---->  class
  ------         vtable1
  field1         vtable2
  field2         ...
  ...            static fields

It can be a little different if the .class object and the class of an
instance are two different objects (for the JIT it's better to have the
class be a constant pointer, but java.lang.Class is a Java object that
may need to be moved in memory).

Now, we want something like this:

  header  ---->  species  ---------->  class
  ------         vtable1
  field1         vtable2
  field2         ...
  ...            species fields

so at runtime the header of an object does not point to a class anymore
but to a species (the runtime representation of a species).

This allows the VM to answer things like:
  obj instanceof ArrayList
and everybody cheers ...

No, because with , obj instanceof ArrayList may work or not.

In fact there is little reason to allow a user to see species at
runtime*,
 - it makes ArrayList reified sometimes, so with a Map> map,
   sometimes map.get("foo") will throw an exception, sometimes it will
   not (because of the erasure at compile time, the VM inserts a cast to
   List in front of the call to map.get()); depending on whether E is a
   String or an int, the behavior will be different.
 - The erasure concept (the runtime part) is entrenched in the mind of
   millions of developers, changing that is a recipe for disaster.

so IMO even if the VM reifies species at runtime, a developer should not
be able to see that; it's better to let them think that erasure at
runtime is done the same way in Java 10 as it is in Java 5.

This model has a cost at runtime: a checkcast/instanceof/arraystore to
ArrayList may be polymorphic while it was monomorphic before**, or doing
a dynamic typecheck requires a double indirection if the class is
anyfied.

And for a wildcard, ArrayList is mapped to ArrayList (from the runtime
class point of view) as usual, so no big deal.

It's IMO a far better model just because from the user point of view,
nothing changed.

regards,
Rémi

* you can access a species field without seeing the species itself.
** let's suppose that ArrayList is effectively final here.

----- Original Message -----
> From: "Brian Goetz"
> To: valhalla-spec-experts at openjdk.java.net
> Sent: Friday, 20 May 2016 20:33:00
> Subject: Wildcards -- Models 4 and 5
>
> In the 4/20 mail "Wildcards and raw types: story so far", we outlined
> our explorations for fitting wildcard types into the first several
> prototypes. The summary was:
>
>   * Model 1: no wildcards at all
>
>   * Model 2: A pale implementation of wildcards, with lots of problems
>     that stem from trying to fake wildcards via interfaces
>
>   * Model 3: basically the same as Model 2, except members are accessed
>     via indy (which mitigated some of the problems but not all)
>
> The conclusion was: compiler-driven translation tricks are not going
> to cut it (as we suspected all along). We've since explored two
> other models (call them 4 and 5) which explore a range of options
> for VM support for wildcards. The below is a preliminary analysis of
> these options.
>
>
> Reflection, classes, and runtime types
>
> While it may not be immediately obvious that this subject is deeply
> connected to reflection, consider a typical implementation of |equals()|:
>
>     class Box<T> {
>         T t;
>
>         public boolean equals(Object o) {
>             if (!(o instanceof Box))
>                 return false;
>             Box other = (Box) o;
>             return (t == null && other.t == null) || t.equals(other.t);
>         }
>     }
>
> Some implementations use raw types (|Box|) for the |instanceof| and cast
> target; others use wildcards (|Box<?>|). While the latter is
> recommended, both are widely used in circulation. In any case, as
> observed in the last mail, were we to interpret |Box| or |Box<?>| as
> only including erased boxes, then this code would silently break.
>
> The term "class" is horribly overloaded, used to describe the source
> class (|class Foo { ... }|), the binary classfile, the runtime type
> derived from the classfile, and the reflective mirror for that runtime
> type. In the past these existed in 1:1 correspondence, but no more - a
> single source class now gives rise to a number of runtime types. Having
> poor terminology causes confusion, so let's refine these terms:
>
>   * /class/ refers to a source-level class declaration
>   * /classfile/ refers to the binary classfile
>   * /template/ refers to the runtime representation of a classfile
>   * /runtime type/ refers to a primitive, value, class, or interface
>     type managed by the VM
>
> So historically, all objects had a class, which equally described the
> source class, the classfile, and the runtime type. Going forward, the
> class and the runtime type of an object are distinct concepts. So an
> |ArrayList| has a /class/ of |ArrayList|, but a /runtime type/ of
> |ArrayList|. Our code name for runtime type is /crass/ (obviously a
> better name is needed, but we'll paint that bikeshed later.)
>
> This allows us to untangle a question that's been bugging us: what
> should |Object.getClass()| return on an |ArrayList|?
> If we return
> |ArrayList|, then we can't distinguish between an erased and a
> specialized object (bad); if we return |ArrayList|, then existing
> code that depends on |(x.getClass() == List.class)| may break (bad).
>
> The answer is, of course, that there are two questions the user can ask
> an object: what is your /class/, and what is your /crass/, and they need
> to be detangled. The existing method |getClass()| will continue to
> return the class mirror; a new method (|getCrass()|) will return a
> runtime type mirror of some form for the runtime type. Similarly, a
> class literal will evaluate to a class, and some other form of literal /
> reflective lookup will be needed for crass.
>
> The reflective features built into the language (|instanceof|, casting,
> class literals, |getClass()|) are mostly tilted towards classes, not
> types. (Some exceptions: you can use a wildcard type in an |instanceof|,
> and you can do unchecked static casts to generic types, which are
> erased.) We need to extend these to deal in both classes /and/ crasses.
> For |getClass()| and literals, there's an obvious path: have two forms.
> For casting, we are mostly there (except for the treatment of raw types
> for any-generic classes - which we need to work out separately.) For
> instanceof, it seems a forced move that |instanceof Foo| is interpreted
> as "an instance of any runtime type projected from class Foo", but we
> also would want to apply it to any reifiable type as well.
>
>
> Wildcard types
>
> In Model 3, we express a parameterized type with a |ParamType| constant,
> which names a template class and a set of type parameters, which include
> both valid runtime types as well as the special type parameter token
> |erased|. One natural way to express a wildcard type is to introduce a
> new special type parameter token, |wild|, so we'd translate |Foo<?>|
> as |ParamType[Foo,wild]|.
>
> In order for wildcard types to work seamlessly, the minimum
> functionality we'd need from the VM is to manage subtyping (which is
> used by the VM for |instanceof|, |checkcast|, verification, array store
> checks, and array covariance.) The wildcard must be seen to be a "top"
> type for all parameterizations:
>
> |ParamType[Foo,T] <: ParamType[Foo,wild] // for all valid T
> |
>
> And wildcard parameterizations must be seen to be subtypes of their
> wildcard-parameterized supertypes. If we have
>
> |class Foo<any T> extends Bar<T> implements I<T> { ... }
> class Moo<any T> extends Goo { }
> |
>
> then we expect
>
> |ParamType[Foo,wild] <: ParamType[Bar,wild]
> ParamType[Foo,wild] <: ParamType[I,wild]
> ParamType[Moo,wild] <: Goo
> |
>
> Wildcards must also support method invocation and field access to the
> members that are in the intersection of the members of all
> parameterizations (these are the total members, i.e. those not
> restricted to particular instantiations, whose member descriptors do not
> contain any type variables.) We can continue to implement member access
> via invokedynamic (as we do in Model 3), or alternatively, the VM can
> support |invoke*| bytecodes on wildcard receivers.
>
> We can apply these wildcard behaviors to any of the wildcard models
> (i.e., retrofit them onto Model 2/3.)
>
>
> Partial wildcards
>
> With multiple type variables, the rules for wildcards generalize
> cleanly, but the number of wildcard types that are a supertype of any
> given parameterized type grows exponentially in the number of type
> variables. We are considering adopting the simplification of erasing all
> partial wildcards in the source type system to a total wildcard in the
> runtime type system (the costs of this are: some additional boxing on
> access paths where boxing might not be necessary, and unchecked casts
> when casting a broader wildcard to a narrower one.)
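
A minimal sketch of the same-template wildcard rule and of the exponential growth mentioned under "Partial wildcards", in plain Java. This is only a toy model, not anything from the prototype: the |ParamType| class here, the string encoding of type arguments, and the helper |wildcardSupertypes| are all assumptions made for illustration, and subtyping through superclasses (|Foo| to |Bar|) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardSketch {
    /** Hypothetical stand-in for the `wild` type parameter token. */
    public static final String WILD = "wild";

    /** Toy model of a ParamType constant: a template name plus type arguments. */
    public static final class ParamType {
        public final String template;
        public final List<String> args;

        public ParamType(String template, List<String> args) {
            this.template = template;
            this.args = args;
        }

        /** Same-template rule only: each argument must match, or the other side must be wild. */
        public boolean isSubtypeOf(ParamType other) {
            if (!template.equals(other.template) || args.size() != other.args.size()) {
                return false;
            }
            for (int i = 0; i < args.size(); i++) {
                String theirs = other.args.get(i);
                if (!theirs.equals(WILD) && !theirs.equals(args.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Every type obtained by replacing a subset of arguments with wild (includes t itself). */
    public static List<ParamType> wildcardSupertypes(ParamType t) {
        int n = t.args.size();
        List<ParamType> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> args = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // bit i set means position i is widened to wild
                args.add((mask & (1 << i)) != 0 ? WILD : t.args.get(i));
            }
            result.add(new ParamType(t.template, args));
        }
        return result;
    }

    public static void main(String[] unused) {
        ParamType fooInt = new ParamType("Foo", List.of("int"));
        ParamType fooWild = new ParamType("Foo", List.of(WILD));
        System.out.println(fooInt.isSubtypeOf(fooWild));  // true
        System.out.println(fooWild.isSubtypeOf(fooInt));  // false

        ParamType three = new ParamType("Map3", List.of("int", "long", "erased"));
        System.out.println(wildcardSupertypes(three).size());  // 8
    }
}
```

With n type variables the enumeration yields 2^n types (the type itself plus 2^n - 1 wildcard forms), which is the growth that motivates collapsing partial wildcards to a total wildcard at runtime.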
>
>
> Model 4
>
> A constraint we are under is: existing binaries translate the types
> |Foo| (raw type), |Foo<String>| (erased parameterization), and |Foo<?>|
> all as |LFoo;| (or its equivalent, |CONSTANT_Class[Foo]|); since
> existing code treats this as meaning an erased class, the natural path
> would be to continue to interpret |LFoo;| as an erased class.
>
> Model 4 asks the question: "can we reinterpret legacy |LFoo;| in
> classfiles, and |Foo<?>| in source files, as |any Foo|?" (restoring the
> interpretation of |Foo<?>| to be more in line with user intuition.)
>
> Not surprisingly, the cost of reinterpreting the binaries is extensive.
> Many bytecodes would have to be reinterpreted, including |new|,
> |{get,put}field|, |invoke*|, to make up the difference between the
> legacy meaning of these constructs and the desired new meaning. Worse,
> while boxing provides us a means to have a common representation of
> signatures involving |T| (T's bound), in order to get to a common
> representation for signatures involving |T[]|, we'd need to either (a)
> make |int[]| a subtype of |Object[]| or (b) have a "boxing conversion"
> from |int[]| to |Object[]| (which would be a proxy box; the data would
> still live in the original |int[]|.) Both are intrusive into the
> |aaload| and |aastore| bytecodes and still are not anomaly-free.
>
> So, overall, while this seems possible, the implementation cost is very
> high, and all of it is for the sake of migration; the resulting
> constraints would remain as legacy long after the old code has been
> migrated.
>
>
> Model 5
>
> Model 5 asks the simpler question: can we continue to interpret |LFoo;|
> as erased in legacy classfiles, but upgrade to treating |Foo<?>| as is
> expected in source code? This entails changing the compilation
> translation of |Foo<?>| from "erased foo" to |ParamType[Foo,wild]|.
>
> This is far less intrusive into the bytecode behavior - legacy code
> would continue to mean what it did at compile time. It does require some
> migration support for handling the fact that field and method
> descriptors have changed (but this is a problem we're already working on
> for managing the migration of reference classes to value classes.) There
> are also some possible source incompatibilities in the face of separate
> compilation (to be quantified separately).
>
> Model 5 allows users to keep their |Foo<?>| and have it mean what they
> think it should mean. So we don't need to introduce a confusing
> |Foo<any>| wildcard, but we will need a way of saying "erased Foo",
> which might be |Foo<? extends Object>| or might be something more
> compact like |Foo<erased>|.
>
>
> Comparison
>
> Comparing the three models for wildcards (2, 4, 5):
>
>   * Model 2 defines the source construct |Foo<?>| to permanently mean
>     |Foo<? extends Object>|, even when |Foo| is anyfied, and introduces
>     a new wildcard |Foo<any>| - but maintains source and binary
>     compatibility.
>   * Model 4 lets us keep |Foo<?>|, and retroactively redefines bytecode
>     behavior - so an old binary can still interoperate with a reified
>     generic instance, and will think a |Foo<int>| is really a
>     |Foo<Integer>|.
>   * Model 5 redefines the /source/ meaning of |Foo<?>| to be what users
>     expect, but because we don't reinterpret old binaries, allows some
>     source incompatibility during migration.
>
> I think this pretty much explores the solution space. Our choices are:
> break the user model of what |Foo<?>| means, take a probably prohibitive
> hit to distort the VM to apply new semantics to old bytecode, or accept
> some limited source incompatibility under separate compilation but
> rescue the source form that users want.
>
> In my opinion, the Model 5 direction offers the best balance of costs
> and benefits - while there is some short-term migration pain (in
> relatively limited cases, which can be mitigated with compiler help), in
> the long run, it gets us to the world we want without permanently
> burdening either the language (creating confusion between |Foo<?>| and
> |Foo<any>|) or the VM implementation.
>
> In all these cases, we still haven't defined the semantics of /raw
> types/. Raw types existed for migration between pre-generic and generic
> code; we still have that migration problem, plus the new migration
> problems of generic to any-generic, and of pre-generic to any-generic.
> So in any case, we're going to need to define suitable semantics for raw
> types corresponding to any-generic classes.
>

From brian.goetz at oracle.com Thu Jun 2 22:24:21 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 2 Jun 2016 18:24:21 -0400
Subject: Species-static members vs singletons
In-Reply-To: <20160602204903.738387805C@b03ledav004.gho.boulder.ibm.com>
References: <4219406f-d842-97f4-7206-3a91ffe1e75c@oracle.com> <20160602204903.738387805C@b03ledav004.gho.boulder.ibm.com>
Message-ID: <98756340-778b-46c6-20c0-e87383b5c109@oracle.com>

> Brian, I see that you mentioned that generic instance methods are
> messier.
>
>     The translation for generic instance methods is still somewhat
>     messier (will post separately), but still less messy than if we also
>     had to manage / cache a receiver.
>
> Is this issue part of that mess? Do you have a solution, or is this an
> open issue? I tried making m a species static with a receiver argument,
> but that makes the invocation non-virtual.

Here are some more notes on how we might translate generic methods using
species-static methods and nested classes.


General strategy

  * A {static,species,instance} generic method m() in |Foo| is desugared
    into a species method |m()| in a {static,species,instance} nested
    class |Foo$m|.
  * The accessibility of the method |m()| is lifted onto the class
    |Foo$m|.
  * Foo also gets an erased bridge that redirects to the erased
    invocation of the generic method (binary compatibility only.)
  * A generic method is invoked with indy. The indy call site statically
    captures the type parameters for the invocation, some representation
    of the owning class Foo, and the method m(). The dynamic argument
    list captures the Foo-valued receiver (for instance methods) and the
    arguments to the generic method.


Goals

  * Dispatch should be fast :)
  * It would be nice if the name-mangling strategy (|Foo$m|) were private
    to |Foo| - that is, it does not appear in the bytecode of Foo's
    clients or subclasses.
  * It would be ideal if we could express what we want with bytecode
    alone, not indy, but that does not seem possible in all cases at
    this time.


Static methods

Given source code:

|class Foo<any T> {
    ACC static <any U> void m(U u) { }
}
|

We translate this as:

|@Generic
class Foo<any T> {
    // for binary compatibility only
    @Bridge
    ACC static void m(Object u) { Foo.Foo$m<erased>.m(u); }

    @Generic
    ACC static class Foo$m<any U> {
        ACC species void m(U u) { }
    }
}
|

An invocation |Foo.<U>m(u)| (where U is known statically) can be
translated in bytecode as

|invokespecies Foo.Foo$m<U>.m(U)
|

However, since it was a goal to not let the mangled name |Foo$m| leak
into client code, we can easily wrap this with an indy:

|invokedynamic GenericStaticMethod[Foo, "m", descriptor, U](u)
              ^bootstrap          ^static args            ^dyn args
|

and let the bootstrap put together the class name |Foo$m| from
constituent parts (the bootstrap and the compiler share a conspiratorial
connection, but the client bytecode doesn't participate). Since
everything is static at the call site, we can link to a
|ConstantCallSite| that always dispatches to an
|MH[invokespecies Foo$m<U>.m(U)]|.


Species methods

Species-static generic methods are translated almost the same way; given
class

|class Foo<any T> {
    ACC species <any U> void m(T t, U u) { }
}
|

we translate as

|@Generic
class Foo<any T> {
    // for binary compatibility only
    @Bridge
    ACC species void m(Object t, Object u) { Foo.Foo$m<erased>.m(t, u); }

    @Generic
    ACC species class Foo$m<any U> {
        ACC species void m(T t, U u) { }
    }
}
|

and an invocation |Foo<T>.<U>m(T,U)| is translated as

|invokespecies Foo<T>.Foo$m<U>.m(T, U)
|


Instance methods

Our translation strategy - desugaring to a helper class - introduces
some challenges in instance method dispatch.

|class Foo<any T> {
    ACC <any U> void m(T t, U u) { }
}

class Bar<any T> extends Foo<T> {
    @Override
    ACC <any U> void m(T t, U u) { }
}
|

I am proposing we translate this as:

|@Generic
class Foo<any T> {
    // for binary compatibility only
    @Bridge
    ACC void m(Object t, Object u) { this.Foo$m<erased>.m(t, u); }

    @Generic
    ACC species class Foo$m<any U> {
        ACC species void m(Foo<T> outer, T t, U u) { }
    }
}

@Generic
class Bar<any T> {
    // for binary compatibility only
    @Bridge
    ACC void m(Object t, Object u) { this.Bar$m<erased>.m(t, u); }

    @Generic
    ACC species class Bar$m<any U> {
        ACC species void m(Bar<T> outer, T t, U u) { }
    }
}
|

In this proposal, |Bar$m| does not extend |Foo$m|; this is to avoid
leaking dependence on desugaring in subclass bytecode. The
implementation methods in |Xxx$m| take an extra "outer" argument, which
is the receiver for the instance generic method invocation. The use of
species-static methods for the implementation methods means that we need
not maintain instances of |Foo$m|, but instead can pass the actual |Foo|
receiver directly to the implementation.

For an invocation:

|Foo<T> f = ...
f.<U>m(t, u)
|

We translate with indy as:

|invokedynamic InstanceStaticMethod[ParamType[Foo,T], "m", descriptor, U](r,t,u)
|

The static receiver type - here |Foo<T>| - is a static parameter to the
bootstrap. Let's call this class SR (the actual target will be |Xxx$m|
where |Xxx| may be SR or a subclass of SR.) The dynamic receiver (a
|Foo<T>|) is passed in the dynamic argument list as |r|.

The linking of the callsite is somewhat complex, but should optimize
reasonably well. It proceeds as follows:

  * For each (static receiver class, method, specialization args)
    combination - all static properties of the callsite - there is a
    dispatch table, which is found statically at linkage time and stored
    as part of the callsite state;
  * The dispatch table is a |ClassValue<MethodHandle>|, which maps the
    dynamic receiver type (a subtype of SR) to a |MethodHandle| for the
    |invokespecies| implementation method;
  * Invocation performs
    |ClassValue.get(receiver.getClass()).invokeExact(receiver, args)|
  * This dispatch can be optionally wrapped with a PIC (polymorphic
    inline cache) against |receiver.getClass()|

The first thing the bootstrap must do (at linkage time) is compute SR.
This is done by taking the owner class |Foo<T>|, along with the method
name and descriptor, and computing the name-mangled class
|Foo<T>.Foo$m|; call it SR'. We then take this class and compute
|SR=Crass.forSpecialization(SR', U)|. (Both of these computations are
done with the classloader for |Foo|, and we should check that both SR'
and SR share the classloader with |Foo|.) SR corresponds to the fully
specialized class |Foo<T>.Foo$m<U>|.

Once we've computed SR, we have to find the dispatch table for SR, which
is a multi-step lookup, first by |ClassLoader| (to allow for classloader
unloading) and then by SR, which results in a |DispatchTable| mapping a
dynamic receiver type |R| to a fully specialized desugared
species-static MH.

|class DispatchTable extends ClassValue<MethodHandle> { ... }
class MetaDispatchTable extends ClassValue<DispatchTable> { ... }
private static final WeakHashMap<ClassLoader, MetaDispatchTable> mdt = ...
|

We compute |mdt.computeIfAbsent(SR.getClassLoader(), ...).get(SR)| and
store that as |DT| in the |CallSite|. The |ClassValue| implementation
for |MetaDispatchTable| simply creates a new |DispatchTable| entry.

The callsite target is linked to the following logic:

|Class<?> R = r.getClass();
DT.get(R).invokeExact(r, args)
|

The |computeValue()| method of |DispatchTable| does the meat of the
work. For the receiver type R, we have to find the corresponding |Xxx$m|
class, which might be declared in a superclass of R, specialize it to
the desired method type parameters, and look up (findSpecies) the
appropriate specialized MH. (Finding the corresponding |Xxx$m| class
could be done by walking the hierarchy directly, or by doing a
|MethodHandle.resolveOrFail| on the erased bridge.)

The cost of an invocation is one |ClassValue| lookup, plus the overhead
of folding the arguments together appropriately and doing a MH invoke.
The above logic seems representable as a single method handle expression
using |fold| and |filter| combinators; if it is not, it might also
introduce some varargs spreading/collecting overhead. (It could be
further optimized by wrapping the result of DT.get(R) with a PIC on R.)

This doesn't seem so bad. The first time a given target is resolved (a
given combination of enclosing |Foo|, |m(...)|, type arguments U, and
receiver class), a relatively expensive linkage step is performed - but
the result is then cached in a table specific to the members involved,
not the call site - so this should stabilize quickly.


Interface methods

Interface methods add an additional layer, but do not change the story
fundamentally. If I have:

|interface I<any T> {
    ACC <any U> void m(T t, U u) { }
}

class Foo<any T> implements I<T> {
    @Override
    ACC <any U> void m(T t, U u) { }
}
|

then we need to generate an |I$m| artifact, as we do with classes. When
linking an invocation whose static receiver is a specialization of an
interface rather than a class, we compute |I$m| as our SR, and proceed
as before. (In this case, our linkage strategy should probably use
|resolveOrFail| rather than manual hierarchy walking, so we should
probably do this in both cases.) Additionally, we may need to check that
the |Xxx| class corresponding to the resolved |Xxx$m| holder class
actually implements |I| - again, this can be done at linkage time, not
dispatch time.

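
As a rough illustration of the |ClassValue|-based dispatch table described under "Instance methods" above, here is a sketch in plain Java. Everything in it is a stand-in: species methods and |invokespecies| do not exist in released Java, so ordinary static methods taking the receiver as an explicit first argument play the role of the |Xxx$m| implementations. The names |FooM|, |BarM| and |holderFor| are hypothetical, and the per-classloader |MetaDispatchTable| level and the specialization step are omitted.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DispatchSketch {
    // Receiver hierarchy; Baz inherits Bar's implementation of m().
    public static class Foo { }
    public static class Bar extends Foo { }
    public static class Baz extends Bar { }

    // Stand-ins for the desugared Foo$m / Bar$m holders: no instances are
    // needed, the receiver arrives as the explicit "outer" argument.
    public static class FooM {
        public static String m(Foo outer, Object t, Object u) { return "Foo$m"; }
    }
    public static class BarM {
        public static String m(Bar outer, Object t, Object u) { return "Bar$m"; }
    }

    /** Hypothetical mapping from a class to its holder, or null if it declares no m(). */
    static Class<?> holderFor(Class<?> c) {
        if (c == Foo.class) return FooM.class;
        if (c == Bar.class) return BarM.class;
        return null;
    }

    /** Maps a dynamic receiver class to the implementation method handle. */
    static final class DispatchTable extends ClassValue<MethodHandle> {
        private static final MethodType CALL_TYPE =
            MethodType.methodType(String.class, Foo.class, Object.class, Object.class);

        @Override
        protected MethodHandle computeValue(Class<?> receiverClass) {
            // Walk up the hierarchy until some class has a holder.
            for (Class<?> c = receiverClass; c != null; c = c.getSuperclass()) {
                Class<?> holder = holderFor(c);
                if (holder == null) continue;
                try {
                    MethodHandle mh = MethodHandles.lookup().findStatic(holder, "m",
                        MethodType.methodType(String.class, c, Object.class, Object.class));
                    // Give all entries one shape so the call site can use invokeExact.
                    return mh.asType(CALL_TYPE);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
            throw new IllegalStateException("no implementation for " + receiverClass);
        }
    }

    static final DispatchTable DT = new DispatchTable();

    /** The logic the call site would be linked to: one ClassValue lookup, then an exact invoke. */
    public static String invoke(Foo r, Object t, Object u) {
        try {
            return (String) DT.get(r.getClass()).invokeExact(r, t, u);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(new Foo(), "t", "u"));  // Foo$m
        System.out.println(invoke(new Bar(), "t", "u"));  // Bar$m
        System.out.println(invoke(new Baz(), "t", "u"));  // Bar$m (found in superclass)
    }
}
```

The expensive part (the hierarchy walk and lookup in |computeValue|) runs once per receiver class and is then cached by |ClassValue|, which matches the claim that linkage cost is paid per member combination rather than per call site.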

Default methods

If we do our dispatch using |resolveOrFail| against the erased bridges,
and the method is not implemented in the receiver's superclass
hierarchy, then I believe that resolution will hand us the MH for the
default. If so, we're good; we resolve to the default, just like any
other implementation.

One integrity concern here is whether the |Xxx$m| hierarchy is properly
aligned with the |Foo| hierarchy. I think we can validate that (at the
time we lazily populate the dispatch table) simply by checking that the
resolved erased target |Xxx.m()| and the corresponding specialized class
|Xxx$m.m()| share a nest (and hence derive from the same source class.)

From maurizio.cimadamore at oracle.com Wed Jun 15 17:39:25 2016
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 15 Jun 2016 18:39:25 +0100
Subject: Valhalla reflection API - first stab
Message-ID: <576192CD.9050707@oracle.com>

Hi,
I've just pushed a new Valhalla-centric reflection API:

http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001968.html

I'm working on a more complete document which will provide the
background for the design decisions we made. I hope to make it available
within the next few days.

Cheers
Maurizio

From maurizio.cimadamore at oracle.com Wed Jun 22 16:05:30 2016
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 22 Jun 2016 17:05:30 +0100
Subject: Valhalla reflection API - first stab
In-Reply-To: <576192CD.9050707@oracle.com>
References: <576192CD.9050707@oracle.com>
Message-ID: <576AB74A.7060905@oracle.com>

Hi,
as promised, here's a detailed writeup of the exploration we did in the
reflection space. I hope this will be useful as context for further
discussions.

Cheers
Maurizio

On 15/06/16 18:39, Maurizio Cimadamore wrote:
> Hi,
> I've just pushed a new Valhalla-centric reflection API:
>
> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001968.html
>
> I'm working on a more complete document which will provide the
> background for the design decisions we made. I hope to make it
> available within the next few days.
>
> Cheers
> Maurizio

From forax at univ-mlv.fr Wed Jun 22 16:20:11 2016
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 22 Jun 2016 18:20:11 +0200 (CEST)
Subject: Valhalla reflection API - first stab
In-Reply-To: <576AB74A.7060905@oracle.com>
References: <576192CD.9050707@oracle.com> <576AB74A.7060905@oracle.com>
Message-ID: <1693989791.1512639.1466612411195.JavaMail.zimbra@u-pem.fr>

I suppose the attachment was eaten by some beast in between.

Rémi

----- Original Message -----
> From: "Maurizio Cimadamore"
> To: valhalla-spec-experts at openjdk.java.net
> Sent: Wednesday, 22 June 2016 18:05:30
> Subject: Re: Valhalla reflection API - first stab
>
> Hi,
> as promised, here's a detailed writeup of the exploration we did in the
> reflection space. I hope this will be useful as a context for further
> discussions.
>
> Cheers
> Maurizio
>
> On 15/06/16 18:39, Maurizio Cimadamore wrote:
> > Hi,
> > I've just pushed a new Valhalla-centric reflection API:
> >
> > http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001968.html
> >
> > I'm working on a more complete document which will provide the
> > background for the design decisions we made. I hope to make it
> > available within the next few days.
> >
> > Cheers
> > Maurizio
>

From maurizio.cimadamore at oracle.com Wed Jun 22 16:26:41 2016
From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore)
Date: Wed, 22 Jun 2016 17:26:41 +0100
Subject: Valhalla reflection API - first stab
In-Reply-To: <576AB74A.7060905@oracle.com>
References: <576192CD.9050707@oracle.com> <576AB74A.7060905@oracle.com>
Message-ID: <576ABC41.60405@oracle.com>

There seem to be problems with the attachment - uploaded here:

http://cr.openjdk.java.net/~mcimadamore/reflection-manifesto.html

Maurizio

On 22/06/16 17:05, Maurizio Cimadamore wrote:
> Hi,
> as promised, here's a detailed writeup of the exploration we did in
> the reflection space. I hope this will be useful as a context for
> further discussions.
>
> Cheers
> Maurizio
>
> On 15/06/16 18:39, Maurizio Cimadamore wrote:
>> Hi,
>> I've just pushed a new Valhalla-centric reflection API:
>>
>> http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001968.html
>>
>>
>> I'm working on a more complete document which will provide the
>> background for the design decisions we made. I hope to make it
>> available within the next few days.
>>
>> Cheers
>> Maurizio
>

From brian.goetz at oracle.com Tue Jun 28 17:43:03 2016
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 28 Jun 2016 13:43:03 -0400
Subject: In-person meeting
Message-ID: <795faa88-5ec4-6100-e117-e22437dfad14@oracle.com>

As we have done in previous years, I would like to hold an in-person EG
meeting in Santa Clara the day after JVM Language Summit (Thursday).
Please contact me offline to reserve a seat.