species static prototype

Remi Forax forax at univ-mlv.fr
Fri May 27 21:26:32 UTC 2016



----- Original Message -----
> From: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> To: valhalla-spec-experts at openjdk.java.net
> Sent: Friday, 27 May 2016 22:56:05
> Subject: species static prototype
> 
> Hi,
> over the last few days I've been busy putting together a prototype [1,
> 2] of javac/runtime support for species static. I guess it could be
> considered a prototype implementation of the approach that Bjorn has
> described as "Repurpose existing statics" [4] in his nice writeup.
> Here's what I have learned during the experience.
> 
> Parser
> ====
> 
> The prototype uses a no fuss approach where '__species' is the modifier
> to denote species static stuff (of course a better syntax will have to
> be picked at some point, but that's not the goal of the current
> exercise). This means you can write:
> 
> class Foo<X> {
>     String i; //instance field
>     static String s; //static field
>     __species String ss; //species static field
> }
> 
> This is obviously good enough for the time being.
> 
> A complication with parsing occurs when accessing species members; in
> fact, species members can be accessed via their fully qualified type
> (including all required type-arguments, if necessary).
> 
> Foo<String>.ss;
> Foo<int>.ss;
> 
> The above are all valid species access expressions. Now, adding this kind
> of support in the parser is always tricky - as we have to battle with
> ambiguities which might pop up. Luckily, this pattern is similar enough
> to the one we use for method references - i.e. :
> 
> Foo<String>::ss
> 
> Which the compiler already had to special case; so I ended up slightly
> generalizing what we did in JDK 8 method reference parsing, and I got
> something working reasonably quickly. But this could be an area where
> coming up with a clean spec might be tricky (as the impl uses abundant
> lookahead to disambiguate this one).
> 
> Resolution
> ======
> 
> The basic idea is to divide the world in three static levels, whose
> properties are summarized in the table below:
> 
> 
>            | enclosing type | enclosing instance
> -----------+----------------+-------------------
> instance   | yes            | yes
> species    | yes            | no
> static     | no             | no
> 
> 
> So, in terms of who can access what, it follows that if we consider
> 'instance' to be the highest static level and 'static' to be the lowest,
> then it's ok for a member with static level S1 to access another member
> of static level S2 provided that S1 >= S2. Or, with a table:
> 
> from/to    | instance | species | static
> -----------+----------+---------+-------
> instance   | yes      | yes     | yes
> species    | no       | yes     | yes
> static     | no       | no      | yes
> 
> 
> 
> So, let's look at a concrete example:
> 
> class TestResolution {
>      static void m_S() {
>          m_S(); //ok
>          m_SS(); //error
>          m_I(); //error
>      }
> 
>      __species void m_SS() {
>          m_S(); //ok
>          m_SS(); //ok
>          m_I(); //error
>      }
> 
>      void m_I() {
>          m_S(); //ok
>          m_SS(); //ok
>          m_I(); //ok
>      }
> }
> 
> A crucial property, of course, is that species static members can
> refer to any type variables in the enclosing context:
> 
> class TestTypeVar<X> {
>      static void m_S() {
>          X x; //error
>      }
> 
>      __species void m_SS() {
>           X x; //ok
>      }
>      void m_I() {
>           X x; //ok
>      }
> }
> 
> Nesting
> =====
> 
> Another concept that needs generalization is that of allowed nesting;
> consider the following program:
> 
> class TestNesting1 {
>      class MemberInner {
>          static String s_S; //error
>          String s_I; //ok
>      }
> 
>      static class StaticInner {
>          static String s_S; //ok
>          String s_I; //ok
>      }
> }
> 
> That is, the compiler will only allow you to declare static members in
> toplevel classes or in static nested classes (which, after all, act as
> toplevel classes). Now that we are adding a new static level to the
> picture, how are the nesting rules affected?
> 
> Looking at the table above, if we consider 'instance' to be the highest
> static level and 'static' to be the lowest, then it's ok for a member
> with static level S1 to declare a member of static level S2 provided
> that S1 <= S2. Again, we can look at this in a tabular fashion:
> 
> declaring/declared | instance | species | static
> -------------------+----------+---------+-------
> instance           | yes      | no      | no
> species            | yes      | yes     | no
> static             | yes      | yes     | yes
> 
> 
> This also seems like a nice generalization of the current rules. The
> rationale behind these rules is basically to guarantee some invariants
> during member lookup; let's say that we are in a nested class with
> static level S1 - then, by the rule above, it follows that any member
> nested in this class will be able to access another member with static
> level S1 declared in this class or in any lexically enclosing class.
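
The two ordering rules above (access needs S1 >= S2, declaration needs S1 <= S2) can be sketched in plain Java. This is only an illustrative model, not code from the prototype: the StaticLevel enum and the canAccess/canDeclare names are made up for this sketch, and the constants are simply declared from the lowest to the highest static level.

```java
// Illustrative model only; StaticLevel, canAccess and canDeclare are
// hypothetical names, not part of javac or of the prototype.
enum StaticLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest static level

    // Access rule: a member at level S1 may access a member at level S2 iff S1 >= S2.
    boolean canAccess(StaticLevel target) {
        return compareTo(target) >= 0;
    }

    // Nesting rule: a class at level S1 may declare a member at level S2 iff S1 <= S2.
    boolean canDeclare(StaticLevel member) {
        return compareTo(member) <= 0;
    }
}
```

For instance, StaticLevel.SPECIES.canAccess(StaticLevel.INSTANCE) is false (m_SS cannot call m_I), while StaticLevel.SPECIES.canDeclare(StaticLevel.INSTANCE) is true.
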
> 
> A full example of nesting rules is given below:
> 
> class TestNesting2 {
>      class MemberInner {
>          static String s_S; //error
>          __species String s_SS; //error
>          String s_I; //ok
>      }
> 
> 
>      __species class SpeciesInner {
>          static String s_S; //error
>          __species String s_SS; //ok
>          String s_I; //ok
>      }
> 
>      static class StaticInner {
>          static String s_S; //ok
>          __species String s_SS; //ok
>          String s_I; //ok
>      }
> }
> 
> Unchecked access
> ===========
> 
> Because of an unfortunate interplay between species and erasure, code
> using species members is potentially unsound (the example below is a
> variation of an example first discovered by Peter [3] on this
> very mailing list):
> 
> public class Foo<any T> {
>      __species T cache;
> }
> 
> 
> Foo<String>.cache = "Hello";
> Integer i = Foo<Integer>.cache; //whoops
> 
> To prevent cases like these, the compiler implements a check which looks
> at the qualifier of a species access; if such qualifier (either
> explicit, or implicit) cannot be proven to be reifiable, an unchecked
> warning is issued.
> 
> Note that it is possible to restrict such warnings only to cases where
> the signature of the accessed species static member changes under
> erasure. E.g. in the above example, accessing 'cache' is unchecked,
> because the type of 'cache' contains type-variables; but if another
> species static field was accessed whose type did not depend on
> type-variables, then the access should be considered sound.
> 
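
To see why the erased case is unsound, the same failure can be reproduced in today's Java with an ordinary static slot standing in for the erased species field. This is a sketch under that assumption: ErasedCache, put, get and ErasedCacheDemo are hypothetical names, and a real species static would not be written this way.

```java
// Hypothetical stand-in: under erasure, Foo<String>.cache and Foo<Integer>.cache
// would be backed by one storage slot, modelled here by a plain static field.
final class ErasedCache {
    private static Object cache;

    static <T> void put(T value) {
        cache = value;
    }

    @SuppressWarnings("unchecked") // the unchecked cast is exactly the soundness hole
    static <T> T get() {
        return (T) cache;
    }
}

final class ErasedCacheDemo {
    // Returns true if reading the slot at the "wrong" type argument fails at run time.
    static boolean readFailsAsInteger() {
        ErasedCache.<String>put("Hello");
        try {
            Integer i = ErasedCache.<Integer>get(); // compiler-inserted cast to Integer
            return i == null; // not reached while the slot holds a String
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```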

I wonder if it wouldn't be better to use Foo.cache for references and Foo<int>/Foo<Complex> for primitives and value types?

> 
> Species initializers
> ===========
> 
> In our model we have three static levels - but we have initialization
> artifacts for only two of those; we need to fix that:
> 
> instance   | <init>
> species    | <sclinit>
> static     | <clinit>
> 
> 
> 
> That is, a new <sclinit> method is added to a class containing one or
> more species variables with an initializer. This method is used to hoist
> the initialization code for all the species variables.
> 
> Forward references
> ============
> 
> Rules for detecting forward references have to be extended accordingly.
> A forward reference occurs whenever there's an attempt to reference a
> variable from a position P, where the variable declaration occurs in a
> position P' > P. Currently, the rules for forward references allow an
> instance variable to forward-reference a static variable - as shown below:
> 
> class TestForwardRef {
>     String s = s_S;
>     static String s_S = "Hello!";
> }
> 
> The rationale behind this is that, by the time we see the instance
> initializer for 's' we would have already executed the code for
> initializing 's_S' (as initialization will occur in different methods,
> <init> and <clinit> respectively, see section above). With the new
> static level, the forward reference rules have to be redefined according
> to the table below:
> 
> 
> from/to    | instance    | species     | static
> -----------+-------------+-------------+------------
> instance   | forward ref | ok          | ok
> species    | illegal     | forward ref | ok
> static     | illegal     | illegal     | forward ref
> 
> 
> In other words, it's ok to forward reference a variable whose static
> level is lower than that available where the reference occurs. An
> example is given below:
> 
> class TestForwardRef2 {
>     String s1_I = s_S; //ok
>     String s2_I = s_SS; //ok
> 
>     static String s1_S = s_S; //error!
> 
>     __species String s1_SS = s_S; //ok
>     __species String s2_SS = s_SS; //error!
> 
>     static String s_S = "Hello!";
>     __species String s_SS = "Hello Species!";
> }
> 
> This is an extension of the above principle: since instance variables
> are initialized in <init>, they can reference variables initialized in
> <clinit> or <sclinit>. If a variable is initialized in <sclinit> it can
> similarly safely reference a variable initialized in <clinit>. Another
> way to think of this is that a forward reference error only occurs if
> the static level of the referenced symbol is the same as the static
> level where the reference occurs. All other cases are either illegal
> (i.e. because it's an attempt to go from a lower static level to a
> higher one) or valid (because it can be guaranteed that the code
> initializing the referenced variable has already been executed).
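
The forward reference table can be read as a three-way comparison of static levels. A minimal sketch of that reading, assuming the reference always targets a variable declared later in the same class; Level, RefCheck and ForwardRefRules are hypothetical names, not prototype code.

```java
// Hypothetical model of the forward reference table; not javac code.
enum Level { STATIC, SPECIES, INSTANCE } // lowest to highest static level

enum RefCheck { OK, FORWARD_REF, ILLEGAL }

final class ForwardRefRules {
    // Classifies a reference from an initializer at level 'from' to a
    // variable at level 'to' that is declared later in the class body.
    static RefCheck classify(Level from, Level to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) {
            return RefCheck.ILLEGAL;      // a lower level cannot see a higher one
        }
        if (cmp == 0) {
            return RefCheck.FORWARD_REF;  // same initializer method, target not yet set
        }
        return RefCheck.OK;               // the lower-level initializer has already run
    }
}
```
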
> 
> Code generation
> ==========
> 
> Javac currently emits invokestatic/getstatic/putstatic for both legacy
> static and species static access. javac will use the 'owner' field of a
> CONSTANT_MethodRef, CONSTANT_FieldRef constants to point to the sharp
> type of the species access (through a constant pool type entry). Static
> access will always see an erased owner.
> 
> Consider this example:
> 
> class TestGen<any X> {
>     __species void m_SS() { }
>     static void m_S() { }
> 
>     public static void main(String[] args) {
>         TestGen<String>.m_SS();
>         TestGen<int>.m_SS();
>         TestGen<String>.m_S();
>         TestGen<int>.m_S();
>     }
> }
> 
> The generated code in the 'main' method is reported below:
> 
> 0: invokestatic  #11                 // Method TestGen<_>.m_SS:()V
> 3: invokestatic  #15                 // Method TestGen<I>.m_SS:()V
> 6: invokestatic  #18                 // Method TestGen<_>.m_S:()V
> 9: invokestatic  #18                 // Method TestGen<_>.m_S:()V
> 
> As can be seen, species static access can cause a sharper type to end
> up in the 'owner' field of the member reference info; on the other hand,
> a static access always leads to an erased 'owner'.
> 
> Another detail worth mentioning is how __species is represented in the
> bytecode. Given the current lack of flag bits I've opted to use the last
> remaining bit 0x8000 - this is in fact the last unused bit that can be
> shared across class, field and method descriptors. Actually, this bit
> has already been used to encode the ACC_MANDATED flag in the
> MethodParameters attribute (as of JDK 8) - but since there's no other
> usage of that flag configuration outside MethodParameters it would seem
> safe to recycle it. Of course more compact approaches are also possible,
> but they would lead to different flag configurations for species static
> fields, methods and classes.

0x8000 is ACC_MODULE in JDK 9 :(

> 
> Specialization
> =========
> 
> Specializing species access is relatively straightforward:
> 
> * both instance and species static members are copied in the specialization
> * static members are only copied in the erased specialization (and
> skipped otherwise)
> * ACC_SPECIES classes become regular classes when specialized
> * ACC_SPECIES methods/fields become static methods/fields in the
> specialization
> * <sclinit> becomes the new <clinit> in the specialization (and is
> omitted if the specialization is the erased specialization)
> 
> The last bullet requires some extra care when handling the 'erased'
> specialization; consider the following example:
> 
> class TestSpec<any X> {
>     static String s_S = "HelloStatic";
>     __species String s_SS = "HelloSpecies";
> }
> 
> This class will end up with the following two synthetic methods:
> 
> static void <clinit>();
>      descriptor: ()V
>      flags: ACC_STATIC
>      Code:
>        stack=1, locals=0, args_size=0
>           0: ldc           #8                  // String HelloStatic
>           2: putstatic     #14                 // Field s_S:Ljava/lang/String;
>           5: ldc           #16                 // String HelloSpecies
>           7: putstatic     #19                 // Field s_SS:Ljava/lang/String;
>          10: return
> 
>    species void <sclinit>();
>      descriptor: ()V
>      flags: ACC_SPECIES
>      Code:
>        stack=1, locals=1, args_size=1
>           0: ldc           #16                 // String HelloSpecies
>           2: putstatic     #19                 // Field s_SS:Ljava/lang/String;
>           5: return
> 
> As can be seen, the <clinit> method contains initialization code for
> both static and species static fields! To understand why this is so,
> let's consider how the specialized bits might be derived from the
> template class following the rules above. Let's consider a
> specialization like TestSpec<int>: in this case, we need to drop
> <clinit> (it's a static method and TestSpec<int> is not an erased
> specialization), and we also need to rename <sclinit> as <clinit> in the
> new specialization. All is fine - the specialization will contain the
> relevant code required to initialize its species static fields.
> 
> Let's now turn to the erased specialization TestSpec<_> - this
> specialization receives both static and species static members. Now, if
> we were to follow the same rules for initializers, we'd end up with two
> different initializer methods - both <clinit> and <sclinit>. We could
> ask the specializer to merge them somehow, but that would be tricky and
> expensive. Instead, we simply (i) drop <sclinit> from the erased
> specialization and (ii) retain <clinit>. Of course this means that
> <clinit> must also contain initialization code for species static members.
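
The copy/drop/rename rules for members and initializers can be summarized with a toy model. This is a sketch only: SpecializerSketch, Kind and Member are hypothetical names, members are reduced to a name and a static level, and the sketch does not model the flag rewriting (ACC_SPECIES to static) or the bytecode itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the specialization rules; none of these names exist in the prototype.
final class SpecializerSketch {
    enum Kind { INSTANCE, SPECIES, STATIC }

    static final class Member {
        final String name;
        final Kind kind;

        Member(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Returns the names of the template members that end up in a specialization.
    static List<String> specialize(List<Member> template, boolean erased) {
        List<String> out = new ArrayList<>();
        for (Member m : template) {
            boolean isSclinit = m.name.equals("<sclinit>");
            if (m.kind == Kind.STATIC && !erased) {
                continue; // statics (including <clinit>) live only in the erased class
            }
            if (isSclinit && erased) {
                continue; // the erased class relies on <clinit>, which also sets species fields
            }
            out.add(isSclinit ? "<clinit>" : m.name); // <sclinit> is renamed otherwise
        }
        return out;
    }
}
```

With the TestSpec template (s_S, s_SS, <clinit>, <sclinit>), the erased specialization keeps s_S, s_SS and <clinit>, while TestSpec<int> keeps s_SS and a <clinit> that is the renamed <sclinit>.
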
> 
> Bonus point: Generic methods
> ===================
> 
> As pointed out by Brian, if we have species static classes we can
> translate static and species static specializable generic methods quite
> effectively. Consider this example:
> 
> class TestGenMethods {
>     static <any X> void m(X x) { ... }
> 
>     void test() {
>         m(42);
>     }
> }
> 
> without species static, this would translate to:
> 
> class TestGenMethods {
>      static class TestGenMethods$m<any X> {
>           void m(X z) { ... }
>      }
> 
>      /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }
> 
>      void test() {
>          new TestGenMethods$m<int>().m(42); // this is really done inside the BSM
>      }
> }
> 
> Note how the bridge (called by legacy code) will need to spin a new
> instance of the synthetic class and then call a method on it. The
> bootstrap used to dispatch static generic specializable calls also needs
> to do a very similar operation. But what if we turned the translated
> generic method into a species static method?
> 
> class TestGenMethods {
>      class TestGenMethods$m<any X> {
>           __species void m(X z) { ... }
>      }
> 
>      /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }
> 
>      void test() {
>          TestGenMethods$m<int>.m(42); // this is really done inside the BSM
>      }
> }
> 
> With species static, we can now access the method w/o needing any extra
> instance. This leads to simplification in both the bridging strategy and
> the bootstrap implementation. We can apply a similar simplification for
> dispatch of specializable species static calls - the only difference is
> that the synthetic holder class has also to be marked as species static
> (since it could access type-vars from the enclosing context).
> 
> Bonus point: Access bridges
> =================
> 
> Access bridges are a constant pain in the current translation strategy;
> such bridges are generated by the compiler to grant access to otherwise
> inaccessible members. Example:
> 
> class Outer<any X> {
>      private void m() { }
> 
>      class Inner {
>          void test() {
>              m();
>          }
>      }
> }
> 
> This code will be translated as follows:
> 
> class Outer<any X> {
> 
>      /* synthetic */ static void access$m(Outer o) { o.m(); }
> 
>      private void m() { }
> 
>      class Inner {
>          /*synthetic*/ Outer this$0;
> 
>          void test() {
>              access$m(this$0);
>          }
>      }
> }
> 
> That is, access to private members is translated with an access to an
> accessor bridge, which then performs access from the right location.
> Note that the accessor bridge is static (because otherwise it would be
> possible to maliciously override it to grant access to otherwise
> inaccessible members); since it's static, usual rules apply, so it
> cannot refer to type-variables, it cannot be specialized, etc. This
> means that there are cases with specialization where existing access
> bridges are not enough to guarantee access - if the access happens to
> cross specialization boundaries (i.e. accessing m() from an
> Outer<int>.Inner).
> 
> Again, species static comes to the rescue:
> 
> class Outer<any X> {
> 
>      /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }
> 
>      private void m() { }
> 
>      class Inner {
>          /*synthetic*/ Outer this$0;
> 
>          void test() {
>              Outer<X>.access$m(this$0);
>          }
>      }
> }
> 
> Since the accessor bridge is now species static, it means it can now
> mention type variables (such as X); and it also means that when the
> bridge is accessed (from Inner), the qualifier type (Outer<X>) is
> guaranteed to remain sharp from the source code to the bytecode - which
> means that when this code will get specialized, all references to X will
> be dealt with accordingly (and the right accessor bridge will be accessed).

Do we still need bridges if we have nestmates?

> 
> Parting thoughts
> ==========
> 
> On many levels, species statics seem to be the missing ingredient for
> implementing many of the tricks of our translation strategy, as well as
> to make it easier to express common idioms (i.e. type-dependent caches)
> in user code.

Given that a type-dependent cache uses the same value for all Object specializations,
its usefulness is severely limited; for example, you cannot retrofit java.lang.ClassValue to use it.
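
For comparison, this is the kind of per-class cache that java.lang.ClassValue gives today: each Class, including each reference type, gets its own value. A minimal sketch; TypeNames and its computations counter are made-up names for illustration.

```java
// java.lang.ClassValue is a real JDK class; TypeNames is a hypothetical example.
final class TypeNames extends ClassValue<String> {
    int computations = 0; // counts how many times a value had to be computed

    @Override
    protected String computeValue(Class<?> type) {
        computations++;
        return "cached:" + type.getSimpleName();
    }
}
```

Here get(String.class) and get(Integer.class) on a TypeNames instance yield distinct cached values, whereas a species static read through the erased Foo<String> and Foo<Integer> would share a single one.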

Brian's last mail, which proposes adding support for any (wildcard) in the VM, makes me wonder whether we could do the same for the bottom of the type lattice instead of the top,
i.e. introduce a type named Nothing that is a subtype of the type of null and of the primitive types/value types.
In that case, you don't need several specializations of EmptyList because you can write EmptyList<Nothing>, which is a subtype of EmptyList<T> for any T (bounded by any).

> 
> Adding support for species static has proven to be harder than
> originally thought. This is mainly because the current world is split in
> two static levels: static and instance. When something is not static
> it's implicitly assumed to be instance, and vice versa. If we add a third
> static level to the picture, a lot of the existing code just doesn't
> work anymore, or has to be validated to check as to whether 'static'
> means 'legacy static' or 'species static' (or both).
> 
> I started the implementation by treating static, species static and
> instance as completely separate static levels - with different internal
> flags, etc. but I soon realized that, while clean, this approach was
> invalidating too much of the existing implementation. More specifically,
> all the code snippets checking for static would now have to be updated to
> check for static OR species static (overriding vs. hiding, access to
> 'this', access to 'super', generic bridges, ...). On the other hand, the
> places where the semantics of species static vs. static was different
> were quite limited:
> 
> * membership/type substitution: a species static behaves like an
> instance member; the type variables of the owner are replaced into the
> member signature.
> * resolution: we need to implement the correct access rules as shown in
> the tables above.
> * code generation: an invokestatic involving a species static gets a
> sharp qualifier type
> 
> This quickly led to the realization that it was instead easier to just
> treat 'species static' as a special case of 'static' - and then to add
> finer grained logic whenever we really needed the distinction. This led
> to a considerably easier patch, and I think that a similar consideration
> will hold for the JLS.

If you consider the semantics of C#, static is an anomaly: every static in a generic class should be species static. So it seems that, from the compiler's POV, you can consider not species static as a special form of static, but static as a special form of species static ...

> 
> [1] -
> http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
> [2] - http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
> [3] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
> [4] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html
> 
> Maurizio

regards,
Rémi

