species static prototype

Wed Jun 1 20:19:22 UTC 2016

On 01/06/16 19:52, Bjorn B Vardal wrote:
> Will the users be able to write their own <sclinit>?
>
>     class Foo {
>         __species {
>             ...
>         }
>     }
>
>
Hi Bjorn,

Yep - that is supported.
> Your access bridge solution using species methods looks fine, but are 
> we not solving that with nest mates?
> I'm also wondering whether the following are typos, or if I 
> misunderstood them:
>
>   * TestResolution.m_I() was not meant to be decorated with '__species'
>
Right - that's a typo, the 'species' modifier was meant to be omitted 
(i.e. it's an instance method)
>
>   * TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the
>     correct modifiers, or should not be error cases.
>
Yeah - missing static and species there - in general members with _S are 
meant to be static, those with _SS are meant to be 'species'
>
>   * TestTypeVar<X>.m_I() was not meant to be decorated with '__species'
>
Yep - same as above

Sorry for the typos!

Maurizio
> --
> Bjørn Vårdal
> IBM Runtimes
>
>     ----- Original message -----
>     From: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
>     Sent by: "valhalla-spec-experts"
>     <valhalla-spec-experts-bounces at openjdk.java.net>
>     To: valhalla-spec-experts at openjdk.java.net
>     Cc:
>     Subject: species static prototype
>     Date: Fri, May 27, 2016 4:56 PM
>
>     Hi,
>     over the last few days I've been busy putting together a prototype
>     [1, 2] of javac/runtime support for species static. I guess it
>     could be considered a prototype implementation of the approach
>     that Bjorn has described as "Repurpose existing statics" [4] in
>     his nice writeup. Here's what I have learned during the experience.
>
>     Parser
>     ====
>
>     The prototype uses a no fuss approach where '__species' is the
>     modifier to denote species static stuff (of course a better syntax
>     will have to be picked at some point, but that's not the goal of
>     the current exercise). This means you can write:
>
>     class Foo<X> {
>        String i; //instance field
>        static String s; //static field
>        __species String ss; //species static field
>     }
>
>     This is obviously good enough for the time being.
>
>     A complication with parsing occurs when accessing species members;
>     in fact, species members can be accessed via their fully qualified
>     type (including all required type-arguments, if necessary).
>
>     Foo<String>.ss;
>     Foo<int>.ss;
>
>     The above are all valid species access expressions. Now, adding
>     this kind of support in the parser is always tricky - as we have
>     to battle with ambiguities which might pop up. Luckily, this
>     pattern is similar enough to the one we use for method references
>     - i.e. :
>
>     Foo<String>::ss
>
>     Which the compiler already had to special case; so I ended up
>     slightly generalizing what we did in JDK 8 method reference
>     parsing, and I got something working reasonably quickly. But this
>     could be an area where coming up with a clean spec might be tricky
>     (as the impl uses abundant lookahead to disambiguate this one).
>
>     Resolution
>     ======
>
>     The basic idea is to divide the world in three static levels,
>     whose properties are summarized in the table below:
>                 enclosing type    enclosing instance
>     instance    yes               yes
>     species     yes               no
>     static      no                no
>
>
>     So, in terms of who can access what, it follows that if we
>     consider 'instance' to be the highest static level and 'static' to
>     be the lowest, then it's ok for a member with static level S1 to
>     access another member of static level S2 provided that S1 >= S2.
>     Or, with a table:
>     from/to     instance    species     static
>     instance    yes         yes         yes
>     species     no          yes         yes
>     static      no          no          yes
>
>
>
>     So, let's look at a concrete example:
>
>     class TestResolution {
>         static void m_S() {
>             m_S(); //ok
>             m_SS(); //error
>             m_I(); //error
>         }
>
>         __species void m_SS() {
>             m_S(); //ok
>             m_SS(); //ok
>             m_I(); //error
>         }
>
>         __species void m_I() { // sic: intended as an instance method
>             m_S(); //ok
>             m_SS(); //ok
>             m_I(); //ok
>         }
>     }
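
The access table above can be modelled in plain Java as an ordering on the three static levels. This is only an illustrative sketch, not code from the prototype: the names StaticLevel, canAccess and AccessRuleDemo are made up here, and the only thing taken from the text is the rule "S1 may access S2 provided that S1 >= S2".

```java
// Hypothetical model of the three static levels (not part of the prototype).
enum StaticLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest level

    /** Access rule from the table: this level may access 'target' iff this >= target. */
    boolean canAccess(StaticLevel target) {
        return this.compareTo(target) >= 0;
    }
}

class AccessRuleDemo {
    public static void main(String[] args) {
        // Print the from/to matrix row by row.
        for (StaticLevel from : StaticLevel.values()) {
            for (StaticLevel to : StaticLevel.values()) {
                String verdict = from.canAccess(to) ? "yes" : "no";
                System.out.println(from + " -> " + to + ": " + verdict);
            }
        }
    }
}
```

Encoding the levels as an ordered enum keeps the rule to a single comparison, which mirrors how the text states it.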
>
>     A crucial property, of course, is that species static members can
>     refer to any type variables in the enclosing context:
>
>     class TestTypeVar<X> {
>         static void m_S() {
>             X x; //error
>         }
>
>         __species void m_SS() {
>              X x; //ok
>         }
>         __species void m_I() { // sic: intended as an instance method
>              X x; //ok
>         }
>     }
>
>     Nesting
>     =====
>
>     Another concept that needs generalization is that of allowed
>     nesting; consider the following program:
>
>     class TestNesting1 {
>         class MemberInner {
>             static String s_S; //error
>             String s_I; //ok
>         }
>
>         static class StaticInner {
>             static String s_S; //ok
>             String s_I; //ok
>         }
>     }
>
>     That is, the compiler will only allow you to declare static
>     members in toplevel classes or in static nested classes (which,
>     after all, act as toplevel classes). Now that we are adding a new
>     static level to the picture, how are the nesting rules affected?
>
>     Looking at the table above, if we consider 'instance' to be the
>     highest static level and 'static' to be the lowest, then it's ok
>     for a member with static level S1 to declare a member of static
>     level S2 provided that S1 <= S2. Again, we can look at this in a
>     tabular fashion:
>     declaring/declared    instance    species     static
>     instance              yes         no          no
>     species               yes         yes         no
>     static                yes         yes         yes
>
>
>     This also seems like a nice generalization of the current rules.
>     The rationale behind these rules is, basically, to guarantee some
>     invariants during member lookup; let's say that we are in a nested
>     class with static level S1 - then, by the rule above, it follows
>     that any member nested in this class will be able to access
>     another member with static level S1 declared in this class or in
>     any lexically enclosing class.
>
>     A full example of nesting rules is given below:
>
>     class TestNesting2 {
>         class MemberInner {
>             static String s_S; //error
>             __species String s_SS; //error
>             String s_I; //ok
>         }
>
>
>         __species class SpeciesInner {
>             static String s_S; //error
>             __species String s_SS; //ok
>             String s_I; //ok
>         }
>
>         static class StaticInner {
>             static String s_S; //ok
>             __species String s_SS; //ok
>             String s_I; //ok
>         }
>     }
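
The nesting rule is the mirror image of the access rule, and can be sketched the same way in plain Java. Again this is an assumption-laden illustration rather than prototype code: DeclLevel, canDeclare and NestingRuleDemo are hypothetical names, and the rule encoded is the one stated above ("S1 may declare S2 provided that S1 <= S2").

```java
// Hypothetical model of the nesting rule (not part of the prototype).
enum DeclLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest level

    /** Nesting rule: a class at this level may declare a member at 'member' iff this <= member. */
    boolean canDeclare(DeclLevel member) {
        return this.compareTo(member) <= 0;
    }
}

class NestingRuleDemo {
    public static void main(String[] args) {
        // Rows correspond to the three nested classes of the TestNesting2 example.
        for (DeclLevel declaring : DeclLevel.values()) {
            for (DeclLevel declared : DeclLevel.values()) {
                String verdict = declaring.canDeclare(declared) ? "ok" : "error";
                System.out.println(declaring + " class declares " + declared + " member: " + verdict);
            }
        }
    }
}
```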
>
>     Unchecked access
>     ===========
>
>     Because of an unfortunate interplay between species and erasure,
>     code using species members is potentially unsound (the example
>     below is a variation of one first posted by Peter [3] on this
>     very mailing list):
>
>     public class Foo<any T> {
>         __species T cache;
>     }
>
>
>     Foo<String>.cache = "Hello";
>     Integer i = Foo<Integer>.cache; //whoops
>
>     To prevent cases like these, the compiler implements a check which
>     looks at the qualifier of a species access; if such qualifier
>     (either explicit, or implicit) cannot be proven to be reifiable,
>     an unchecked warning is issued.
>
>     Note that it is possible to restrict such warnings only to cases
>     where the signature of the accessed species static member changes
>     under erasure. E.g. in the above example, accessing 'cache' is
>     unchecked, because the type of 'cache' contains type-variables;
>     but if another species static field was accessed whose type did
>     not depend on type-variables, then the access should be considered
>     sound.
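
The same failure mode can be reproduced in today's Java by erasing the cache by hand. The sketch below is only an analogy, under the assumption that all erased parameterizations end up sharing one storage slot; the class ErasedCacheDemo and its members are hypothetical and do not come from the prototype.

```java
// Analogy in standard Java: one erased slot shared by every Foo<T>.
class ErasedCacheDemo {
    static final class Foo<T> {
        // After erasure there is a single slot, whatever T is.
        private static Object cache;

        void put(T value) {
            cache = value;
        }

        @SuppressWarnings("unchecked") // this is the unchecked access the warning is about
        T get() {
            return (T) cache;
        }
    }

    /** Returns true if reading the cache at a different type argument fails at runtime. */
    static boolean pollutes() {
        new Foo<String>().put("Hello");
        try {
            // The compiler inserts a cast to Integer at this assignment.
            Integer i = new Foo<Integer>().get();
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("heap pollution observed: " + pollutes());
    }
}
```

Note that the failure surfaces at the use site, not where the bad value was stored, which is why a compile-time unchecked warning on the access is the natural place to flag it.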
>
>
>     Species initializers
>     ===========
>
>     In our model we have three static levels - but we have
>     initialization artifacts for only two of those; we need to fix that:
>     instance    <init>
>     species     <sclinit>
>     static      <clinit>
>
>
>
>     That is, a new <sclinit> method is added to a class containing one
>     or more species variables with an initializer. This method is used
>     to hoist the initialization code for all the species variables.
>
>     Forward references
>     ============
>
>     Rules for detecting forward references have to be extended
>     accordingly. A forward reference occurs whenever there's an
>     attempt to reference a variable from a position P, where the
>     variable declaration occurs in a position P' > P. Currently, the
>     rules for forward references allow an instance variable to
>     forward-reference a static variable - as shown below:
>
>     class TestForwardRef {
>        String s = s_S;
>        static String s_S = "Hello!";
>     }
>
>     The rationale behind this is that, by the time we see the instance
>     initializer for 's' we would have already executed the code for
>     initializing 's_S' (as initialization will occur in different
>     methods, <init> and <clinit> respectively, see section above).
>     With the new static level, the forward reference rules have to be
>     redefined according to the table below:
>
>     from/to     instance       species        static
>     instance    forward ref    ok             ok
>     species     illegal        forward ref    ok
>     static      illegal        illegal        forward ref
>
>
>     In other words, it's ok to forward reference a variable whose
>     static level is lower than that available where the reference
>     occurs. An example is given below:
>
>     class TestForwardRef2 {
>        String s1_I = s_S; //ok
>        String s2_I = s_SS; //ok
>
>        String s1_S = s_S; //error! (sic: meant to be 'static')
>
>        String s1_SS = s_S; //ok (sic: meant to be '__species')
>        String s2_SS = s_SS; //error! (sic: meant to be '__species')
>
>        static String s_S = "Hello!";
>        __species String s_SS = "Hello Species!";
>     }
>
>     This is an extension of the above principle: since instance
>     variables are initialized in <init>, they can reference variables
>     initialized in <clinit> or <sclinit>. If a variable is initialized
>     in <sclinit> it can similarly safely reference a variable
>     initialized in <clinit>. Another way to think of this is that a
>     forward reference error only occurs if the static level of the
>     referenced symbol is the same as the static level where the
>     reference occurs. All other cases are either illegal (i.e. because
>     it's an attempt to go from a lower static level to a higher one)
>     or valid (because it can be guaranteed that the code initializing
>     the referenced variable has already been executed).
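
The three outcomes in the forward reference table follow from comparing the two static levels, which can be sketched in plain Java as below. This is a hypothetical model only: RefLevel, RefOutcome and ForwardRefRule are invented names, and FORWARD_REF_CHECK stands for "the usual textual-order check applies", as in the diagonal of the table.

```java
// Hypothetical model of the extended forward reference rules (not prototype code).
enum RefLevel { STATIC, SPECIES, INSTANCE } // declared from lowest to highest level

enum RefOutcome { OK, FORWARD_REF_CHECK, ILLEGAL }

class ForwardRefRule {
    /** Classifies a reference from an initializer at level 'from' to a variable at level 'to'. */
    static RefOutcome classify(RefLevel from, RefLevel to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) {
            return RefOutcome.ILLEGAL;           // lower level reaching up to a higher one
        }
        if (cmp == 0) {
            return RefOutcome.FORWARD_REF_CHECK; // same initializer method, order matters
        }
        return RefOutcome.OK;                    // target is initialized by an earlier method
    }

    public static void main(String[] args) {
        for (RefLevel from : RefLevel.values()) {
            for (RefLevel to : RefLevel.values()) {
                System.out.println(from + " -> " + to + ": " + classify(from, to));
            }
        }
    }
}
```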
>
>     Code generation
>     ==========
>
>     Javac currently emits invokestatic/getstatic/putstatic for both
>     legacy static and species static access. For a species access,
>     javac uses the 'owner' field of the CONSTANT_Methodref or
>     CONSTANT_Fieldref constant to point to the sharp type of the
>     access (through a constant pool type entry). A static access
>     always sees an erased owner.
>
>     Consider this example:
>
>     class TestGen<any X> {
>        __species void m_SS() { }
>        static void m_S() { }
>
>        public static void main(String[] args) {
>            TestGen<String>.m_SS();
>            TestGen<int>.m_SS();
>            TestGen<String>.m_S();
>            TestGen<int>.m_S();
>        }
>     }
>
>     The generated code in the 'main' method is reported below:
>
>     0: invokestatic  #11                 // Method TestGen<_>.m_SS:()V
>     3: invokestatic  #15                 // Method TestGen<I>.m_SS:()V
>     6: invokestatic  #18                 // Method TestGen<_>.m_S:()V
>     9: invokestatic  #18                 // Method TestGen<_>.m_S:()V
>
>     As can be seen, species static access can cause a sharper type
>     to end up in the 'owner' field of the member reference info; on
>     the other hand, a static access always leads to an erased 'owner'.
>
>     Another detail worth mentioning is how __species is represented in
>     the bytecode. Given the current shortage of flag bits, I've opted to
>     use the last remaining bit 0x8000 - this is in fact the last
>     unused bit that can be shared across class, field and method
>     descriptors. Actually, this bit has already been used to encode
>     the ACC_MANDATED flag in the MethodParameters attribute (as of JDK
>     8) - but since there's no other usage of that flag configuration
>     outside MethodParameters it would seem safe to recycle it. Of
>     course more compact approaches are also possible, but they would
>     lead to different flag configurations for species static fields,
>     methods and classes.
>
>     Specialization
>     =========
>
>     Specializing species access is relatively straightforward:
>
>     * both instance and species static members are copied in the
>     specialization
>     * static members are only copied in the erased specialization (and
>     skipped otherwise)
>     * ACC_SPECIES classes become regular classes when specialized
>     * ACC_SPECIES methods/fields become static methods/fields in the
>     specialization
>     * <sclinit> becomes the new <clinit> in the specialization (and is
>     omitted if the specialization is the erased specialization)
>
>     The last bullet requires some extra care when handling the
>     'erased' specialization; consider the following example:
>
>     class TestSpec<any X> {
>        static String s_S = "HelloStatic";
>        __species String s_SS = "HelloSpecies";
>     }
>
>     This class will end up with the following two synthetic methods:
>
>     static void <clinit>();
>         descriptor: ()V
>         flags: ACC_STATIC
>         Code:
>           stack=1, locals=0, args_size=0
>              0: ldc           #8                  // String HelloStatic
>              2: putstatic     #14                 // Field s_S:Ljava/lang/String;
>              5: ldc           #16                 // String HelloSpecies
>              7: putstatic     #19                 // Field s_SS:Ljava/lang/String;
>             10: return
>
>       species void <sclinit>();
>         descriptor: ()V
>         flags: ACC_SPECIES
>         Code:
>           stack=1, locals=1, args_size=1
>              0: ldc           #16                 // String HelloSpecies
>              2: putstatic     #19                 // Field s_SS:Ljava/lang/String;
>              5: return
>
>     As can be seen, the <clinit> method contains initialization
>     code for both static and species static fields! To understand why
>     this is so, let's consider how the specialized bits might be
>     derived from the template class following the rules above. Let's
>     consider a specialization like TestSpec<int>: in this case, we
>     need to drop <clinit> (it's a static method and TestSpec<int> is
>     not an erased specialization), and we also need to rename
>     <sclinit> as <clinit> in the new specialization. All is fine - the
>     specialization will contain the relevant code required to
>     initialize its species static fields.
>
>     Let's now turn to the erased specialization TestSpec<_> - this
>     specialization receives both static and species static members.
>     Now, if we were to follow the same rules for initializers, we'd
>     end up with two different initializer methods - both <clinit> and
>     <sclinit>. We could ask the specializer to merge them somehow, but
>     that would be tricky and expensive. Instead, we simply (i) drop
>     <sclinit> from the erased specialization and (ii) retain <clinit>.
>     Of course this means that <clinit> must also contain
>     initialization code for species static members.
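
The member-copying rules, including the special treatment of the two initializers, can be summarized as a small filter over the template's members. The sketch below is a plain Java model written for illustration only; SpecializerSketch, Member, Kind and specialize are hypothetical names, and the real specializer works on class files rather than on a list like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the specialization rules (not the actual specializer).
class SpecializerSketch {
    enum Kind { INSTANCE, SPECIES, STATIC }

    static final class Member {
        final String name;
        final Kind kind;

        Member(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Returns the members that end up in one specialization of the template. */
    static List<Member> specialize(List<Member> template, boolean erased) {
        List<Member> out = new ArrayList<>();
        for (Member m : template) {
            if (m.name.equals("<sclinit>")) {
                // Becomes <clinit> in sharp specializations, dropped in the erased one.
                if (!erased) {
                    out.add(new Member("<clinit>", Kind.STATIC));
                }
                continue;
            }
            switch (m.kind) {
                case INSTANCE:
                    out.add(m);                                // always copied
                    break;
                case SPECIES:
                    out.add(new Member(m.name, Kind.STATIC));  // species becomes static
                    break;
                case STATIC:
                    if (erased) {
                        out.add(m);                            // statics live only in the erased copy
                    }
                    break;
            }
        }
        return out;
    }

    /** Convenience helper: just the member names, in order. */
    static List<String> names(List<Member> members) {
        List<String> result = new ArrayList<>();
        for (Member m : members) {
            result.add(m.name);
        }
        return result;
    }

    /** The members of the TestSpec example, including both synthetic initializers. */
    static List<Member> testSpecTemplate() {
        List<Member> template = new ArrayList<>();
        template.add(new Member("s_S", Kind.STATIC));
        template.add(new Member("s_SS", Kind.SPECIES));
        template.add(new Member("<clinit>", Kind.STATIC));
        template.add(new Member("<sclinit>", Kind.SPECIES));
        return template;
    }

    public static void main(String[] args) {
        System.out.println("TestSpec<int>: " + names(specialize(testSpecTemplate(), false)));
        System.out.println("TestSpec<_>:   " + names(specialize(testSpecTemplate(), true)));
    }
}
```

In this model each specialization ends up with exactly one <clinit>, which is the property the "drop <sclinit>, retain <clinit>" choice is meant to guarantee for the erased case.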
>
>     Bonus point: Generic methods
>     ===================
>
>     As pointed out by Brian, if we have species static classes we can
>     translate static and species static specializable generic methods
>     quite effectively. Consider this example:
>
>     class TestGenMethods {
>        static <any X> void m(X x) { ... }
>
>        void test() {
>            m(42);
>        }
>     }
>
>     Without species static, this would translate to:
>
>     class TestGenMethods {
>         static class TestGenMethods$m<any X> {
>              void m(X z) { ... }
>         }
>
>         /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }
>
>         void test() {
>             new TestGenMethods$m<int>().m(42); // this is really done inside the BSM
>         }
>     }
>
>     Note how the bridge (called by legacy code) will need to spin a
>     new instance of the synthetic class and then call a method on it.
>     The bootstrap used to dispatch static generic specializable calls
>     also needs to do a very similar operation. But what if we turned
>     the translated generic method into a species static method?
>
>     class TestGenMethods {
>         class TestGenMethods$m<any X> {
>              __species void m(X z) { ... }
>         }
>
>         /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }
>
>         void test() {
>             TestGenMethods$m<int>.m(42); // this is really done inside the BSM
>         }
>     }
>
>     With species static, we can now access the method w/o needing any
>     extra instance. This leads to simplification in both the bridging
>     strategy and the bootstrap implementation. We can apply a similar
>     simplification for dispatch of specializable species static calls
>     - the only difference is that the synthetic holder class has also
>     to be marked as species static (since it could access type-vars
>     from the enclosing context).
>
>     Bonus point: Access bridges
>     =================
>
>     Access bridges are a constant pain in the current translation
>     strategy; such bridges are generated by the compiler to grant
>     access to otherwise inaccessible members. Example:
>
>     class Outer<any X> {
>         private void m() { }
>
>         class Inner {
>             void test() {
>                 m();
>             }
>         }
>     }
>
>     This code will be translated as follows:
>
>     class Outer<any X> {
>
>         /* synthetic */ static void access$m(Outer o) { o.m(); }
>
>         private void m() { }
>
>         class Inner {
>             /*synthetic*/ Outer this$0;
>
>             void test() {
>                 access$m(this$0);
>             }
>         }
>     }
>
>     That is, access to private members is translated with an access to
>     an accessor bridge, which then performs access from the right
>     location. Note that the accessor bridge is static (because
>     otherwise it would be possible to maliciously override it to grant
>     access to otherwise inaccessible members); since it's static,
>     usual rules apply, so it cannot refer to type-variables, it cannot
>     be specialized, etc. This means that there are cases with
>     specialization where existing access bridges are not enough to
>     guarantee access - if the access happens to cross specialization
>     boundaries (i.e. accessing m() from an Outer<int>.Inner).
>
>     Again, species static comes to the rescue:
>
>     class Outer<any X> {
>
>         /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }
>
>         private void m() { }
>
>         class Inner {
>             /*synthetic*/ Outer this$0;
>
>             void test() {
>                 Outer<X>.access$m(this$0);
>             }
>         }
>     }
>
>     Since the accessor bridge is now species static, it means it can
>     now mention type variables (such as X); and it also means that
>     when the bridge is accessed (from Inner), the qualifier type
>     (Outer<X>) is guaranteed to remain sharp from the source code to
>     the bytecode - which means that when this code gets
>     specialized, all references to X will be dealt with accordingly
>     (and the right accessor bridge will be accessed).
>
>     Parting thoughts
>     ==========
>
>     On many levels, species statics seem to be the missing ingredient
>     for implementing many of the tricks of our translation strategy,
>     as well as to make it easier to express common idioms (i.e.
>     type-dependent caches) in user code.
>
>     Adding support for species static has proven to be harder than
>     originally thought. This is mainly because the current world is
>     split in two static levels: static and instance. When something is
>     not static it's implicitly assumed to be instance, and vice versa.
>     If we add a third static level to the picture, a lot of the
>     existing code just doesn't work anymore, or has to be audited to
>     check whether 'static' means 'legacy static' or 'species
>     static' (or both).
>
>     I started the implementation by treating static, species static
>     and instance as completely separate static levels - with different
>     internal flags, etc. but I soon realized that, while clean, this
>     approach was invalidating too much of the existing implementation.
>     More specifically, all the code snippets checking for static would
>     now have to be updated to check for static OR species static
>     (overriding vs. hiding, access to 'this', access to 'super',
>     generic bridges, ...). On the other hand, the places where the
>     semantics of species static vs. static was different were quite
>     limited:
>
>     * membership/type substitution: a species static behaves like an
>     instance member; the type variables of the owner are replaced into
>     the member signature.
>     * resolution: we need to implement the correct access rules as
>     shown in the tables above.
>     * code generation: an invokestatic involving a species static gets
>     a sharp qualifier type
>
>     This quickly led to the realization that it was instead easier to
>     just treat 'species static' as a special case of 'static' - and
>     then to add finer grained logic whenever we really needed the
>     distinction. This led to a considerably easier patch, and I think
>     that a similar consideration will hold for the JLS.
>
>     [1] -
>     http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
>     [2] -
>     http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
>     [3] -
>     http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
>     [4] -
>     http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html
>
>     Maurizio
>
>
>