species static prototype
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Jun 1 20:19:22 UTC 2016
On 01/06/16 19:52, Bjorn B Vardal wrote:
> Will the users be able to write their own <sclinit>?
>
> class Foo {
>     __species {
>         ...
>     }
> }
>
>
Hi Bjorn,
Yep - that is supported.
> Your access bridge solution using species methods looks fine, but are
> we not solving that with nest mates?
> I'm also wondering whether the following are typos, or if I
> misunderstood them:
>
> * TestResolution.m_I() was not meant to be decorated with '__species'
>
Right - that's a typo, the 'species' modifier was meant to be omitted
(i.e. it's an instance method)
>
> * TestForwardRef2.s1_S and TestForwardRef2.s2_SS don't have the
> correct modifiers, or should not be error cases.
>
Yeah - missing static and species there - in general members with _S are
meant to be static, those with _SS are meant to be 'species'
>
> * TestTypeVar<X>.m_I() was not meant to be decorated with '__species'
>
Yep - same as above
Sorry for the typos!
Maurizio
> --
> Bjørn Vårdal
> IBM Runtimes
>
> ----- Original message -----
> From: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
> Sent by: "valhalla-spec-experts"
> <valhalla-spec-experts-bounces at openjdk.java.net>
> To: valhalla-spec-experts at openjdk.java.net
> Cc:
> Subject: species static prototype
> Date: Fri, May 27, 2016 4:56 PM
>
> Hi,
> over the last few days I've been busy putting together a prototype
> [1, 2] of javac/runtime support for species static. I guess it
> could be considered a prototype implementation of the approach
> that Bjorn has described as "Repurpose existing statics" [4] in
> his nice writeup. Here's what I have learned during the experience.
>
> Parser
> ====
>
> The prototype uses a no fuss approach where '__species' is the
> modifier to denote species static stuff (of course a better syntax
> will have to be picked at some point, but that's not the goal of
> the current exercise). This means you can write:
>
> class Foo<X> {
>     String i;               // instance field
>     static String s;        // static field
>     __species String ss;    // species static field
> }
>
> This is obviously good enough for the time being.
>
> A complication with parsing occurs when accessing species members;
> in fact, species members can be accessed via their fully qualified
> type (including all required type-arguments, if necessary).
>
> Foo<String>.ss;
> Foo<int>.ss;
>
> The above are all valid species access expressions. Now, adding
> this kind of support in the parser is always tricky - as we have
> to battle with ambiguities which might pop up. Luckily, this
> pattern is similar enough to the one we use for method references
> - i.e. :
>
> Foo<String>::ss
>
> Which the compiler already had to special case; so I ended up
> slightly generalizing what we did in JDK 8 method reference
> parsing, and I got something working reasonably quickly. But this
> could be an area where coming up with a clean spec might be tricky
> (as the impl uses abundant lookahead to disambiguate this one).
>
> Resolution
> ======
>
> The basic idea is to divide the world in three static levels,
> whose properties are summarized in the table below:
>
>            enclosing type   enclosing instance
> instance   yes              yes
> species    yes              no
> static     no               no
>
>
> So, in terms of who can access what, it follows that if we
> consider 'instance' to be the highest static level and 'static' to
> be the lowest, then it's ok for a member with static level S1 to
> access another member of static level S2 provided that S1 >= S2.
> Or, with a table:
>
> from/to    instance   species   static
> instance   yes        yes       yes
> species    no         yes       yes
> static     no         no        yes
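
A minimal sketch of the access rule above in plain Java. The enum and method names are hypothetical and are not taken from the prototype; the only thing modelled is the ordering "S1 may access S2 provided that S1 >= S2".

```java
// Hypothetical model of the three static levels; names are illustrative only.
enum StaticLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest level

    // A member at level 'this' may access a member at level 'target'
    // when this >= target in the ordering above.
    boolean canAccess(StaticLevel target) {
        return this.compareTo(target) >= 0;
    }
}

class AccessTableDemo {
    public static void main(String[] args) {
        // Prints one line per from/to pair, in the same spirit as the table.
        for (StaticLevel from : StaticLevel.values()) {
            for (StaticLevel to : StaticLevel.values()) {
                System.out.println(from + " -> " + to + ": "
                        + (from.canAccess(to) ? "yes" : "no"));
            }
        }
    }
}
```

Encoding the levels as an ordered enum keeps the whole table down to a single comparison.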
>
>
>
> So, let's look at a concrete example:
>
> class TestResolution {
>     static void m_S() {
>         m_S();  //ok
>         m_SS(); //error
>         m_I();  //error
>     }
>
>     __species void m_SS() {
>         m_S();  //ok
>         m_SS(); //ok
>         m_I();  //error
>     }
>
>     __species void m_I() {
>         m_S();  //ok
>         m_SS(); //ok
>         m_I();  //ok
>     }
> }
>
> A crucial property, of course, is that species static members can
> refer to any type variables in the enclosing context:
>
> class TestTypeVar<X> {
>     static void m_S() {
>         X x; //error
>     }
>
>     __species void m_SS() {
>         X x; //ok
>     }
>
>     __species void m_I() {
>         X x; //ok
>     }
> }
>
> Nesting
> =====
>
> Another concept that needs generalization is that of allowed
> nesting; consider the following program:
>
> class TestNesting1 {
>     class MemberInner {
>         static String s_S; //error
>         String s_I;        //ok
>     }
>
>     static class StaticInner {
>         static String s_S; //ok
>         String s_I;        //ok
>     }
> }
>
> That is, the compiler will only allow you to declare static
> members in toplevel classes or in static nested classes (which,
> after all, act as toplevel classes). Now that we are adding a new
> static level to the picture, how are the nesting rules affected?
>
> Looking at the table above, if we consider 'instance' to be the
> highest static level and 'static' to be the lowest, then it's ok
> for a member with static level S1 to declare a member of static
> level S2 provided that S1 <= S2. Again, we can look at this in a
> tabular fashion:
>
> declaring/declared   instance   species   static
> instance             yes        no        no
> species              yes        yes       no
> static               yes        yes       yes
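
The nesting table can be sketched in the same way. This is only an illustration under the assumption that the levels are totally ordered as described; `NestingRules` and `canDeclare` are made-up names, not part of javac.

```java
// Hypothetical names; this only models the nesting table, not javac itself.
enum StaticLevel { STATIC, SPECIES, INSTANCE } // lowest to highest

class NestingRules {
    // A class at level 'declaring' may declare a member at level 'declared'
    // only when declaring <= declared in the ordering above.
    static boolean canDeclare(StaticLevel declaring, StaticLevel declared) {
        return declaring.compareTo(declared) <= 0;
    }

    public static void main(String[] args) {
        // An instance-level (inner member) class cannot hold static members.
        System.out.println(canDeclare(StaticLevel.INSTANCE, StaticLevel.STATIC));
        // A species class can hold species members.
        System.out.println(canDeclare(StaticLevel.SPECIES, StaticLevel.SPECIES));
    }
}
```

Note that the comparison is the mirror image of the access rule: access requires S1 >= S2, declaration requires S1 <= S2.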
>
>
> This also seems like a nice generalization of the current rules.
> The rationale behind these rules is basically to guarantee some
> invariants during member lookup; let's say that we are in a nested
> class with static level S1 - then, by the rule above, it follows
> that any member nested in this class will be able to access
> another member with static level S1 declared in this class or in
> any lexically enclosing class.
>
> A full example of nesting rules is given below:
>
> class TestNesting2 {
>     class MemberInner {
>         static String s_S;       //error
>         __species String s_SS;   //error
>         String s_I;              //ok
>     }
>
>     __species class SpeciesInner {
>         static String s_S;       //error
>         __species String s_SS;   //ok
>         String s_I;              //ok
>     }
>
>     static class StaticInner {
>         static String s_S;       //ok
>         __species String s_SS;   //ok
>         String s_I;              //ok
>     }
> }
>
> Unchecked access
> ===========
>
> Because of an unfortunate interplay between species and erasure,
> code using species members is potentially unsound (the example
> below is a variation of an example first reported by Peter [3] on
> this very mailing list):
>
> public class Foo<any T> {
>     __species T cache;
> }
>
> Foo<String>.cache = "Hello";
> Integer i = Foo<Integer>.cache; //whoops
>
> To prevent cases like these, the compiler implements a check which
> looks at the qualifier of a species access; if such qualifier
> (either explicit, or implicit) cannot be proven to be reifiable,
> an unchecked warning is issued.
>
> Note that it is possible to restrict such warnings only to cases
> where the signature of the accessed species static member changes
> under erasure. E.g. in the above example, accessing 'cache' is
> unchecked, because the type of 'cache' contains type-variables;
> but if another species static field was accessed whose type did
> not depend on type-variables, then the access should be considered
> sound.
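
The problem can be reproduced in today's Java with a hand-written emulation. The sketch below is not how the prototype represents species fields; it only assumes that, under erasure, all reference instantiations share a single slot. `ErasedFoo`, `put` and `get` are hypothetical names.

```java
// Plain-Java emulation: under erasure every parameterization shares one
// slot, which is what makes the species access above unsound.
class ErasedFoo {
    // Stand-in for '__species T cache' after erasure.
    private static Object cache;

    static <T> void put(T value) {
        cache = value;
    }

    @SuppressWarnings("unchecked") // the kind of access that deserves an unchecked warning
    static <T> T get() {
        return (T) cache;
    }
}

class UncheckedAccessDemo {
    public static void main(String[] args) {
        ErasedFoo.<String>put("Hello");
        try {
            // The compiler-inserted cast to Integer fails here, far away
            // from the write that caused the pollution.
            Integer i = ErasedFoo.<Integer>get();
            System.out.println(i.intValue());
        } catch (ClassCastException e) {
            System.out.println("heap pollution: " + e.getClass().getSimpleName());
        }
    }
}
```

A field whose type does not mention the type variable would not need the cast in `get`, which matches the observation that such accesses can be considered sound.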
>
>
> Species initializers
> ===========
>
> In our model we have three static levels - but we have
> initialization artifacts for only two of those; we need to fix that:
>
> instance   <init>
> species    <sclinit>
> static     <clinit>
>
>
>
> That is, a new <sclinit> method is added to a class containing one
> or more species variables with an initializer. This method is used
> to hoist the initialization code for all the species variables.
>
> Forward references
> ============
>
> Rules for detecting forward references have to be extended
> accordingly. A forward reference occurs whenever there's an
> attempt to reference a variable from a position P, where the
> variable declaration occurs in a position P' > P. Currently, the
> rules for forward references allow an instance variable to
> forward-reference a static variable - as shown below:
>
> class TestForwardRef {
>     String s = s_S;
>     static String s_S = "Hello!";
> }
>
> The rationale behind this is that, by the time we see the instance
> initializer for 's' we would have already executed the code for
> initializing 's_S' (as initialization will occur in different
> methods, <init> and <clinit> respectively, see section above).
> With the new static level, the forward reference rules have to be
> redefined according to the table below:
>
> from/to    instance      species       static
> instance   forward ref   ok            ok
> species    illegal       forward ref   ok
> static     illegal       illegal       forward ref
>
>
> In other words, it's ok to forward reference a variable whose
> static level is lower than that available where the reference
> occurs. An example is given below:
>
> class TestForwardRef2 {
>     String s1_I = s_S;   //ok
>     String s2_I = s_SS;  //ok
>
>     String s1_S = s_S;   //error!
>
>     String s1_SS = s_S;  //ok
>     String s2_SS = s_SS; //error!
>
>     static String s_S = "Hello!";
>     __species String s_SS = "Hello Species!";
> }
>
> This is an extension of the above principle: since instance
> variables are initialized in <init>, they can reference variables
> initialized in <clinit> or <sclinit>. If a variable is initialized
> in <sclinit> it can similarly safely reference a variable
> initialized in <clinit>. Another way to think of this is that a
> forward reference error only occurs if the static level of the
> referenced symbol is the same as the static level where the
> reference occurs. All other cases are either illegal (i.e. because
> it's an attempt to go from a lower static level to a higher one)
> or valid (because it can be guaranteed that the code initializing
> the referenced variable has already been executed).
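
The three outcomes in the forward-reference table can be captured by one comparison. This is a sketch under the assumption that the referenced variable is declared textually after the reference; the type and method names are hypothetical.

```java
// Hypothetical model of the forward-reference table.
enum Level { STATIC, SPECIES, INSTANCE } // lowest to highest

enum RefKind { OK, FORWARD_REF_ERROR, ILLEGAL }

class ForwardRefRules {
    // Classify a reference from an initializer at level 'from' to a
    // variable at level 'to' that is declared later in the class body.
    static RefKind classify(Level from, Level to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) {
            return RefKind.ILLEGAL;           // lower level reaching a higher one
        }
        if (cmp == 0) {
            return RefKind.FORWARD_REF_ERROR; // same initializer method, not yet run
        }
        return RefKind.OK;                    // target's initializer has already run
    }
}
```

For instance, `classify(Level.INSTANCE, Level.STATIC)` corresponds to the first TestForwardRef example: the static initializer has run before any instance initializer.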
>
> Code generation
> ==========
>
> Javac currently emits invokestatic/getstatic/putstatic for both
> legacy static and species static access. javac will use the
> 'owner' field of a CONSTANT_Methodref or CONSTANT_Fieldref constant
> to point to the sharp type of the species access (through a
> constant pool type entry). Static access will always see an erased
> owner.
>
> Consider this example:
>
> class TestGen<any X> {
>     __species void m_SS() { }
>     static void m_S() { }
>
>     public static void main(String[] args) {
>         TestGen<String>.m_SS();
>         TestGen<int>.m_SS();
>         TestGen<String>.m_S();
>         TestGen<int>.m_S();
>     }
> }
>
> The generated code in the 'main' method is reported below:
>
> 0: invokestatic #11 // Method TestGen<_>.m_SS:()V
> 3: invokestatic #15 // Method TestGen<I>.m_SS:()V
> 6: invokestatic #18 // Method TestGen<_>.m_S:()V
> 9: invokestatic #18 // Method TestGen<_>.m_S:()V
>
> As can be seen, species static access can cause a sharper type
> to end up in the 'owner' field of the member reference info; on
> the other hand, a static access always leads to an erased 'owner'.
>
> Another detail worth mentioning is how __species is represented in
> the bytecode. Given the current lack of free flag bits I've opted to
> use the last remaining bit 0x8000 - this is in fact the last
> unused bit that can be shared across class, field and method
> descriptors. Actually, this bit has already been used to encode
> the ACC_MANDATED flag in the MethodParameters attribute (as of JDK
> 8) - but since there's no other usage of that flag configuration
> outside MethodParameters it would seem safe to recycle it. Of
> course more compact approaches are also possible, but they would
> lead to different flag configurations for species static fields,
> methods and classes.
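
As a rough illustration of what a consumer of such a flag might do, the sketch below tests the 0x8000 bit on a raw access-flags value. Whether a species member also carries the regular static bit is not stated here, so `isLegacyStatic` is written to work either way; the class and method names are hypothetical.

```java
import java.lang.reflect.Modifier;

class SpeciesFlagSketch {
    // The bit described in the text above; it coincides with the value used
    // for ACC_MANDATED in the MethodParameters attribute.
    static final int ACC_SPECIES = 0x8000;

    static boolean isSpecies(int accessFlags) {
        return (accessFlags & ACC_SPECIES) != 0;
    }

    // Assumption: a member counts as legacy static only if the static bit
    // is set and the species bit is not.
    static boolean isLegacyStatic(int accessFlags) {
        return (accessFlags & Modifier.STATIC) != 0 && !isSpecies(accessFlags);
    }
}
```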
>
> Specialization
> =========
>
> Specializing species access is relatively straightforward:
>
> * both instance and species static members are copied in the
> specialization
> * static members are only copied in the erased specialization (and
> skipped otherwise)
> * ACC_SPECIES classes become regular classes when specialized
> * ACC_SPECIES methods/fields become static methods/fields in the
> specialization
> * <sclinit> becomes the new <clinit> in the specialization (and is
> omitted if the specialization is the erased specialization)
>
> The last bullet requires some extra care when handling the
> 'erased' specialization; consider the following example:
>
> class TestSpec<any X> {
>     static String s_S = "HelloStatic";
>     __species String s_SS = "HelloSpecies";
> }
>
> This class will end up with the following two synthetic methods:
>
>   static void <clinit>();
>     descriptor: ()V
>     flags: ACC_STATIC
>     Code:
>       stack=1, locals=0, args_size=0
>          0: ldc           #8    // String HelloStatic
>          2: putstatic     #14   // Field s_S:Ljava/lang/String;
>          5: ldc           #16   // String HelloSpecies
>          7: putstatic     #19   // Field s_SS:Ljava/lang/String;
>         10: return
>
>   species void <sclinit>();
>     descriptor: ()V
>     flags: ACC_SPECIES
>     Code:
>       stack=1, locals=1, args_size=1
>          0: ldc           #16   // String HelloSpecies
>          2: putstatic     #19   // Field s_SS:Ljava/lang/String;
>          5: return
>
> As can be seen, the <clinit> method contains initialization
> code for both static and species static fields! To understand why
> this is so, let's consider how the specialized bits might be
> derived from the template class following the rules above. Let's
> consider a specialization like TestSpec<int>: in this case, we
> need to drop <clinit> (it's a static method and TestSpec<int> is
> not an erased specialization), and we also need to rename
> <sclinit> as <clinit> in the new specialization. All is fine - the
> specialization will contain the relevant code required to
> initialize its species static fields.
>
> Let's now turn to the erased specialization TestSpec<_> - this
> specialization receives both static and species static members.
> Now, if we were to follow the same rules for initializers, we'd
> end up with two different initializer methods - both <clinit> and
> <sclinit>. We could ask the specializer to merge them somehow, but
> that would be tricky and expensive. Instead, we simply (i) drop
> <sclinit> from the erased specialization and (ii) retain <clinit>.
> Of course this means that <clinit> must also contain
> initialization code for species static members.
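
The copying rules and the initializer choice described in this section can be summarized in a few lines. This is a hypothetical model for illustration only and is not the actual specializer code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the copying rules; not the actual specializer.
enum MemberLevel { STATIC, SPECIES, INSTANCE }

class SpecializerSketch {
    // Instance and species members are always copied; legacy statics only
    // survive in the erased specialization.
    static boolean isCopied(MemberLevel level, boolean erased) {
        return level != MemberLevel.STATIC || erased;
    }

    // Name of the template method that supplies <clinit> in the result:
    // the erased specialization keeps <clinit> (which already initializes
    // species fields too), every other specialization renames <sclinit>.
    static String classInitializerSource(boolean erased) {
        return erased ? "<clinit>" : "<sclinit>";
    }

    // Convenience helper listing the member levels present in a specialization.
    static List<MemberLevel> copiedLevels(boolean erased) {
        List<MemberLevel> result = new ArrayList<>();
        for (MemberLevel level : MemberLevel.values()) {
            if (isCopied(level, erased)) {
                result.add(level);
            }
        }
        return result;
    }
}
```

With this model, TestSpec<int> corresponds to `erased == false` and TestSpec<_> to `erased == true`.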
>
> Bonus point: Generic methods
> ===================
>
> As pointed out by Brian, if we have species static classes we can
> translate static and species static specializable generic methods
> quite effectively. Consider this example:
>
> class TestGenMethods {
>     static <any X> void m(X x) { ... }
>
>     void test() {
>         m(42);
>     }
> }
>
> without species static, this would translate to:
>
> class TestGenMethods {
>     static class TestGenMethods$m<any X> {
>         void m(X z) { ... }
>     }
>
>     /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }
>
>     void test() {
>         // this is really done inside the BSM
>         new TestGenMethods$m<int>().m(42);
>     }
> }
>
> Note how the bridge (called by legacy code) will need to spin a
> new instance of the synthetic class and then call a method on it.
> The bootstrap used to dispatch static generic specializable calls
> also needs to do a very similar operation. But what if we turned
> the translated generic method into a species static method?
>
> class TestGenMethods {
>     class TestGenMethods$m<any X> {
>         __species void m(X z) { ... }
>     }
>
>     /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }
>
>     void test() {
>         // this is really done inside the BSM
>         TestGenMethods$m<int>.m(42);
>     }
> }
>
> With species static, we can now access the method w/o needing any
> extra instance. This leads to simplification in both the bridging
> strategy and the bootstrap implementation. We can apply a similar
> simplification for dispatch of specializable species static calls
> - the only difference is that the synthetic holder class has also
> to be marked as species static (since it could access type-vars
> from the enclosing context).
>
> Bonus point: Access bridges
> =================
>
> Access bridges are a constant pain in the current translation
> strategy; such bridges are generated by the compiler to grant
> access to otherwise inaccessible members. Example:
>
> class Outer<any X> {
>     private void m() { }
>
>     class Inner {
>         void test() {
>             m();
>         }
>     }
> }
>
> This code will be translated as follows:
>
> class Outer<any X> {
>
>     /* synthetic */ static void access$m(Outer o) { o.m(); }
>
>     private void m() { }
>
>     class Inner {
>         /* synthetic */ Outer this$0;
>
>         void test() {
>             access$m(this$0);
>         }
>     }
> }
>
> That is, access to private members is translated with an access to
> an accessor bridge, which then performs access from the right
> location. Note that the accessor bridge is static (because
> otherwise it would be possible to maliciously override it to grant
> access to otherwise inaccessible members); since it's static,
> usual rules apply, so it cannot refer to type-variables, it cannot
> be specialized, etc. This means that there are cases with
> specialization where existing access bridges are not enough to
> guarantee access - if the access happens to cross specialization
> boundaries (i.e. accessing m() from an Outer<int>.Inner).
>
> Again, species static comes to the rescue:
>
> class Outer<any X> {
>
>     /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }
>
>     private void m() { }
>
>     class Inner {
>         /* synthetic */ Outer this$0;
>
>         void test() {
>             Outer<X>.access$m(this$0);
>         }
>     }
> }
>
> Since the accessor bridge is now species static, it means it can
> now mention type variables (such as X); and it also means that
> when the bridge is accessed (from Inner), the qualifier type
> (Outer<X>) is guaranteed to remain sharp from the source code to
> the bytecode - which means that when this code gets
> specialized, all references to X will be dealt with accordingly
> (and the right accessor bridge will be accessed).
>
> Parting thoughts
> ==========
>
> On many levels, species statics seem to be the missing ingredient
> for implementing many of the tricks of our translation strategy,
> as well as to make it easier to express common idioms (e.g.
> type-dependent caches) in user code.
>
> Adding support for species static has proven to be harder than
> originally thought. This is mainly because the current world is
> split in two static levels: static and instance. When something is
> not static it's implicitly assumed to be instance, and vice versa.
> If we add a third static level to the picture, a lot of the
> existing code just doesn't work anymore, or has to be validated to
> check as to whether 'static' means 'legacy static' or 'species
> static' (or both).
>
> I started the implementation by treating static, species static
> and instance as completely separate static levels - with different
> internal flags, etc. but I soon realized that, while clean, this
> approach was invalidating too much of the existing implementation.
> More specifically, all the code snippets checking for static would
> now have to be updated to check for static OR species static
> (overriding vs. hiding, access to 'this', access to 'super',
> generic bridges, ...). On the other hand, the places where the
> semantics of species static vs. static was different were quite
> limited:
>
> * membership/type substitution: a species static behaves like an
> instance member; the type variables of the owner are replaced into
> the member signature.
> * resolution: we need to implement the correct access rules as
> shown in the tables above.
> * code generation: an invokestatic involving a species static gets
> a sharp qualifier type
>
> This quickly led to the realization that it was instead easier to
> just treat 'species static' as a special case of 'static' - and
> then to add finer grained logic whenever we really needed the
> distinction. This led to a considerably easier patch, and I think
> that a similar consideration will hold for the JLS.
>
> [1] -
> http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
> [2] -
> http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
> [3] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
> [4] -
> http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html
>
> Maurizio
>
>
>