species static prototype
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri May 27 20:56:05 UTC 2016
Hi,
over the last few days I've been busy putting together a prototype [1,
2] of javac/runtime support for species static. I guess it could be
considered a prototype implementation of the approach that Bjorn has
described as "Repurpose existing statics" [4] in his nice writeup.
Here's what I have learned during the experience.
Parser
====
The prototype uses a no fuss approach where '__species' is the modifier
to denote species static stuff (of course a better syntax will have to
be picked at some point, but that's not the goal of the current
exercise). This means you can write:
    class Foo<X> {
        String i;             //instance field
        static String s;      //static field
        __species String ss;  //species static field
    }
This is obviously good enough for the time being.
A complication with parsing occurs when accessing species members; in
fact, species members can be accessed via their fully qualified type
(including all required type-arguments, if necessary).
    Foo<String>.ss;
    Foo<int>.ss;
The above are all valid species access expressions. Now, adding this kind
of support in the parser is always tricky - as we have to battle with
ambiguities which might pop up. Luckily, this pattern is similar enough
to the one we use for method references - i.e. :
    Foo<String>::ss
Which the compiler already had to special case; so I ended up slightly
generalizing what we did in JDK 8 method reference parsing, and I got
something working reasonably quickly. But this could be an area where
coming up with a clean spec might be tricky (as the impl uses abundant
lookahead to disambiguate this one).
Resolution
======
The basic idea is to divide the world in three static levels, whose
properties are summarized in the table below:
               enclosing type   enclosing instance
    instance   yes              yes
    species    yes              no
    static     no               no
So, in terms of who can access what, it follows that if we consider
'instance' to be the highest static level and 'static' to be the lowest,
then it's ok for a member with static level S1 to access another member
of static level S2 provided that S1 >= S2. Or, with a table:
    from/to    instance   species   static
    instance   yes        yes       yes
    species    no         yes       yes
    static     no         no        yes
So, let's look at a concrete example:
    class TestResolution {
        static void m_S() {
            m_S();  //ok
            m_SS(); //error
            m_I();  //error
        }
        __species void m_SS() {
            m_S();  //ok
            m_SS(); //ok
            m_I();  //error
        }
        void m_I() {
            m_S();  //ok
            m_SS(); //ok
            m_I();  //ok
        }
    }
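The S1 >= S2 rule can also be written down as a comparison on an ordered enum. This is only a sketch in plain Java to make the table executable; StaticLevel, canAccess and AccessRuleDemo are made-up names and have nothing to do with the javac internals of the prototype.

```java
// Hypothetical model of the three static levels; not code from the prototype.
enum StaticLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest level

    // A member at this level (S1) may access a member at level 'target' (S2)
    // when S1 >= S2.
    boolean canAccess(StaticLevel target) {
        return this.compareTo(target) >= 0;
    }
}

class AccessRuleDemo {
    public static void main(String[] args) {
        // Print every from/to combination of the access table.
        for (StaticLevel from : StaticLevel.values()) {
            for (StaticLevel to : StaticLevel.values()) {
                System.out.println(from + " -> " + to + ": "
                        + (from.canAccess(to) ? "yes" : "no"));
            }
        }
    }
}
```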
A crucial property, of course, is that species static members can
refer to any type vars in the enclosing context:
    class TestTypeVar<X> {
        static void m_S() {
            X x; //error
        }
        __species void m_SS() {
            X x; //ok
        }
        void m_I() {
            X x; //ok
        }
    }
Nesting
=====
Another concept that needs generalization is that of allowed nesting;
consider the following program:
    class TestNesting1 {
        class MemberInner {
            static String s_S; //error
            String s_I;        //ok
        }
        static class StaticInner {
            static String s_S; //ok
            String s_I;        //ok
        }
    }
That is, the compiler will only allow you to declare static members in
toplevel classes or in static nested classes (which, after all, act as
toplevel classes). Now that we are adding a new static level to the
picture, how are the nesting rules affected?
Looking at the table above, if we consider 'instance' to be the highest
static level and 'static' to be the lowest, then it's ok for a class
with static level S1 to declare a member of static level S2 provided
that S1 <= S2. Again, we can look at this in a tabular fashion:
    declaring/declared   instance   species   static
    instance             yes        no        no
    species              yes        yes       no
    static               yes        yes       yes
This also seems like a nice generalization of the current rules. The
rationale behind these rules is basically to guarantee some invariants
during member lookup; let's say that we are in a nested class with
static level S1 - then, by the rule above, it follows that any member
nested in this class will be able to access another member with static
level S1 declared in this class or in any lexically enclosing class.
A full example of nesting rules is given below:
    class TestNesting2 {
        class MemberInner {
            static String s_S;     //error
            __species String s_SS; //error
            String s_I;            //ok
        }
        __species class SpeciesInner {
            static String s_S;     //error
            __species String s_SS; //ok
            String s_I;            //ok
        }
        static class StaticInner {
            static String s_S;     //ok
            __species String s_SS; //ok
            String s_I;            //ok
        }
    }
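The nesting rule is the mirror image of the access rule, and can be sketched the same way. Again this is plain Java with invented names (NestLevel, canDeclare, NestingRuleDemo), meant only to reproduce the declaring/declared table, not to show how the prototype implements the check.

```java
// Hypothetical model of the nesting rule; not code from the prototype.
enum NestLevel {
    STATIC, SPECIES, INSTANCE; // declared from lowest to highest level

    // A class at this level (S1) may declare a member at level 'member' (S2)
    // when S1 <= S2.
    boolean canDeclare(NestLevel member) {
        return this.compareTo(member) <= 0;
    }
}

class NestingRuleDemo {
    public static void main(String[] args) {
        // Print every declaring/declared combination of the nesting table.
        for (NestLevel declaring : NestLevel.values()) {
            for (NestLevel declared : NestLevel.values()) {
                System.out.println(declaring + " declares " + declared + ": "
                        + (declaring.canDeclare(declared) ? "ok" : "error"));
            }
        }
    }
}
```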
Unchecked access
===========
Because of an unfortunate interplay between species and erasure, code
using species members is potentially unsound (the example below is a
variation of an example first reported by Peter [3] on this very
mailing list):
    public class Foo<any T> {
        __species T cache;
    }
    Foo<String>.cache = "Hello";
    Integer i = Foo<Integer>.cache; //whoops
To prevent cases like these, the compiler implements a check which looks
at the qualifier of a species access; if such qualifier (either
explicit, or implicit) cannot be proven to be reifiable, an unchecked
warning is issued.
Note that it is possible to restrict such warnings only to cases where
the signature of the accessed species static member changes under
erasure. E.g. in the above example, accessing 'cache' is unchecked,
because the type of 'cache' contains type-variables; but if another
species static field was accessed whose type did not depend on
type-variables, then the access should be considered sound.
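The same kind of heap pollution can be reproduced in today's Java, which may help to see why the warning is needed. The sketch below is only an analogy: it uses an ordinary static field and an unchecked cast in place of an erased species field, and ErasedCache, put, get and UncheckedAccessDemo are invented names.

```java
// Plain Java analogy of an erased species field; names are made up.
class ErasedCache {
    // One slot shared by every parameterization, as would happen when
    // Foo<String>.cache and Foo<Integer>.cache erase to the same field.
    private static Object slot;

    static <T> void put(T value) {
        slot = value;
    }

    @SuppressWarnings("unchecked")
    static <T> T get() {
        return (T) slot; // unchecked cast: nothing is verified here
    }
}

class UncheckedAccessDemo {
    // Returns true if reading the slot at the wrong type fails at run time.
    static boolean pollutes() {
        ErasedCache.<String>put("Hello");
        try {
            Integer i = ErasedCache.<Integer>get(); // compiles, but the value is a String
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("heap pollution detected: " + pollutes());
    }
}
```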
Species initializers
===========
In our model we have three static levels - but we have initialization
artifacts for only two of those; we need to fix that:
    instance   <init>
    species    <sclinit>
    static     <clinit>
That is, a new <sclinit> method is added to a class containing one or
more species variables with an initializer. This method is used to hoist
the initialization code for all the species variables.
Forward references
============
Rules for detecting forward references have to be extended accordingly.
A forward reference occurs whenever there's an attempt to reference a
variable from a position P, where the variable declaration occurs in a
position P' > P. Currently, the rules for forward references allow an
instance variable to forward-reference a static variable - as shown below:
    class TestForwardRef {
        String s = s_S;
        static String s_S = "Hello!";
    }
The rationale behind this is that, by the time we see the instance
initializer for 's' we would have already executed the code for
initializing 's_S' (as initialization will occur in different methods,
<init> and <clinit> respectively, see section above). With the new
static level, the forward reference rules have to be redefined according
to the table below:
    from/to    instance      species       static
    instance   forward ref   ok            ok
    species    illegal       forward ref   ok
    static     illegal       illegal       forward ref
In other words, it's ok to forward reference a variable whose static
level is lower than that available where the reference occurs. An
example is given below:
    class TestForwardRef2 {
        String s1_I = s_S;              //ok
        String s2_I = s_SS;             //ok
        static String s1_S = s_S;       //error!
        __species String s1_SS = s_S;   //ok
        __species String s2_SS = s_SS;  //error!
        static String s_S = "Hello!";
        __species String s_SS = "Hello Species!";
    }
This is an extension of the above principle: since instance variables
are initialized in <init>, they can reference variables initialized in
<clinit> or <sclinit>. If a variable is initialized in <sclinit> it can
similarly safely reference a variable initialized in <clinit>. Another
way to think of this is that a forward reference error only occurs if
the static level of the referenced symbol is the same as the static
level where the reference occurs. All other cases are either illegal
(i.e. because it's an attempt to go from a lower static level to a
higher one) or valid (because it can be guaranteed that the code
initializing the referenced variable has already been executed).
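Put as code, the forward reference table boils down to a three-way comparison of static levels. The following is a sketch under that reading of the table, in plain Java; RefLevel, RefResult and ForwardRefRule are hypothetical names, and the classification only applies to a variable declared after the point of reference.

```java
// Hypothetical model of the forward reference rule; not code from the prototype.
enum RefLevel { STATIC, SPECIES, INSTANCE } // lowest to highest level

enum RefResult { OK, FORWARD_REF_ERROR, ILLEGAL }

class ForwardRefRule {
    // Classify a reference made from an initializer at level 'from' to a
    // variable at level 'to' that is declared later in the source.
    static RefResult classify(RefLevel from, RefLevel to) {
        int cmp = from.compareTo(to);
        if (cmp < 0) {
            return RefResult.ILLEGAL;           // lower level reaching a higher one
        }
        if (cmp == 0) {
            return RefResult.FORWARD_REF_ERROR; // same initializer method
        }
        return RefResult.OK;                    // target initialized earlier
    }

    public static void main(String[] args) {
        for (RefLevel from : RefLevel.values()) {
            for (RefLevel to : RefLevel.values()) {
                System.out.println(from + " -> " + to + ": " + classify(from, to));
            }
        }
    }
}
```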
Code generation
==========
javac currently emits invokestatic/getstatic/putstatic for both legacy
static and species static access. It uses the 'owner' field of the
CONSTANT_Methodref and CONSTANT_Fieldref constants to point to the sharp
type of the species access (through a constant pool type entry). Static
access will always see an erased owner.
Consider this example:
    class TestGen<any X> {
        __species void m_SS() { }
        static void m_S() { }
        public static void main(String[] args) {
            TestGen<String>.m_SS();
            TestGen<int>.m_SS();
            TestGen<String>.m_S();
            TestGen<int>.m_S();
        }
    }
The generated code in the 'main' method is reported below:
    0: invokestatic  #11  // Method TestGen<_>.m_SS:()V
    3: invokestatic  #15  // Method TestGen<I>.m_SS:()V
    6: invokestatic  #18  // Method TestGen<_>.m_S:()V
    9: invokestatic  #18  // Method TestGen<_>.m_S:()V
As can be seen, species static access can cause a sharper type to end
up in the 'owner' field of the member reference info; on the other hand,
a static access always leads to an erased 'owner'.
Another detail worth mentioning is how __species is represented in the
bytecode. Given the current shortage of flag bits, I've opted to use the
last remaining bit, 0x8000 - this is in fact the last unused bit that can
be shared across class, field and method access flags. Actually, this bit
has already been used to encode the ACC_MANDATED flag in the
MethodParameters attribute (as of JDK 8) - but since there's no other
usage of that flag configuration outside MethodParameters it would seem
safe to recycle it. Of course more compact approaches are also possible,
but they would lead to different flag configurations for species static
fields, methods and classes.
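For illustration, testing for the flag would then be a plain bit mask check. This is a sketch only: ACC_STATIC is the standard 0x0008 value, while ACC_SPECIES = 0x8000 is the encoding assumed from the description above, and SpeciesFlagSketch, isSpecies and describe are invented names. Whether the prototype also sets ACC_STATIC on species members is not assumed either way; the sketch simply checks the species bit first.

```java
// Hypothetical flag decoding; ACC_SPECIES is the assumed prototype encoding.
class SpeciesFlagSketch {
    static final int ACC_STATIC = 0x0008;  // standard class file value
    static final int ACC_SPECIES = 0x8000; // assumed; same bit as ACC_MANDATED
                                           // in the MethodParameters attribute

    static boolean isSpecies(int accessFlags) {
        return (accessFlags & ACC_SPECIES) != 0;
    }

    // Map a set of access flags to one of the three static levels.
    static String describe(int accessFlags) {
        if (isSpecies(accessFlags)) {
            return "species";
        }
        if ((accessFlags & ACC_STATIC) != 0) {
            return "static";
        }
        return "instance";
    }

    public static void main(String[] args) {
        System.out.println(describe(ACC_SPECIES));
        System.out.println(describe(ACC_STATIC));
        System.out.println(describe(0));
    }
}
```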
Specialization
=========
Specializing species access is relatively straightforward:
* both instance and species static members are copied in the specialization
* static members are only copied in the erased specialization (and
skipped otherwise)
* ACC_SPECIES classes become regular classes when specialized
* ACC_SPECIES methods/fields become static methods/fields in the
specialization
* <sclinit> becomes the new <clinit> in the specialization (and is
omitted if the specialization is the erased specialization)
The last bullet requires some extra care when handling the 'erased'
specialization; consider the following example:
    class TestSpec<any X> {
        static String s_S = "HelloStatic";
        __species String s_SS = "HelloSpecies";
    }
This class will end up with the following two synthetic methods:
    static void <clinit>();
      descriptor: ()V
      flags: ACC_STATIC
      Code:
        stack=1, locals=0, args_size=0
           0: ldc           #8   // String HelloStatic
           2: putstatic     #14  // Field s_S:Ljava/lang/String;
           5: ldc           #16  // String HelloSpecies
           7: putstatic     #19  // Field s_SS:Ljava/lang/String;
          10: return
    species void <sclinit>();
      descriptor: ()V
      flags: ACC_SPECIES
      Code:
        stack=1, locals=1, args_size=1
           0: ldc           #16  // String HelloSpecies
           2: putstatic     #19  // Field s_SS:Ljava/lang/String;
           5: return
As can be seen, the <clinit> method contains initialization code for
both static and species static fields! To understand why this is so,
let's consider how the specialized bits might be derived from the
template class following the rules above. Let's consider a
specialization like TestSpec<int>: in this case, we need to drop
<clinit> (it's a static method and TestSpec<int> is not an erased
specialization), and we also need to rename <sclinit> as <clinit> in the
new specialization. All is fine - the specialization will contain the
relevant code required to initialize its species static fields.
Let's now turn to the erased specialization TestSpec<_> - this
specialization receives both static and species static members. Now, if
we were to follow the same rules for initializers, we'd end up with two
different initializer methods - both <clinit> and <sclinit>. We could
ask the specializer to merge them somehow, but that would be tricky and
expensive. Instead, we simply (i) drop <sclinit> from the erased
specialization and (ii) retain <clinit>. Of course this means that
<clinit> must also contain initialization code for species static members.
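To make the member selection concrete, here is a small model of the rules above applied to TestSpec. It is a sketch in plain Java that works on member names only; SpecializerSketch, Member, Kind, specialize and testSpecTemplate are made-up names and do not reflect how the actual specializer is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which members end up in a specialization.
class SpecializerSketch {
    enum Kind { INSTANCE, SPECIES, STATIC }

    static final class Member {
        final String name;
        final Kind kind;

        Member(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Returns the member names present in the specialization.
    // 'erased' selects the erased specialization (e.g. TestSpec<_>).
    static List<String> specialize(List<Member> template, boolean erased) {
        List<String> out = new ArrayList<>();
        for (Member m : template) {
            switch (m.kind) {
                case INSTANCE:
                    out.add(m.name);           // always copied
                    break;
                case SPECIES:
                    if (m.name.equals("<sclinit>")) {
                        if (!erased) {
                            out.add("<clinit>"); // renamed in sharp specializations
                        }                        // dropped in the erased one
                    } else {
                        out.add(m.name);       // copied (becomes static)
                    }
                    break;
                case STATIC:
                    if (erased) {
                        out.add(m.name);       // only in the erased specialization
                    }
                    break;
            }
        }
        return out;
    }

    // Members of the TestSpec template class from the example above.
    static List<Member> testSpecTemplate() {
        List<Member> members = new ArrayList<>();
        members.add(new Member("s_S", Kind.STATIC));
        members.add(new Member("s_SS", Kind.SPECIES));
        members.add(new Member("<clinit>", Kind.STATIC));
        members.add(new Member("<sclinit>", Kind.SPECIES));
        return members;
    }

    public static void main(String[] args) {
        System.out.println("TestSpec<int>: " + specialize(testSpecTemplate(), false));
        System.out.println("TestSpec<_>:   " + specialize(testSpecTemplate(), true));
    }
}
```

In both cases the result contains exactly one <clinit>: the renamed <sclinit> for TestSpec<int>, and the original <clinit> (which already initializes the species fields) for TestSpec<_>.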
Bonus point: Generic methods
===================
As pointed out by Brian, if we have species static classes we can
translate static and species static specializable generic methods quite
effectively. Consider this example:
    class TestGenMethods {
        static <any X> void m(X x) { ... }
        void test() {
            m(42);
        }
    }
Without species static, this would translate to:
    class TestGenMethods {
        static class TestGenMethods$m<any X> {
            void m(X z) { ... }
        }
        /* bridge */ void m(Object o) { new TestGenMethods$m().m(o); }
        void test() {
            new TestGenMethods$m<int>().m(42); // this is really done inside the BSM
        }
    }
Note how the bridge (called by legacy code) will need to spin a new
instance of the synthetic class and then call a method on it. The
bootstrap used to dispatch static generic specializable calls also needs
to do a very similar operation. But what if we turned the translated
generic method into a species static method?
    class TestGenMethods {
        class TestGenMethods$m<any X> {
            __species void m(X z) { ... }
        }
        /* bridge */ void m(Object o) { TestGenMethods$m.m(o); }
        void test() {
            TestGenMethods$m<int>.m(42); // this is really done inside the BSM
        }
    }
With species static, we can now access the method w/o needing any extra
instance. This leads to simplification in both the bridging strategy and
the bootstrap implementation. We can apply a similar simplification for
dispatch of specializable species static calls - the only difference is
that the synthetic holder class has also to be marked as species static
(since it could access type-vars from the enclosing context).
Bonus point: Access bridges
=================
Access bridges are a constant pain in the current translation strategy;
such bridges are generated by the compiler to grant access to otherwise
inaccessible members. Example:
    class Outer<any X> {
        private void m() { }
        class Inner {
            void test() {
                m();
            }
        }
    }
This code will be translated as follows:
    class Outer<any X> {
        /* synthetic */ static void access$m(Outer o) { o.m(); }
        private void m() { }
        class Inner {
            /* synthetic */ Outer this$0;
            void test() {
                access$m(this$0);
            }
        }
    }
That is, access to private members is translated with an access to an
accessor bridge, which then performs access from the right location.
Note that the accessor bridge is static (because otherwise it would be
possible to maliciously override it to grant access to otherwise
inaccessible members); since it's static, usual rules apply, so it
cannot refer to type-variables, it cannot be specialized, etc. This
means that there are cases with specialization where existing access
bridges are not enough to guarantee access - if the access happens to
cross specialization boundaries (i.e. accessing m() from an
Outer<int>.Inner).
Again, species static comes to the rescue:
    class Outer<any X> {
        /* synthetic */ __species void access$m(Outer<X> o) { o.m(); }
        private void m() { }
        class Inner {
            /* synthetic */ Outer this$0;
            void test() {
                Outer<X>.access$m(this$0);
            }
        }
    }
Since the accessor bridge is now species static, it means it can now
mention type variables (such as X); and it also means that when the
bridge is accessed (from Inner), the qualifier type (Outer<X>) is
guaranteed to remain sharp from the source code to the bytecode - which
means that when this code gets specialized, all references to X will
be dealt with accordingly (and the right accessor bridge will be accessed).
Parting thoughts
==========
On many levels, species statics seem to be the missing ingredient for
implementing many of the tricks of our translation strategy, as well as
to make it easier to express common idioms (e.g. type-dependent caches)
in user code.
Adding support for species static has proven to be harder than
originally thought. This is mainly because the current world is split in
two static levels: static and instance. When something is not static
it's implicitly assumed to be instance, and vice versa. If we add a third
static level to the picture, a lot of the existing code just doesn't
work anymore, or has to be re-examined to check whether 'static'
means 'legacy static' or 'species static' (or both).
I started the implementation by treating static, species static and
instance as completely separate static levels - with different internal
flags, etc. but I soon realized that, while clean, this approach was
invalidating too much of the existing implementation. More specifically,
all the code snippets checking for static would now have to be updated to
check for static OR species static (overriding vs. hiding, access to
'this', access to 'super', generic bridges, ...). On the other hand, the
places where the semantics of species static vs. static was different
were quite limited:
* membership/type substitution: a species static behaves like an
instance member; the type variables of the owner are replaced into the
member signature.
* resolution: we need to implement the correct access rules as shown in
the tables above.
* code generation: an invokestatic involving a species static gets a
sharp qualifier type
This quickly led to the realization that it was instead easier to just
treat 'species static' as a special case of 'static' - and then to add
finer grained logic whenever we really needed the distinction. This led
to a considerably easier patch, and I think that a similar consideration
will hold for the JLS.
[1] - http://hg.openjdk.java.net/valhalla/valhalla/langtools/rev/6949c3d06e8f
[2] - http://hg.openjdk.java.net/valhalla/valhalla/jdk/rev/836efde938c1
[3] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-February/000096.html
[4] - http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-May/000147.html
Maurizio