Species-static members vs singletons

Mon May 23 14:27:49 UTC 2016


On 23/05/16 15:20, Brian Goetz wrote:
> Right.  And Peter’s question is: (a) did we think of this (yes) and 
> (b) are we OK with this.  Which I think is also yes?
I think it's yes; an unfortunate accident of erasure - I don't see any 
other way around it at the moment.

Maurizio
>
>> On May 23, 2016, at 7:18 AM, Maurizio Cimadamore 
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> Sorry - I now realize that the point I made in my earlier email was 
>> unclear.
>>
>> What I'm suggesting is to have a single rule for generating unchecked 
>> warnings that goes like this:
>>
>> "If the qualifier of a species static access is not reifiable, an 
>> unchecked warning should occur".
>>
>> In the example Peter sent, the only thing worth mentioning is that 
>> the qualifier is 'implicit' (i.e. can be omitted and be assumed to be 
>> the current class Foo<T>); now since Foo<T> is not reifiable, every 
>> unqualified access to 'st' from Foo<T> will get a warning - 
>> excluding, of course, accesses occurring in a context where T is 
>> restricted (i.e. __WhereVal(T)).
>>
>> Maurizio
>>
>> On 23/05/16 14:56, Brian Goetz wrote:
>>> Note that we have this same problem with unchecked warnings today in 
>>> many of the use cases.  For example, in the “cached empty list” 
>>> case, we always have to use an unchecked cast to cast the cached 
>>> list to the desired type.  When we use species-static to do the 
>>> same, and it is possible that the species could correspond to more 
>>> than one T, we still incur the same unchecked warning (and as 
>>> you mention, the singleton form has the same problem.)  I think it's 
>>> an inescapable consequence of erasure, but one we’re already sort of 
>>> comfortable with.
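
For reference, the erased "cached empty list" idiom in today's Java looks roughly like the sketch below. EmptyListCache and its members are illustrative names, not the JDK's actual implementation; the point is only that one shared instance serves every T, so the cast to List<T> is necessarily unchecked.

```java
import java.util.AbstractList;
import java.util.List;

// Sketch of the erased "cached empty list" idiom; all names are illustrative.
final class EmptyListCache {
    // One shared instance for every T, stored at a wildcard type.
    private static final List<?> EMPTY = new AbstractList<Object>() {
        @Override public Object get(int index) {
            throw new IndexOutOfBoundsException("empty list");
        }
        @Override public int size() { return 0; }
    };

    // The cast cannot be checked at run time because T is erased.
    // It is safe only because the list is empty and unmodifiable.
    @SuppressWarnings("unchecked")
    static <T> List<T> emptyList() {
        return (List<T>) EMPTY;
    }

    public static void main(String[] args) {
        List<String> strings = EmptyListCache.emptyList();
        List<Integer> ints = EmptyListCache.emptyList();
        // Under erasure both views refer to the same object.
        System.out.println((Object) strings == (Object) ints);
    }
}
```

The unchecked cast is confined to one method here, which is presumably why we are "sort of comfortable" with it.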
>>>
>>> If you use a more constrained type selector (e.g., List<int>), you 
>>> won’t get a warning, as the compiler will know that st is exactly int.
>>>
>>>> On May 23, 2016, at 3:05 AM, Maurizio Cimadamore 
>>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>
>>>> Hi Peter,
>>>> are you sure we need special treatment for 'it = st'? After all, 
>>>> the compiler will issue an unchecked warning every time you try to 
>>>> access a species static from a non-reifiable type, i.e.
>>>>
>>>> Foo<String>.st = ""; //warn
>>>> Foo<int>.st = 42; //no warn
>>>>
>>>> In other words, can we put the burden of heap pollution-ness on the 
>>>> client and be happy?
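
To make the heap-pollution concern concrete, here is a sketch in today's Java. There are no species members yet, so this only emulates the erased case with a plain static Object slot shared by all instantiations. ErasedSlot, put and get are made-up names for illustration.

```java
// Emulates an erased "species static": one slot shared by every T.
// ErasedSlot and its methods are hypothetical names.
final class ErasedSlot {
    private static Object st;

    static <T> void put(T value) {
        st = value;
    }

    // Unchecked: nothing verifies that the stored value really is a T.
    @SuppressWarnings("unchecked")
    static <T> T get() {
        return (T) st;
    }

    public static void main(String[] args) {
        ErasedSlot.<String>put("hello");
        try {
            // The compiler inserts a cast to Integer at this assignment,
            // so the pollution surfaces in client code, not in get().
            Integer polluted = ErasedSlot.<Integer>get();
            System.out.println(polluted);
        } catch (ClassCastException e) {
            System.out.println("heap pollution surfaced at the client");
        }
    }
}
```

The failure lands at the use site, which is the sense in which the burden falls on the client.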
>>>>
>>>> Maurizio
>>>>
>>>> On 22/05/16 23:58, Peter Levart wrote:
>>>>> Hi Brian,
>>>>>
>>>>> I agree that "species" placement is a better, less verbose option. 
>>>>> But how to solve the language problem of having "species" and 
>>>>> "instance" members of the same "type-variable" type be assignable 
>>>>> to one-another? For example:
>>>>>
>>>>> class Foo<any T> {
>>>>>     species T st;
>>>>>     T it;
>>>>>
>>>>>     void m() {
>>>>>         it = st; // this can not be allowed
>>>>>         st = it; // this can be allowed
>>>>>
>>>>>         // maybe this could be allowed?
>>>>>         @SuppressWarnings("unchecked")
>>>>>         it = (T) st;
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> Singleton abstraction has the same problem.
>>>>>
>>>>> So while technically possible, it would be weird to have 'T' 
>>>>> sometimes not be assignable to 'T'. Can we live with that?
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>> On 05/19/2016 04:36 PM, Brian Goetz wrote:
>>>>>> We discussed two primary means to surface species-specific 
>>>>>> members in the language: a "species" placement (name TBD) as 
>>>>>> distinct from static and instance, or a "singleton" abstraction 
>>>>>> (a la Scala's "object" abstraction, as Peter L suggested).  We've 
>>>>>> done some experiments comparing the two approaches.
>>>>>>
>>>>>> Separately, we discussed two strategies for handling this at the 
>>>>>> VM level: having three separate placements (ACC_STATIC, 
>>>>>> ACC_SPECIES, and instance) or retconning ACC_STATIC to mean 
>>>>>> "species" and using compiler trickery to simulate traditional 
>>>>>> statics.  In recent discussions with Oracle and IBM VM folks, 
>>>>>> they seemed happy enough with having a new placement (and 
>>>>>> possibly new bytecodes, {get,put,invoke}species, or overloading 
>>>>>> these onto *static with ParamTypes in the owner field of the 
>>>>>> various XxxRef constants.)
>>>>>>
>>>>>>
>>>>>> There are several places where the language itself can take 
>>>>>> advantage of species members:
>>>>>>
>>>>>> 1.  Reifying type variables.  For an any-generic class Foo<T,U>, 
>>>>>> the compiler can generate public static final 
>>>>>> reflection-thingie-valued fields called "T" and "U", which means 
>>>>>> that "aFoo.T" (as an ordinary field ref!) would evaluate to the 
>>>>>> reflective mirror for the reified T -- if present, otherwise it 
>>>>>> would evaluate to the reflective mirror for 'erased'.
>>>>>>
>>>>>> 2.  Representation of generic methods.  The current translation 
>>>>>> strategy has us translating any-generic methods to classes; a 
>>>>>> static method
>>>>>>
>>>>>>     static<any T> void foo(T t) { }
>>>>>>
>>>>>> translates to a class (plus an erased bridge):
>>>>>>
>>>>>>     bridge static foo(Object o) { ... invoke erased specialization ... }
>>>>>>
>>>>>>     static class Xxx$foo<any T> {
>>>>>>         void foo(T t) { ... }
>>>>>>     }
>>>>>>
>>>>>> This means that an instance of Xxx$foo is needed to invoke the 
>>>>>> method -- but serves solely to carry the type variables -- which 
>>>>>> is unfortunate.  If instead we translate as:
>>>>>>
>>>>>>     static class Xxx$foo<any T> {
>>>>>>         species-static void foo(T t) { ... }
>>>>>>     }
>>>>>>
>>>>>> then we can invoke this method via invokespecies:
>>>>>>
>>>>>>     invokespecies ParamType[Xxx$foo, T_inf].foo(T_inf)
>>>>>>
>>>>>> where T_inf is the erasure-normalized type inferred for T 
>>>>>> (reified if value, `erased` if reference).  No fake receiver required.
>>>>>>
>>>>>> The translation for generic instance methods is still somewhat 
>>>>>> messier (will post separately), but still less messy than if we 
>>>>>> also had to manage / cache a receiver.
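
As a rough hand-written analogue of the "method as class" translation in today's Java, consider the sketch below. FooCarrier stands in for the generated Xxx$foo class and fooBridge for the erased bridge; both names are invented, and foo returns a String here (rather than void) only so the effect is observable.

```java
// Hand-written analogue of translating a generic method to a class.
// FooCarrier and fooBridge are made-up names, not compiler output.
final class FooCarrier<T> {
    // The receiver holds no state; it exists only so that T is in scope.
    String foo(T t) {
        return "foo(" + t + ")";
    }

    // Erased bridge: callers that only know Object go through here.
    static String fooBridge(Object o) {
        // Allocates the fake receiver just to reach foo.
        return new FooCarrier<Object>().foo(o);
    }

    public static void main(String[] args) {
        System.out.println(new FooCarrier<Integer>().foo(42));
        System.out.println(fooBridge("hi"));
    }
}
```

Every call has to allocate (or cache) a FooCarrier that carries nothing but type arguments; that is the cost the species-static translation would avoid.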
>>>>>>
>>>>>>
>>>>>> We also drafted some examples of how such a facility would be 
>>>>>> used, writing them both with species-static and with singleton.  
>>>>>> Examples and notes below; the summary is that in all cases, the 
>>>>>> species-static version is either better or about as good.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1.  The old favorite, caching an instantiated instance.
>>>>>>
>>>>>> Species (left):
>>>>>>
>>>>>> class Collections {
>>>>>>     private static class Holder<any T> {
>>>>>>         private species List<T> empty = new EmptyList<T>();
>>>>>>     }
>>>>>>
>>>>>>     static<any T> List<T> emptyList() { return Holder<T>.empty; }
>>>>>> }
>>>>>>
>>>>>> Singleton (right):
>>>>>>
>>>>>> class Collections {
>>>>>>     private singleton Holder<any T> {
>>>>>>         private List<T> empty = new EmptyList<T>();
>>>>>>     }
>>>>>>
>>>>>>     static<any T> List<T> emptyList() { return Holder<T>.empty; }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Note that in this case, species by itself isn't enough -- we 
>>>>>> still need a holder class, and it's a bit ugly.  Arguably we could 
>>>>>> merge Holder into EmptyList (if that's under our control) but 
>>>>>> because Collections is an old-style "static bag" class (aka "sin 
>>>>>> bin"), we would still need a holder class for state. (Collections 
>>>>>> could share a single holder for multiple things; empty list, 
>>>>>> empty set, etc.)
>>>>>>
>>>>>> Neither the left nor the right seems particularly better than the 
>>>>>> other here.  (If we were putting this method on Collection, where 
>>>>>> it would likely go in new code since now interfaces can have 
>>>>>> statics, the species approach would win, since we'd not need the 
>>>>>> holder class any more.)
>>>>>>
>>>>>>
>>>>>> 2.  Instantiation tracking.
>>>>>>
>>>>>> Species (left):
>>>>>>
>>>>>> class Foo<any T> {
>>>>>>     private species int count;
>>>>>>     private species List<Foo<T>> foos;
>>>>>>
>>>>>>     public Foo() {
>>>>>>         ++count;
>>>>>>         foos.add(this);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Singleton (right):
>>>>>>
>>>>>> class Foo<any T> {
>>>>>>     private singleton FooStuff<any T> {
>>>>>>         private int count;
>>>>>>         private List<Foo<T>> foos;
>>>>>>     }
>>>>>>
>>>>>>     public Foo() {
>>>>>>         ++FooStuff<T>.count;
>>>>>>         FooStuff<T>.foos.add(this);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Because the state is directly tied to the instantiation, the left 
>>>>>> seems more attractive -- doesn't require an extra artifact, and 
>>>>>> the constructor body seems more straightforward.
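
For comparison, under today's erased generics the closest approximation keys the per-instantiation state by hand. In the sketch below the explicit Class token plays the role the species would play in the proposal; Tracked and count are illustrative names, and the token parameter is an assumption needed only because T is not available at run time.

```java
import java.util.HashMap;
import java.util.Map;

// Today's-Java approximation of per-instantiation counting.
// Tracked and count are illustrative names.
final class Tracked<T> {
    // One ordinary static map shared by every Tracked<...>;
    // the Class token stands in for the species.
    private static final Map<Class<?>, Integer> COUNTS = new HashMap<>();

    Tracked(Class<T> token) {
        COUNTS.merge(token, 1, Integer::sum);
    }

    static int count(Class<?> token) {
        return COUNTS.getOrDefault(token, 0);
    }

    public static void main(String[] args) {
        new Tracked<>(String.class);
        new Tracked<>(String.class);
        new Tracked<>(Integer.class);
        System.out.println(count(String.class) + " " + count(Integer.class));
    }
}
```

With a species field the map and the token both disappear, which is much of the appeal of the left-hand version.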
>>>>>>
>>>>>>
>>>>>> 3.  Implicit-like associations. Here, we're caching type 
>>>>>> associations.  For example, suppose we have a Box<T>, and we want 
>>>>>> to cache the associated class for List<T>.
>>>>>>
>>>>>>
>>>>>> Species (left):
>>>>>>
>>>>>> class Box<any T> {
>>>>>>     private species Class<List<T>> listClass
>>>>>>         = Class.forSpecialization(List, T.crass);
>>>>>> }
>>>>>>
>>>>>> Singleton (right):
>>>>>>
>>>>>> class Box<any T> {
>>>>>>     private singleton ListBuddy<any T> {
>>>>>>         Class<List<T>> clazz
>>>>>>             = Class.forSpecialization(List, T.crass);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> The extra singleton declaration feels like "noise" here, because 
>>>>>> again the association is with the full set of type args for the 
>>>>>> class.
>>>>>>
>>>>>>
>>>>>> 4.  Static factories.  Arguably, it makes sense to move factories 
>>>>>> to the types they describe.
>>>>>>
>>>>>> Species (left):
>>>>>>
>>>>>> interface List<any T> {
>>>>>>     private species List<T> empty = new EmptyList<>();
>>>>>>     species List<T> emptyList() { return empty; }
>>>>>> }
>>>>>>
>>>>>> Singleton (right):
>>>>>>
>>>>>> interface List<any T> {
>>>>>>     private singleton Stuff<any T> {
>>>>>>         List<T> empty = new EmptyList<>();
>>>>>>     }
>>>>>>     species List<T> emptyList() { return Stuff<T>.empty; }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> In this model, you'd get an empty list with
>>>>>>
>>>>>>     List<T> aList = List<T>.emptyList();
>>>>>> rather than
>>>>>>     List<T> aList = Collections.<T>emptyList();
>>>>>>
>>>>>> In the latter, the type witnesses can be omitted; in the former 
>>>>>> they probably can be as well but that's something new.
>>>>>>
>>>>>>
>>>>>> 5.  Typevar shredding.  Here, we have separate state for 
>>>>>> different subsets of variables.  This should be the place where 
>>>>>> the singleton approach shines.
>>>>>>
>>>>>>
>>>>>> Species (left):
>>>>>>
>>>>>> class HashMap<any K, any V> {
>>>>>>     private static class Keys<any K> {
>>>>>>         species Set<K> allKeys = ...
>>>>>>     }
>>>>>>
>>>>>>     private static class Vals<any V> {
>>>>>>         species Set<V> allVals = ...
>>>>>>     }
>>>>>>
>>>>>>     void put(K k, V v) {
>>>>>>         Keys<K>.allKeys.add(k);
>>>>>>         Vals<V>.allVals.add(v);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Singleton (right):
>>>>>>
>>>>>> class HashMap<any K, any V> {
>>>>>>     private singleton Keys<any K> {
>>>>>>         Set<K> allKeys = ...
>>>>>>     }
>>>>>>
>>>>>>     private singleton Vals<any V> {
>>>>>>         Set<V> allVals = ...
>>>>>>     }
>>>>>>
>>>>>>     void put(K k, V v) {
>>>>>>         Keys<K>.allKeys.add(k);
>>>>>>         Vals<V>.allVals.add(v);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> But, it doesn't really shine that much; the left is not really 
>>>>>> much worse than the right, just a little more fussy.
>>>>>>
>>>>>> In cases where the singleton approach is more natural, the 
>>>>>> corresponding "species in static class" idiom isn't so bad 
>>>>>> either.  But in cases where the species approach is more natural, 
>>>>>> there's something unappealing about creating classes (both in 
>>>>>> source and runtime footprint) in cases 2/3/4 when we don't need 
>>>>>> one. The only place where the singleton approach seems to win big 
>>>>>> is when there are multiple variables in the same scope bound by 
>>>>>> invariants -- here, the singleton having a ctor is a big win -- 
>>>>>> but how often does this happen?
>>>>>>
>>>>>>
>>>>>> So our conclusion is that the species-placement is as good or 
>>>>>> better for the identified use cases -- and it also fits cleanly 
>>>>>> into the existing model for member placement.