Species-static members vs singletons

Thu May 19 14:36:13 UTC 2016

We discussed two primary means to surface species-specific members in 
the language: a "species" placement (name TBD) as distinct from static 
and instance, or a "singleton" abstraction (a la Scala's "object" 
abstraction, as Peter L suggested).  We've done some experiments 
comparing the two approaches.

Separately, we discussed two strategies for handling this at the VM 
level: having three separate placements (ACC_STATIC, ACC_SPECIES, and 
instance) or retconning ACC_STATIC to mean "species" and using compiler 
trickery to simulate traditional statics.  In recent discussions with 
Oracle and IBM VM folks, they seemed happy enough with having a new 
placement (and possibly new bytecodes, {get,put,invoke}species, or 
overloading these onto *static with ParamTypes in the owner field of the 
various XxxRef constants.)

There are several places where the language itself can take advantage of 
species members:

1.  Reifying type variables.  For an any-generic class Foo<T,U>, the 
compiler can generate public static final reflection-thingie-valued 
fields called "T" and "U", which means that "aFoo.T" (as an ordinary 
field ref!) would evaluate to the reflective mirror for the reified T -- 
if present, otherwise it would evaluate to the reflective mirror for 
'erased'.

2.  Representation of generic methods.  The current translation strategy 
has us translating any-generic methods to classes; a static method

     static<any T> void foo(T t) { }

translates to a class (plus an erased bridge):

     bridge static foo(Object o) { ... invoke erased specialization ... }

     static class Xxx$foo<any T> {
         void foo(T t) { ... }
     }

This means that an instance of Xxx$foo is needed to invoke the method, 
even though it serves solely to carry the type variables, which is 
unfortunate.  If instead we translate as:

     static class Xxx$foo<any T> {
         species-static void foo(T t) { ... }
     }

then we can invoke this method via invokespecies:

     invokespecies ParamType[Xxx$foo, T_inf].foo(T_inf)

where T_inf is the erasure-normalized type inferred for T (reified if T 
is a value type, `erased` if it is a reference type).  No fake receiver 
required.

The translation for generic instance methods is still somewhat messier 
(will post separately), but still less messy than if we also had to 
manage / cache a receiver.
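
The cost of the receiver-carrying translation can be mimicked in today's 
Java.  This is a minimal sketch, assuming erased generics stand in for 
any-generics; the name FooCarrier, the allocation counter, and the String 
return value are hypothetical and exist only to make the effect 
observable:

```java
// Emulation in current Java; not the actual compiler translation.
final class Xxx {
    // Counts carrier allocations so the overhead is visible (illustrative only).
    static int carriersCreated = 0;

    // Stands in for the synthetic class Xxx$foo<any T>.
    static final class FooCarrier<T> {
        FooCarrier() { carriersCreated++; }

        // The body lives on an instance even though it uses no instance state.
        String foo(T t) { return "foo(" + t + ")"; }
    }

    // Stands in for the bridge: every call must first obtain a receiver.
    static <T> String foo(T t) {
        return new FooCarrier<T>().foo(t);
    }
}
```

With a species-static method, the allocation (or the cache lookup needed 
to avoid it) would go away, since invokespecies names the specialization 
directly.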

We also drafted some examples of how such a facility would be used, 
writing them both with species-static and with singleton. Examples and 
notes below; the summary is that in all cases, the species-static 
version is either better or about as good.

1.  The old favorite, caching an instantiated instance.

Species (left):

     class Collections {
         private static class Holder<any T> {
             private species List<T> empty = new EmptyList<T>();
         }

         static<any T> List<T> emptyList() { return Holder<T>.empty; }
     }

Singleton (right):

     class Collections {
         private singleton Holder<any T> {
             private List<T> empty = new EmptyList<T>();
         }

         static<any T> List<T> emptyList() { return Holder<T>.empty; }
     }

Note that in this case, species by itself isn't enough -- we still need 
a holder class, and it's a bit ugly.  Arguably we could merge Holder into 
EmptyList (if that's under our control) but because Collections is an 
old-style "static bag" class (aka "sin bin"), we would still need a 
holder class for state. (Collections could share a single holder for 
multiple things; empty list, empty set, etc.)

Neither the left nor the right seems particularly better than the other 
here.  (If we were putting this method on Collection, where it would 
likely go in new code since now interfaces can have statics, the species 
approach would win, since we'd not need the holder class any more.)
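
For contrast, erased generics let a single cached instance serve every T, 
which is why no per-T holder is needed today.  A minimal sketch in 
current Java follows (the class name ErasedEmpty is hypothetical).  Once 
T can be specialized, one shared instance presumably no longer suffices, 
and the cache has to exist once per species, as in the Holder examples 
above:

```java
import java.util.Collections;
import java.util.List;

final class ErasedEmpty {
    // A single instance shared by all type arguments; this is sound only
    // because T is erased and the list can never hold an element.
    private static final List<Object> EMPTY = Collections.emptyList();

    @SuppressWarnings("unchecked")
    static <T> List<T> emptyList() {
        // Unchecked cast: every T sees the same underlying object.
        return (List<T>) (List<?>) EMPTY;
    }
}
```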

2.  Instantiation tracking.

Species (left):

     class Foo<any T> {
         private species int count;
         private species List<Foo<T>> foos;

         public Foo() {
             ++count;
             foos.add(this);
         }
     }

Singleton (right):

     class Foo<any T> {
         private singleton FooStuff<T> {
             private int count;
             private List<Foo<T>> foos;
         }

         public Foo() {
             ++FooStuff<T>.count;
             FooStuff<T>.foos.add(this);
         }
     }

Because the state is directly tied to the instantiation, the left seems 
more attractive -- doesn't require an extra artifact, and the 
constructor body seems more straightforward.
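
To see what a species field would save, here is a rough emulation of 
per-instantiation tracking in current Java.  It is only a sketch: because 
T is erased today, the species is passed explicitly as a Class token, and 
the names TrackedFoo, SpeciesState, and STATE are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Emulation only: the Class token stands in for the reified type argument.
final class TrackedFoo<T> {
    // Hypothetical stand-in for the per-species fields `count` and `foos`.
    private static final class SpeciesState {
        int count;
        final List<TrackedFoo<?>> foos = new ArrayList<>();
    }

    // One SpeciesState per type argument; a true species field would
    // need neither the map nor the lookup.
    private static final Map<Class<?>, SpeciesState> STATE =
            new ConcurrentHashMap<>();

    TrackedFoo(Class<T> species) {
        SpeciesState s = STATE.computeIfAbsent(species, k -> new SpeciesState());
        synchronized (s) {
            s.count++;
            s.foos.add(this);
        }
    }

    // Number of instances created so far for the given type argument.
    static int count(Class<?> species) {
        SpeciesState s = STATE.get(species);
        if (s == null) {
            return 0;
        }
        synchronized (s) {
            return s.count;
        }
    }
}
```

The map, the token parameter, and the explicit state class are all 
bookkeeping that the two-line species version leaves to the runtime.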

3.  Implicit-like associations.  Here, we're caching type associations.  
For example, suppose we have a Box<T>, and we want to cache the 
associated class for List<T>.

Species (left):

     class Box<any T> {
         private species Class<List<T>> listClass
             = Class.forSpecialization(List, T.crass);
     }

Singleton (right):

     class Box<any T> {
         private singleton ListBuddy<any T> {
             Class<List<T>> clazz
                 = Class.forSpecialization(List, T.crass);
         }
     }

The extra singleton declaration feels like "noise" here, because again 
the association is with the full set of type args for the class.

4.  Static factories.  Arguably, it makes sense to move factories to the 
types they describe.

Species (left):

     interface List<any T> {
         private species List<T> empty = new EmptyList<>();
         species List<T> emptyList() { return empty; }
     }

Singleton (right):

     interface List<any T> {
         private singleton Stuff<any T> {
             List<T> empty = new EmptyList<>();
         }
         species List<T> emptyList() { return Stuff<T>.empty; }
     }

In this model, you'd get an empty list with

     List<T> aList = List<T>.emptyList();

rather than

     List<T> aList = Collections.<T>emptyList();

In the latter, the type witnesses can be omitted; in the former they 
probably can be as well but that's something new.

5.  Typevar shredding.  Here, we have separate state for different 
subsets of variables.  This should be the place where the singleton 
approach shines.

Species (left):

     class HashMap<any K, any V> {
         private static class Keys<any K> {
             species Set<K> allKeys = ...
         }

         private static class Vals<any V> {
             species Set<V> allVals = ...
         }

         void put(K k, V v) {
             Keys<K>.allKeys.add(k);
             Vals<V>.allVals.add(v);
         }
     }

Singleton (right):

     class HashMap<any K, any V> {
         private singleton Keys<any K> {
             Set<K> allKeys = ...
         }

         private singleton Vals<any V> {
             Set<V> allVals = ...
         }

         void put(K k, V v) {
             Keys<K>.allKeys.add(k);
             Vals<V>.allVals.add(v);
         }
     }

But, it doesn't really shine that much; the left is not really much 
worse than the right, just a little more fussy.

In cases where the singleton approach is more natural, the corresponding 
"species in static class" idiom isn't so bad either.  But in cases where 
the species approach is more natural, there's something unappealing 
about creating classes (both in source and runtime footprint) in cases 
2/3/4 when we don't need one. The only place where the singleton 
approach seems to win big is when there are multiple variables in the 
same scope bound by invariants -- here, the singleton having a ctor is a 
big win -- but how often does this happen?
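
The case where a constructor helps can be sketched in current Java with 
an ordinary holder class.  The class Bounds and its fields lo and hi are 
hypothetical; the point is only that a single constructor gives one place 
to establish an invariant across several fields, which separate field 
initializers arguably express less directly:

```java
final class Bounds {
    // Two pieces of state bound by the invariant lo <= hi.
    private final int lo;
    private final int hi;

    // The constructor is the single place where the invariant is checked.
    private Bounds(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo must not exceed hi");
        }
        this.lo = lo;
        this.hi = hi;
    }

    // Plays the role of the singleton instance.
    static final Bounds INSTANCE = new Bounds(0, 10);

    int width() {
        return hi - lo;
    }
}
```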

So our conclusion is that the species-placement is as good or better for 
the identified use cases -- and it also fits cleanly into the existing 
model for member placement.