[vector] Vector API -- alignment with value types

Sat Jan 26 00:14:42 UTC 2019

> As I mentioned in the past, I think the high-order bit of where we want 
> to rotate the API is to ensure that when we have value types, the key 
> abstractions -- Species and Vector -- can be values, because then we'll 
> get most of the optimizations we want from the general properties of 
> values, rather than ad-hoc tricks surrounding the vector types.
> 
> Part I
> ------
> 
> Here's an idea for simplifying Species, which is: let's drive Species 
> down to be a simple class that is really just a constant holder for 
> (element type, shape), and move all the behavior to static methods on 
> XxxVector.  (At that point, Species can just be an enum, or a group of 
> enums with a common interface parent, if we like.)  I think we can 
> greatly reduce the importance of Species in the API, making XxxVector 
> the star player.
> 
> (The cost here is it becomes harder to write code that is agnostic to 
> _both_ element type and size -- but I am not convinced this is an 
> important use case?)
> 
> Here are the methods on Species currently:
> 
>   - Simple state methods: elementType, elementSize, shape, length, bitSize
>   - Generic factories: zero(), fromByteArray(), fromByteBuffer(), 
> maskXxx(), shuffleXxx()
>   - Transforms: reshape(), rebracket(), resize(), cast()
>   - Specialized factories: broadcast(), single(), random(), scalars(), 
> fromArray()
> 
> My somewhat radical suggestion is: let's get all of these, except the 
> first line, off of Species, and onto XxxVector, with versions that take 
> an explicit species argument, and versions that take no species argument 
> (defaulting to the preferred species for that shape.)
> 
> One positive result of this is that code that just wants to multiply int 
> vectors can remain _entirely ignorant of species_; you just use the 
> defaults:
> 
>      IntVector.fromArray(...).add(...).intoArray(...)
> 
> Only if users want to have finer control over the vector width do they 
> need to use species at all.

> A slightly negative result is that one loses the ability to write code 
> that is agnostic across both element type and width.

Another downside: factories would need two overloads per operation, a 
default one and one parameterized by Species.

Leaving parameterized variants on Species and introducing default ones 
(as static methods) looks more attractive to me.
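
To make the split concrete, here is a rough sketch with toy stand-in 
types. ToyIntSpecies, ToyIntVector and PREFERRED are made up for 
illustration and are not the real API; the array-backed lanes are only 
there to keep the example runnable.

```java
import java.util.Arrays;

// Toy stand-in for a species: just (shape -> lane count) plus the
// parameterized factory, which stays on the species.
enum ToyIntSpecies {
    S64(2), S128(4), S256(8);

    final int length; // number of int lanes

    ToyIntSpecies(int length) { this.length = length; }

    // Parameterized factory: lives on the species, as today.
    ToyIntVector fromArray(int[] a, int offset) {
        return new ToyIntVector(this, Arrays.copyOfRange(a, offset, offset + length));
    }
}

// Toy stand-in for IntVector.
final class ToyIntVector {
    // Assumption: the preferred species is a fixed constant in this sketch.
    static final ToyIntSpecies PREFERRED = ToyIntSpecies.S128;

    final ToyIntSpecies species;
    private final int[] lanes;

    ToyIntVector(ToyIntSpecies species, int[] lanes) {
        this.species = species;
        this.lanes = lanes;
    }

    // Default factory: the only static addition, one method per operation.
    static ToyIntVector fromArray(int[] a, int offset) {
        return PREFERRED.fromArray(a, offset);
    }

    // Lane-wise addition; both operands must share a species.
    ToyIntVector add(ToyIntVector other) {
        if (other.species != species) {
            throw new IllegalArgumentException("species mismatch");
        }
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = lanes[i] + other.lanes[i];
        }
        return new ToyIntVector(species, r);
    }

    // Store all lanes into the destination array.
    void intoArray(int[] a, int offset) {
        System.arraycopy(lanes, 0, a, offset, lanes.length);
    }
}
```

Species-ignorant code calls ToyIntVector.fromArray(a, 0), while code 
that cares about width calls ToyIntSpecies.S256.fromArray(a, 0).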

> Another slightly negative result is that some of these methods will have 
> to dispatch on the species argument, which means we will need strong 
> constant propagation so that these dispatches fold away.  However, I 
> think those fold away in exactly the same cases that the virtual methods 
> on species do today, so I don't think this is necessarily a change in 
> reality.

Agreed. The implementation can even keep the same virtual call:

     IntVector.fromArray(Species s, ...) { return s.fromArray(...); }
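
Spelled out as a compilable toy (DemoSpecies, DemoShapes and DemoVector 
are hypothetical names, not the real types), the static entry point is 
a plain forward, so the receiver of the virtual call is whatever 
species the caller passed in:

```java
import java.util.Arrays;

// Hypothetical species interface with a virtual factory.
interface DemoSpecies {
    int length();
    DemoVector fromArray(int[] a, int offset);
}

// Hypothetical vector type holding its species and lane values.
final class DemoVector {
    final DemoSpecies species;
    final int[] lanes;

    DemoVector(DemoSpecies species, int[] lanes) {
        this.species = species;
        this.lanes = lanes;
    }

    // Static entry point with an explicit species argument:
    // it simply forwards to the virtual method on the species.
    static DemoVector fromArray(DemoSpecies s, int[] a, int offset) {
        return s.fromArray(a, offset);
    }
}

// Two concrete species; each constant is a distinct receiver.
enum DemoShapes implements DemoSpecies {
    S64(2), S128(4);

    private final int length;

    DemoShapes(int length) { this.length = length; }

    @Override
    public int length() { return length; }

    @Override
    public DemoVector fromArray(int[] a, int offset) {
        return new DemoVector(this, Arrays.copyOfRange(a, offset, offset + length));
    }
}
```

When the species argument is a compile-time constant at the call site, 
the forward and the virtual call are candidates for the same folding as 
a direct call on the species.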

> At this point, we're ready for species to become values, which will only 
> help our constant propagation story.

> Part II
> -------
> 
> To make the Vector types value-ready, we have to flatten out the 
> inheritance hierarchy.  This is an easy enough game; we make the 
> concrete vector types into values, and the abstract vector types into 
> interfaces.  So we have:
> 
>      public interface Vector<T> { .. }
>      public interface IntVector <: Vector<Integer> { .. }
> 
>      private value class Int64Vector <: IntVector { .. }
> 
> This is easy enough, but I'm sure if we just did this, the carefully 
> crafted optimizations we've done for vectors would fail (at first) when 
> the abstract vector types become interfaces. We'll also need to ensure 
> that type sharpening / nullity analysis is up to the task, so we get 
> scalarization.  But this seems the straightest path to get from the API 
> we have to one that uses value types.
> 
> Another path, slightly more circuitous, is to collapse the XxxNnVector 
> types to a single XxxVector value type, which looks something like:
> 
>      IntVector<T extends SuperLong> {
>          IntSpecies species;
>          T vector;
>      }
> 
> where there are SuperLong types for 64, 128, 256, and 512.  This gets us 
> away from using arrays, which is eventually where we'll want to go, but 
> it's a longer road.

As a first step, I'm in favor of the former proposal. Reducing the 
number of specializations is attractive, but additional performance work 
is needed to ensure the JIT compiler is capable of eliminating the boxes. 
So, I'd prefer to see it as a separate experiment.

Right now, the hierarchy is the following:

   public abstract class Vector<T> { .. }
   public abstract class IntVector extends Vector<Integer> { .. }

   /*package-private*/
   class Int64Vector extends IntVector { .. }

The only reason Vector & IntVector are abstract classes is to simplify 
the custom box elimination logic. My understanding is that it shouldn't 
be too much work to extend that logic to interfaces. The plan was to do 
the bare minimum to keep the implementation viable while waiting for 
value types.
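
For comparison, the interface-based shape from Part II can be 
approximated in today's Java roughly as below. This is only a sketch: 
the Sketch* names are made up, and a final class with final fields 
stands in for a value class, which the language does not have yet.

```java
// Top-level abstraction as an interface instead of an abstract class.
interface SketchVector<E> {
    int length();
    E get(int i); // boxed accessor, generic over the element type
}

// Element-type-specific abstraction, also an interface.
interface SketchIntVector extends SketchVector<Integer> {
    int lane(int i); // primitive accessor

    SketchIntVector add(SketchIntVector other);

    @Override
    default Integer get(int i) {
        return lane(i);
    }
}

// Concrete 64-bit shape: two int lanes held in final fields,
// standing in for a future value class.
final class SketchInt64Vector implements SketchIntVector {
    private final int e0;
    private final int e1;

    SketchInt64Vector(int e0, int e1) {
        this.e0 = e0;
        this.e1 = e1;
    }

    @Override
    public int length() {
        return 2;
    }

    @Override
    public int lane(int i) {
        if (i == 0) return e0;
        if (i == 1) return e1;
        throw new IndexOutOfBoundsException("lane " + i);
    }

    @Override
    public SketchIntVector add(SketchIntVector other) {
        if (other.length() != 2) {
            throw new IllegalArgumentException("shape mismatch");
        }
        return new SketchInt64Vector(e0 + other.lane(0), e1 + other.lane(1));
    }
}
```

Every call on SketchIntVector here is an interface call, which is the 
case the box elimination logic does not handle yet.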

Best regards,
Vladimir Ivanov