[vector] Vector API -- alignment with value types

Fri Jan 25 21:00:14 UTC 2019

Coming back to this after a long break...

As I mentioned in the past, I think the high-order bit of where we want 
to rotate the API is to ensure that when we have value types, the key 
abstractions -- Species and Vector -- can be values, because then we'll 
get most of the optimizations we want from the general properties of 
values, rather than ad-hoc tricks surrounding the vector types.

Part I
------

Here's an idea for simplifying Species, which is: let's drive Species 
down to be a simple class that is really just a constant holder for 
(element type, shape), and move all the behavior to static methods on 
XxxVector.  (At that point, Species can just be an enum, or a group of 
enums with a common interface parent, if we like.)  I think we can 
greatly reduce the importance of Species in the API, making XxxVector 
the star player.

(The cost here is it becomes harder to write code that is agnostic to 
_both_ element type and size -- but I am not convinced this is an 
important use case?)

Here are the methods on Species currently:

  - Simple state methods: elementType, elementSize, shape, length, bitSize
  - Generic factories: zero(), fromByteArray(), fromByteBuffer(),
    maskXxx(), shuffleXxx()
  - Transforms: reshape(), rebracket(), resize(), cast()
  - Specialized factories: broadcast(), single(), random(), scalars(),
    fromArray()

My somewhat radical suggestion is: let's get all of these, except the 
simple state methods in the first item, off of Species and onto 
XxxVector, with versions that take an explicit species argument, and 
versions that take no species argument (defaulting to the preferred 
species for that element type).

One positive result of this is that code that just wants to multiply int 
vectors can remain _entirely ignorant of species_; you just use the 
defaults:

     IntVector.fromArray(...).add(...).intoArray(...)

Only if users want to have finer control over the vector width do they 
need to use species at all.
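
To make the shape of this concrete, here is a minimal sketch.  It is 
not the real API: the class names, the PREFERRED constant, and the 
int[] backing store are all stand-ins I made up for illustration.  It 
only shows a species reduced to an enum constant holder, with a 
species-free factory that delegates to an explicit-species overload.

```java
import java.util.Arrays;

public class SpeciesDefaultsSketch {
    // Hypothetical: a species reduced to a constant holder for (element type, shape).
    enum IntSpecies {
        S64(64), S128(128), S256(256), S512(512);

        final int bitSize;
        IntSpecies(int bitSize) { this.bitSize = bitSize; }

        int length() { return bitSize / Integer.SIZE; }
    }

    // Hypothetical stand-in for IntVector; lanes live in a plain int[] here.
    static final class IntVector {
        // Assumption: the preferred species is hard-coded for the sketch.
        static final IntSpecies PREFERRED = IntSpecies.S256;

        private final IntSpecies species;
        private final int[] lanes;

        private IntVector(IntSpecies species, int[] lanes) {
            this.species = species;
            this.lanes = lanes;
        }

        // Species-free factory: defaults to the preferred species.
        static IntVector fromArray(int[] a, int offset) {
            return fromArray(PREFERRED, a, offset);
        }

        // Explicit-species factory for callers who want control over the width.
        static IntVector fromArray(IntSpecies s, int[] a, int offset) {
            return new IntVector(s, Arrays.copyOfRange(a, offset, offset + s.length()));
        }

        IntSpecies species() { return species; }

        IntVector add(IntVector other) {
            if (other.species != species) {
                throw new IllegalArgumentException("species mismatch");
            }
            int[] r = new int[lanes.length];
            for (int i = 0; i < r.length; i++) {
                r[i] = lanes[i] + other.lanes[i];
            }
            return new IntVector(species, r);
        }

        void intoArray(int[] a, int offset) {
            System.arraycopy(lanes, 0, a, offset, lanes.length);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] b = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] out = new int[8];
        // The caller never mentions a species.
        IntVector.fromArray(a, 0).add(IntVector.fromArray(b, 0)).intoArray(out, 0);
        System.out.println(Arrays.toString(out));
    }
}
```

A caller who does care about width would write 
IntVector.fromArray(IntSpecies.S128, a, 0) instead; nothing else in 
the chain changes.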

A slightly negative result is that one loses the ability to write code 
that is agnostic across both element type and width.

Another slightly negative result is that some of these methods will have 
to dispatch on the species argument, which means we will need strong 
constant propagation so that these dispatches fold away.  However, I 
think they fold away in exactly the same cases in which the virtual 
methods on species fold away today, so I don't think this changes much 
in practice.
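
As a rough illustration of what "dispatch on the species argument" 
might look like, here is a sketch in which a static factory switches 
on an enum species rather than making a virtual call on it.  All names 
here (the two-constant IntSpecies, Int64Vector.ZERO, and so on) are 
hypothetical.  The hope is that when the species reaching zero() is a 
constant after inlining, the switch collapses to a single branch; the 
sketch shows the code shape only and does not demonstrate that the JIT 
actually does this.

```java
public class SpeciesDispatchSketch {
    // Hypothetical constant holders; the real set of shapes would be larger.
    enum IntSpecies { S64, S128 }

    interface IntVector {
        int length();
    }

    static final class Int64Vector implements IntVector {
        static final Int64Vector ZERO = new Int64Vector();
        public int length() { return 2; }
    }

    static final class Int128Vector implements IntVector {
        static final Int128Vector ZERO = new Int128Vector();
        public int length() { return 4; }
    }

    // Static factory that dispatches on the species argument,
    // replacing what would have been a virtual species.zero() call.
    static IntVector zero(IntSpecies s) {
        switch (s) {
            case S64:  return Int64Vector.ZERO;
            case S128: return Int128Vector.ZERO;
            default:   throw new AssertionError(s);
        }
    }

    // A constant species, as a typical caller would hold it.
    static final IntSpecies SPECIES = IntSpecies.S128;

    public static void main(String[] args) {
        System.out.println(zero(SPECIES).length());
    }
}
```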

At this point, we're ready for species to become values, which will only 
help our constant propagation story.

Part II
-------

To make the Vector types value-ready, we have to flatten out the 
inheritance hierarchy.  This is an easy enough game; we make the 
concrete vector types into values, and the abstract vector types into 
interfaces.  So we have:

     public interface Vector<T> { .. }
     public interface IntVector <: Vector<Integer> { .. }

     private value class Int64Vector <: IntVector { .. }

This is easy enough, but I'm sure that if we just did this, the carefully 
crafted optimizations we've done for vectors would fail (at first) once 
the abstract vector types become interfaces.  We'll also need to ensure 
that type sharpening / nullity analysis is up to the task, so we get 
scalarization.  But this seems the straightest path to get from the API 
we have to one that uses value types.
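
For anyone who wants to play with the flattened shape today, here is a 
sketch in ordinary Java.  Since value classes aren't available, a 
final class with final fields and state-based equals() stands in for 
"value class"; the lane accessors and the add() method are made up for 
the example and are not meant to mirror the real method set.

```java
public class FlatHierarchySketch {
    // Formerly an abstract class; now an interface.
    interface Vector<T> {
        int length();
        T lane(int i);
    }

    // Formerly an abstract class; now an interface extending Vector<Integer>.
    interface IntVector extends Vector<Integer> {
        int intLane(int i);
        IntVector add(IntVector other);

        @Override
        default Integer lane(int i) { return intLane(i); }
    }

    // Stand-in for "value class Int64Vector": final, immutable,
    // and compared by state rather than by identity.
    static final class Int64Vector implements IntVector {
        private final int e0;
        private final int e1;

        Int64Vector(int e0, int e1) {
            this.e0 = e0;
            this.e1 = e1;
        }

        public int length() { return 2; }

        public int intLane(int i) {
            switch (i) {
                case 0:  return e0;
                case 1:  return e1;
                default: throw new IndexOutOfBoundsException("lane " + i);
            }
        }

        public IntVector add(IntVector other) {
            if (other.length() != 2) {
                throw new IllegalArgumentException("shape mismatch");
            }
            return new Int64Vector(e0 + other.intLane(0), e1 + other.intLane(1));
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Int64Vector
                    && ((Int64Vector) o).e0 == e0
                    && ((Int64Vector) o).e1 == e1;
        }

        @Override
        public int hashCode() { return 31 * e0 + e1; }
    }

    public static void main(String[] args) {
        IntVector sum = new Int64Vector(1, 2).add(new Int64Vector(10, 20));
        System.out.println(sum.intLane(0) + ", " + sum.intLane(1));
    }
}
```

Note that add() is declared against the interface type, which is 
exactly where the type sharpening concern above comes in: the JIT has 
to prove the receiver and argument are Int64Vector before it can 
scalarize.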

Another path, slightly more circuitous, is to collapse the XxxNnVector 
types to a single XxxVector value type, which looks something like:

     IntVector<T extends SuperLong> {
         IntSpecies species;
         T vector;
     }

where there are SuperLong types for 64, 128, 256, and 512.  This gets us 
away from using arrays (which is eventually where we'll want to go), but 
it's a longer road.
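
A very rough sketch of what the collapsed form could look like, again 
in plain Java with final classes standing in for values.  Everything 
here is an assumption made for illustration: the SuperLong interface 
and its word() accessor, the Long64/Long128 names, and the lane layout 
(two 32-bit lanes per 64-bit word, even lanes in the low half).  Only 
two widths are shown.

```java
public class CollapsedVectorSketch {
    // Hypothetical "SuperLong": a fixed-width bag of bits with no array inside.
    interface SuperLong {
        int bitSize();
        long word(int i);   // i-th 64-bit word
    }

    static final class Long64 implements SuperLong {
        final long w0;
        Long64(long w0) { this.w0 = w0; }
        public int bitSize() { return 64; }
        public long word(int i) {
            if (i != 0) throw new IndexOutOfBoundsException("word " + i);
            return w0;
        }
    }

    static final class Long128 implements SuperLong {
        final long w0, w1;
        Long128(long w0, long w1) { this.w0 = w0; this.w1 = w1; }
        public int bitSize() { return 128; }
        public long word(int i) {
            switch (i) {
                case 0:  return w0;
                case 1:  return w1;
                default: throw new IndexOutOfBoundsException("word " + i);
            }
        }
    }

    enum IntSpecies {
        S64(64), S128(128);
        final int bitSize;
        IntSpecies(int bitSize) { this.bitSize = bitSize; }
    }

    // A single IntVector type, parameterized by its payload width.
    static final class IntVector<T extends SuperLong> {
        final IntSpecies species;
        final T vector;

        IntVector(IntSpecies species, T vector) {
            if (species.bitSize != vector.bitSize()) {
                throw new IllegalArgumentException("species/payload width mismatch");
            }
            this.species = species;
            this.vector = vector;
        }

        int length() { return species.bitSize / Integer.SIZE; }

        // Lane i lives in word i/2; even lanes are the low 32 bits.
        int lane(int i) {
            long w = vector.word(i >>> 1);
            return (int) (w >>> ((i & 1) * 32));
        }
    }

    // Packs two int lanes into one 64-bit word using the layout above.
    static long pack(int lo, int hi) {
        return (lo & 0xFFFFFFFFL) | ((long) hi << 32);
    }

    public static void main(String[] args) {
        IntVector<Long128> v =
                new IntVector<>(IntSpecies.S128, new Long128(pack(1, -2), pack(3, 4)));
        for (int i = 0; i < v.length(); i++) {
            System.out.println(v.lane(i));
        }
    }
}
```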

Thoughts on these directions?