[vector] Vector API -- alignment with value types
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Sat Jan 26 00:14:42 UTC 2019
> As I mentioned in the past, I think the high-order bit of where we want
> to rotate the API is to ensure that when we have value types, the key
> abstractions -- Species and Vector -- can be values, because then we'll
> get most of the optimizations we want from the general properties of
> values, rather than ad-hoc tricks surrounding the vector types.
>
> Part I
> ------
>
> Here's an idea for simplifying Species, which is: let's drive Species
> down to be a simple class that is really just a constant holder for
> (element type, shape), and move all the behavior to static methods on
> XxxVector. (At that point, Species can just be an enum, or a group of
> enums with a common interface parent, if we like.) I think we can
> greatly reduce the importance of Species in the API, making XxxVector
> the star player.
>
> (The cost here is it becomes harder to write code that is agnostic to
> _both_ element type and size -- but I am not convinced this is an
> important use case?)
>
> Here are the methods on Species currently:
>
> - Simple state methods: elementType, elementSize, shape, length, bitSize
> - Generic factories: zero(), fromByteArray(), fromByteBuffer(),
> maskXxx(), shuffleXxx()
> - Transforms: reshape(), rebracket(), resize(), cast()
> - Specialized factories: broadcast(), single(), random(), scalars(),
> fromArray()
>
> My somewhat radical suggestion is: let's get all of these, except the
> first line, off of Species, and onto XxxVector, with versions that take
> an explicit species argument, and versions that take no species argument
> (defaulting to the preferred species for that shape.)
>
> One positive result of this is that code that just wants to multiply int
> vectors can remain _entirely ignorant of species_; you just use the
> defaults:
>
> IntVector.fromArray(...).add(...).intoArray(...)
>
> Only if users want to have finer control over the vector width do they
> need to use species at all.
> A slightly negative result is that one loses the ability to write code
> that is agnostic across both element type and width.
Another downside: for factories it means two overloads per operation,
a default one and one parameterized by Species.
Leaving the parameterized variants on Species and introducing the default
ones as static methods looks more attractive to me.
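A minimal sketch of the Part I idea, for illustration only. Species shrinks to an enum that just holds (element type, shape), and the factories live on IntVector as static methods, both with and without an explicit species argument. All names here (IntSpecies, PREFERRED, the int[]-backed IntVector) are hypothetical and are not the incubator Vector API. The preferred species is simply assumed to be the 256-bit one.

```java
import java.util.Arrays;

// Hypothetical sketch of the Part I proposal; IntSpecies, PREFERRED and this
// IntVector are illustrative names, not the incubator Vector API.
class SpeciesAsEnum {

    // Species reduced to a constant holder for (element type, shape).
    enum IntSpecies {
        S_64(64), S_128(128), S_256(256), S_512(512);

        // Assumption: a fixed preferred species; a real implementation
        // would pick it based on the hardware.
        static final IntSpecies PREFERRED = S_256;

        final int bitSize;

        IntSpecies(int bitSize) { this.bitSize = bitSize; }

        Class<?> elementType() { return int.class; }
        int elementSize() { return Integer.SIZE; }
        int length() { return bitSize / Integer.SIZE; }
    }

    // The behavior moves to static methods on the vector type.
    static final class IntVector {
        final IntSpecies species;
        private final int[] lanes; // stand-in for a vector register

        private IntVector(IntSpecies species, int[] lanes) {
            this.species = species;
            this.lanes = lanes;
        }

        // Variant with an explicit species argument.
        static IntVector fromArray(IntSpecies s, int[] a, int offset) {
            return new IntVector(s, Arrays.copyOfRange(a, offset, offset + s.length()));
        }

        // Variant without a species argument: uses the preferred species.
        static IntVector fromArray(int[] a, int offset) {
            return fromArray(IntSpecies.PREFERRED, a, offset);
        }

        static IntVector zero(IntSpecies s) {
            return new IntVector(s, new int[s.length()]);
        }

        IntVector add(IntVector other) {
            if (other.species != species) {
                throw new IllegalArgumentException("species mismatch");
            }
            int[] r = new int[lanes.length];
            for (int i = 0; i < r.length; i++) {
                r[i] = lanes[i] + other.lanes[i];
            }
            return new IntVector(species, r);
        }

        void intoArray(int[] a, int offset) {
            System.arraycopy(lanes, 0, a, offset, lanes.length);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] b = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] c = new int[8];
        // Species never appears: the preferred (256-bit, 8-lane) species is used.
        IntVector.fromArray(a, 0).add(IntVector.fromArray(b, 0)).intoArray(c, 0);
        System.out.println(Arrays.toString(c)); // [11, 22, 33, 44, 55, 66, 77, 88]
    }
}
```

The pair of fromArray overloads is exactly the "two overloads per operation" cost; code that never names a species only ever touches the second one.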
> Another slightly negative result is that some of these methods will have
> to dispatch on the species argument, which means we will need strong
> constant propagation so that these dispatches fold away. However, I
> think those fold away in exactly the same cases that the virtual methods
> on species do today, so I don't think this is necessarily a change in
> reality.
Agreed. The implementation can even keep the same virtual call:
IntVector.fromArray(Species s, ...) { return s.fromArray(...); }
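To make the delegation concrete, here is a small sketch, assuming a species interface with a virtual, shape-specific fromArray and one class per shape. IntSpecies, Int64Vector.SPECIES and Int128Vector.SPECIES are hypothetical names. The static IntVector.fromArray only forwards to s.fromArray(...), so whether the call folds away still depends on the JIT seeing a constant species, as it does today.

```java
// Hypothetical sketch: the static factory on IntVector simply makes the same
// virtual call on the species. Names are illustrative, not the Vector API.
class StaticDelegatesToSpecies {

    interface IntSpecies {
        int length();
        IntVector fromArray(int[] a, int offset); // shape-specific, virtual
    }

    abstract static class IntVector {
        abstract IntSpecies species();
        abstract int lane(int i);

        // Static entry point: one virtual call on the species. When 's' is a
        // constant at the call site, the JIT can typically devirtualize and inline it.
        static IntVector fromArray(IntSpecies s, int[] a, int offset) {
            return s.fromArray(a, offset);
        }
    }

    static final class Int64Vector extends IntVector {
        static final IntSpecies SPECIES = new IntSpecies() {
            public int length() { return 2; }
            public IntVector fromArray(int[] a, int offset) {
                return new Int64Vector(a[offset], a[offset + 1]);
            }
        };

        private final int l0, l1; // 64 bits = two int lanes

        Int64Vector(int l0, int l1) { this.l0 = l0; this.l1 = l1; }

        IntSpecies species() { return SPECIES; }

        int lane(int i) {
            switch (i) {
                case 0: return l0;
                case 1: return l1;
                default: throw new IndexOutOfBoundsException("lane " + i);
            }
        }
    }

    static final class Int128Vector extends IntVector {
        static final IntSpecies SPECIES = new IntSpecies() {
            public int length() { return 4; }
            public IntVector fromArray(int[] a, int offset) {
                return new Int128Vector(new int[] {
                        a[offset], a[offset + 1], a[offset + 2], a[offset + 3]});
            }
        };

        private final int[] lanes; // 128 bits = four int lanes

        Int128Vector(int[] lanes) { this.lanes = lanes; }

        IntSpecies species() { return SPECIES; }

        int lane(int i) { return lanes[i]; }
    }

    public static void main(String[] args) {
        int[] a = {5, 6, 7, 8};
        IntVector v = IntVector.fromArray(Int128Vector.SPECIES, a, 0);
        System.out.println(v.getClass().getSimpleName() + " lane3=" + v.lane(3)); // Int128Vector lane3=8
    }
}
```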
> At this point, we're ready for species to become values, which will only
> help our constant propagation story.
> Part II
> -------
>
> To make the Vector types value-ready, we have to flatten out the
> inheritance hierarchy. This is an easy enough game; we make the
> concrete vector types into values, and the abstract vector types into
> interfaces. So we have:
>
> public interface Vector<T> { .. }
> public interface IntVector <: Vector<Integer> { .. }
>
> private value class Int64Vector <: IntVector { .. }
>
> This is easy enough, but I'm sure if we just did this, the carefully
> crafted optimizations we've done for vectors would fail (at first) when
> the abstract vector types become interfaces. We'll also need to ensure
> that type sharpening / nullity analysis is up to the task, so we get
> scalarization. But this seems the straightest path to get from the API
> we have to one that uses value types.
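A rough sketch of what the flattened hierarchy could look like in today's Java. It assumes that a final class with only final fields and state-based equals/hashCode is an acceptable stand-in for a value class, since the value modifier itself is not available in standard Java. The laneBoxed accessor on Vector<E> is a hypothetical method, added only to show the generic interface in use.

```java
// Hypothetical sketch of the flattened hierarchy. Standard Java has no
// "value class" yet, so a final class with only final fields and
// state-based equals/hashCode stands in for it.
class FlattenedHierarchy {

    // Abstract vector types become interfaces.
    interface Vector<E> {
        int length();
        E laneBoxed(int i); // hypothetical generic lane accessor
    }

    interface IntVector extends Vector<Integer> {
        int lane(int i);
        IntVector add(IntVector other);

        @Override
        default Integer laneBoxed(int i) { return lane(i); }
    }

    // Stand-in for "private value class Int64Vector": 64 bits = two int lanes.
    // (Package-private here rather than private, so the demo can construct it.)
    static final class Int64Vector implements IntVector {
        private final int l0, l1;

        Int64Vector(int l0, int l1) { this.l0 = l0; this.l1 = l1; }

        public int length() { return 2; }

        public int lane(int i) {
            switch (i) {
                case 0: return l0;
                case 1: return l1;
                default: throw new IndexOutOfBoundsException("lane " + i);
            }
        }

        public IntVector add(IntVector other) {
            if (other.length() != 2) throw new IllegalArgumentException("shape mismatch");
            return new Int64Vector(l0 + other.lane(0), l1 + other.lane(1));
        }

        // Value semantics: equality by state, not identity.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Int64Vector)) return false;
            Int64Vector v = (Int64Vector) o;
            return v.l0 == l0 && v.l1 == l1;
        }

        @Override
        public int hashCode() { return 31 * l0 + l1; }
    }

    public static void main(String[] args) {
        IntVector v = new Int64Vector(1, 2).add(new Int64Vector(3, 4));
        Vector<Integer> g = v; // also usable through the generic interface
        System.out.println(g.laneBoxed(0) + "," + g.laneBoxed(1)); // 4,6
        System.out.println(v.equals(new Int64Vector(4, 6)));       // true
    }
}
```

Every call through IntVector or Vector<Integer> here is an interface call, which is where the existing box-elimination tricks would first need to catch up.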
>
> Another path, slightly more circuitous, is to collapse the XxxNnVector
> types to a single XxxVector value type, which looks something like:
>
> IntVector<T extends SuperLong> {
> IntSpecies species;
> T vector;
> }
>
> where there are SuperLong types for 64, 128, 256, and 512. This gets us
> away from using arrays, which is eventually where we'll want to go, but
> it's a longer road.
As a first step, I'm in favor of the former proposal. Reducing the
number of specializations is attractive, but additional performance work
is needed to ensure the JIT compiler is capable of eliminating the boxes.
So, I'd prefer to see it as a separate experiment.
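The collapsed alternative might look like the following sketch, which replaces the backing int[] with a fixed-width payload. SuperLong, Long64, Long128 and pack() are hypothetical names, only the 64- and 128-bit payloads are shown, and the lane layout (lane 0 in the low 32 bits of word 0) is an assumption.

```java
// Hypothetical sketch of the "single XxxVector value type" idea. SuperLong,
// Long64, Long128 and pack() are illustrative names; the lane layout
// (lane 0 in the low half of word 0) is an assumption.
class CollapsedVector {

    // Fixed-width payload that replaces the int[] backing array.
    interface SuperLong {
        int bitSize();
        long word(int i); // i-th 64-bit word
    }

    static final class Long64 implements SuperLong {
        final long w0;
        Long64(long w0) { this.w0 = w0; }
        public int bitSize() { return 64; }
        public long word(int i) {
            if (i != 0) throw new IndexOutOfBoundsException("word " + i);
            return w0;
        }
    }

    static final class Long128 implements SuperLong {
        final long w0, w1;
        Long128(long w0, long w1) { this.w0 = w0; this.w1 = w1; }
        public int bitSize() { return 128; }
        public long word(int i) {
            switch (i) {
                case 0: return w0;
                case 1: return w1;
                default: throw new IndexOutOfBoundsException("word " + i);
            }
        }
    }
    // Long256 and Long512 would follow the same pattern.

    enum IntSpecies {
        S_64(64), S_128(128);
        final int bitSize;
        IntSpecies(int bitSize) { this.bitSize = bitSize; }
    }

    // One vector type for every shape: a species plus a payload.
    static final class IntVector<T extends SuperLong> {
        final IntSpecies species;
        final T vector;

        IntVector(IntSpecies species, T vector) {
            if (species.bitSize != vector.bitSize()) {
                throw new IllegalArgumentException("species/payload size mismatch");
            }
            this.species = species;
            this.vector = vector;
        }

        int length() { return species.bitSize / Integer.SIZE; }

        int lane(int i) {
            long w = vector.word(i / 2);         // two int lanes per 64-bit word
            return (int) (w >>> (32 * (i % 2))); // even lane = low half
        }
    }

    // Packs two ints into one long: 'lo' in bits 0..31, 'hi' in bits 32..63.
    static long pack(int lo, int hi) {
        return (lo & 0xFFFFFFFFL) | ((long) hi << 32);
    }

    public static void main(String[] args) {
        IntVector<Long128> v = new IntVector<>(IntSpecies.S_128,
                new Long128(pack(1, 2), pack(3, -4)));
        System.out.println(v.length() + " lanes, lane 3 = " + v.lane(3)); // 4 lanes, lane 3 = -4
    }
}
```

The payload objects are exactly the boxes the JIT would have to eliminate, which is why this variant needs the extra performance work before it can compete.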
Right now, the hierarchy is the following:
public abstract class Vector<T> { .. }
public abstract class IntVector extends Vector<Integer> { .. }
/*package-private*/
class Int64Vector extends IntVector { .. }
The only reason Vector and IntVector are abstract classes is to
simplify the custom box elimination logic. My understanding is that it
shouldn't be too much work to extend that logic to interfaces. The plan
was to get a bare minimum that keeps the implementation viable while
waiting for value types.
Best regards,
Vladimir Ivanov
More information about the panama-dev mailing list