Vector API "Straw Man" Boxed/Unboxed Vector Instantiation
John Rose
john.r.rose at oracle.com
Mon Mar 28 19:50:21 UTC 2016
On Mar 25, 2016, at 2:45 PM, Graves, Ian L <ian.l.graves at intel.com> wrote:
>
> Hello!
>
> We've been working with some in-house implementations of the straw man[1] Vector API provided by John and test driving it with some workloads to look for any apparent pain points one could have when using the API to accelerate a common task. One place we started was with simple matrix multiplication of an (let's say two dimensional) array of unboxed floats. We assume the role of a user looking to accelerate a computation given data already in this format. The unaltered straw man API gives us a method to instantiate vectors from an array in the Species of a given Vector with the fromArrayFactory(Float[] f, Integer offset) method. We've been trying to avoid making any significant opinionated changes to the API or specific deviations in an implementation, but some of the issues around this workload warrant some thought (or some help, if somebody knows something I don't [quite likely!]).
Clearly you are thoroughly engaged with this API; these questions are all very much on-point.
For some of this stuff I have an answer already waiting. For some of it I'm still groping around. Here's my take.
> 1. Little Quibble: It seems as though in order to instantiate a Vector by using a method of the Species, one first needs an instance of a species of that Vector (or a species). Is this the case? This isn't a big deal as we could provide a singleton dummy instance for this or something similar.
I think we need to avoid an explicit user-visible class structure for implementations of Vector, so that implementations can be fully encapsulated. I.e., a user should never say "new FooVectorImpl" to create a vector. The paradigm for vector creation is less like "new ArrayList" and more like "Collections.nCopies".
This means we need one or more factory APIs. Putting a factory API into the Vector interface itself is simply a pragmatic move to avoid creating a separate API for vector creation. (It's a "clumping" rather than a "splitting" design move. Both are valid at times, and often matters of taste.) But we may end up with a VectorFactory API if the practicalities move us in that direction.
So, yes, a user will first obtain or create a prototypical vector. (Using whatever base-level factory is available: which is TBD.) Then the user will make many vectors that descend from that prototype. It is likely that copying the prototype.
How many distinct classes are needed under the hood? Well, vectors are fundamentally distinguished by shape (which determines lane structure) and base type (which determines what's in each lane). That's already a very rich cross-product. Beyond that, we may have specialized HW-influenced vector subtypes, such as gather-vectors or masked-vectors. Higher-level streaming APIs may benefit from vectors whose shape includes a data-dependent lane-count size. In fact, such a high-level vector would be not too different from a view of some or all of the contents of a Stream. (I'm not certain yet where to put a bright line between vectors and streams. We should play with this.)
> 2. Not-so-Little Quibble: Assuming that a user wants to leverage this API by starting with primitive types in arrays, the current iteration of the API would seem to require arrays to be boxed because the vector-building methods are all parameterized by element type.
We have to make a distinction in an API like this between primitives that occur as scalars, and primitives that occur in memory, as sources or destinations. The latter *must* be packed Fortran-style. The former *may* be boxed, but preferably should be used unboxed. Because of the structure of our generics, there will be complicated tradeoffs, until we get parametric polymorphism over both references and primitives (and value types!) from Project Valhalla. It is as you surmised (below).
> For folks coming from primitive arrays, this seems like a non-starter, so maybe I am missing something. What does the path to Vector usage look like if I am coming from primitive arrays?
>
> We've war-gamed a couple of possibilities here. I'd love to get the thoughts of the people on this listserv. I'll list them out.
>
>
> 1. Vanilla/Status Quo: We stick with requiring boxed types for now for elegance and simplicity's sake. Construction via parameterized methods will be slower, and perhaps some other non-API method would be better suited to this (static factory methods, etc).
This will only carry us a little further. You have put your finger on the problem, which is not boxing in general, but specifically memory layout of arrays.
In Panama we are also experimenting with notations for specifying non-Java layouts. I think it is quite possible that we will be building Vectors that slice from non-Java arrays, either off-heap, or from the middle of on-heap carrier types like long[], or from a mix of locations, such as ByteBuffers. So the problem will not be solved once we decide on Integer vs. int; we will want to support structural types like complex-number arrays or bit-slice arrays, imported from non-Java APIs. A Vector which carries a handful of such values will need to come from a vector-factory (or shape-factory) which is parameterized by
I guess the net of this, for int-vs.-Integer, is that we need to come up with a way for the shape S of a Vector<Integer,S> to determine, in a data-dependent way, how the lanes of the vector are to be loaded up from a data source, or stored to a data sink. The mentions of concrete Java array types in the Vector API are therefore simple usage examples, rather than a comprehensive story, of how to get lane data in and out of memory.
Those array-based API points are as far as my imagination went, when I was scribbling out the API. There's got to be more. Maybe this is where streams come in? Perhaps we need another complementary pair of APIs for sourcing and sinking data in vector-sized chunks, from a variety of formats. In that case, the existing array-based elements become user conveniences, rather than fundamental cut points.
Specifically, I think we may have to decouple occurrences of E[] from E in the Vector API, replacing E[] by a type which expresses the mode of collection:
default void intoArray(E[] a, int o) { collect(Collectors.intoArray(a, o); }
default <R,A> R collect(Collector<E, A, R> c, int o) { … }
(This assumes that the stream auxiliary type Collector is a good match for vector storage.)
> 2. Shell Game: Vector<Element,Shape> -> ConcreteVector takes on a similar flavor and relationship to BaseStream<Element, Stream> -> IntStream (say). That is, we have element-wise concrete instances of these vectors (FloatVector<Shape> or FloatVector256/FloatVector8) that have a standardized suite of methods that support primitives and are named consistently across the API like (Int|Double|Long)Stream. This design could incorporate additional interfaces or inherit from abstract classes that contain repeated functionality across element types that we support.
Brian already gave an outline of a good strategy:
On Mar 26, 2016, at 10:43 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> For sure, convergence with Valhalla should be part of the goal. For example, we made API choices with VarHandles to not try and make them more type-safe, because (a) it was a low-level API and (b) the ability to make them type-safe at lower cost was coming, and we could wrap VHs later with specialize type-safe wrappers. We may want to consider running this play again, as you allude to in (3).
I think Brian is over-modest not to emphasize IntStream/Stream<Integer> as an excellent model for working this problem. I believe that with Valhalla parametric polymorphism, Stream<int> will be a natural, easy-to-adopt replacement for IntStream.
> 3. Deus ex Machina: JDK 10 or onward introduces a form of specialization[2] powerful enough to let us have our cake and eat it too. While perhaps this isn't something we can bet on to happen concurrent with this API design (or could we?), this might be something we should be planning for. That is, if we know that the API in its current state would be enhanced almost wholesale by the introduction of specialization and/or value types, perhaps that would be a reason enough to leave it how it is and avoid fragmenting it into overly concretized instances.
Yes. Let's steer Vector this way. IntVector now, then Vector<int> later.
> 4. Boxing Elision: Perhaps there's a way we can elide the boxing on these arrays with compiler optimizations, or is there a way already?
We can't elide boxing on arrays (yet) because Java's array performance model is tightly coupled to the monomorphic (non-varying) format of each array type. In particular, there's no way to abstract over int[] vs. Integer[], because the extra indirections for Integer are intrinsic to the array format. What we *can* abstract over is the boxing of individual values (int vs. Integer). This means that, with a little care, BinaryOperator<Integer> and IntBinaryOperator can be treated as interchangeable operand formats for HOF methods like Vector.reduce.
(Separate problem: We need to give the Vector API some way to "crack" those operands, or to guide the JIT in cracking them, as it organizes its instruction selection.)
> We are particularly interested in being able to build vectors from arrays because arrays are amenable to gathering[3] and scattering[4] instructions. We can construct lightweight Vector views over arrays of primitives using gathering and scattering (for writes). If the arrays are arrays of boxed types, that makes scattering and gathering a bit more complex as we add a non-uniform layer of indirection to each array index. This is quite useful for workloads like matrix multiplication where it might make sense to stride an array to build a vector.
Yes; see above discussion about vector classes.
> I would love to get thoughts on how we might negotiate this boxing issue, if indeed it is an issue.
My basic advice: It's time to decouple E[] from E. Isolate E[] into places where bulk data is being sourced or sinked.
Either use an interface to abstract the manner for sourcing and sinking (Collector), or use a type parameter shared with Shape. I.e., (as a clumping move) hang the responsibility for sourcing and sinking on the Shape. In that case, the source or sink types could be a type parameter shared between Vector and Shape.
Challenge: Do this without complicating Vector too much. Adding IntVector at this time will let you sprinkle the user-friendly API points with "int" and "int[]".
— John
> Thanks!
>
> Ian
>
> [1] http://cr.openjdk.java.net/~jrose/arrays/vector/Vector.java
> [2] http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html
> [3] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gather_ps&techs=AVX,AVX2&expand=2798
> [4] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=scatter_ps&techs=AVX_512&expand=2798,4072,2823,2839
>
>
More information about the panama-dev
mailing list