Vector API Latest Draft Spec

Paul Sandoz paul.sandoz at oracle.com
Fri Jul 14 19:00:58 UTC 2017


Hi Ian,

Thanks. Nice work.

As an experiment I just pushed a template mechanism that can generate the vector code. I resisted the temptation to do any major refactoring, so the template generator tries to keep things as close as possible to the original code (with slight differences for bitwise operations in the FP domain). To be conservative, I have not yet replaced the existing implementations.

Implementation-wise I think we can clean this up a little:

1) There is no need to have entirely separate implementations for every bit size of an element; there can be a common parent class for most things.

2) We can use a simple forEach abstraction to avoid the loop repetition.
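
A minimal sketch of how points 1 and 2 could fit together, assuming int lanes emulated over an array. The names IntVectorSketch, Int128Sketch, Int256Sketch and forEachLane are hypothetical and are not taken from the pushed code; the sketch only shows the shape of the idea, with one shared parent class and one loop.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical common parent for all int vector shapes (not the real Panama classes).
abstract class IntVectorSketch {
    private final int[] lanes;

    IntVectorSketch(int[] lanes, int expectedLanes) {
        if (lanes.length != expectedLanes) {
            throw new IllegalArgumentException("expected " + expectedLanes + " lanes");
        }
        this.lanes = lanes.clone();
    }

    // Each shape only says how to wrap a result array in its own type.
    abstract IntVectorSketch make(int[] lanes);

    // The single place where the per-lane loop lives.
    IntVectorSketch forEachLane(IntVectorSketch other, IntBinaryOperator op) {
        if (other.lanes.length != lanes.length) {
            throw new IllegalArgumentException("shape mismatch");
        }
        int[] res = new int[lanes.length];
        for (int i = 0; i < res.length; i++) {
            res[i] = op.applyAsInt(lanes[i], other.lanes[i]);
        }
        return make(res);
    }

    // Operations become one-liners instead of repeated loops.
    IntVectorSketch add(IntVectorSketch o) { return forEachLane(o, (a, b) -> a + b); }
    IntVectorSketch mul(IntVectorSketch o) { return forEachLane(o, (a, b) -> a * b); }
    IntVectorSketch and(IntVectorSketch o) { return forEachLane(o, (a, b) -> a & b); }

    int[] toArray() { return lanes.clone(); }
}

// 128-bit shape: 4 int lanes.
final class Int128Sketch extends IntVectorSketch {
    Int128Sketch(int[] lanes) { super(lanes, 4); }
    @Override IntVectorSketch make(int[] lanes) { return new Int128Sketch(lanes); }
}

// 256-bit shape: 8 int lanes.
final class Int256Sketch extends IntVectorSketch {
    Int256Sketch(int[] lanes) { super(lanes, 8); }
    @Override IntVectorSketch make(int[] lanes) { return new Int256Sketch(lanes); }
}
```

With this layout a new shape costs a constructor and a factory method, and a new operation costs one lambda in the parent.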

If there are no objections I’ll keep plugging away at this.

This was also a useful exercise to understand the scope of the API:

1) I think it may be worth splitting the Vector interfaces into three: the top-level Vector with common operations, FPVector with FP-specific ops, and BitwiseVector with bitwise-specific ops. That separates the operations in each domain more clearly than having some vectors throw UOEs.
API-wise it may be clearer to transform from, say, FP to BW and back when performing such operations, as long as the JIT can optimize the transformation. That puts extra stress on the cast operation.
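
A rough sketch of what such a split could look like, under the assumption of array-backed emulation. Only the interface names Vector, FPVector and BitwiseVector come from the text above; the methods, the FloatVec and IntVec classes, and the toBits/toFP casts are hypothetical. The example clears the sign bit of each float lane by casting to the bitwise domain and back, instead of offering a bitwise `and` on the FP vector that might throw.

```java
// Common operations only (method set is hypothetical).
interface Vector<E> {
    int length();
    E get(int lane);
}

// FP-specific operations.
interface FPVector<E> extends Vector<E> {
    FPVector<E> sqrt();
}

// Bitwise-specific operations.
interface BitwiseVector<E> extends Vector<E> {
    BitwiseVector<E> and(BitwiseVector<E> other);
}

final class FloatVec implements FPVector<Float> {
    private final float[] lanes;
    FloatVec(float... lanes) { this.lanes = lanes.clone(); }
    public int length() { return lanes.length; }
    public Float get(int lane) { return lanes[lane]; }
    public FloatVec sqrt() {
        float[] r = new float[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = (float) Math.sqrt(lanes[i]);
        return new FloatVec(r);
    }
    // Hypothetical cast: reinterpret each float lane as its raw int bits.
    IntVec toBits() {
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = Float.floatToRawIntBits(lanes[i]);
        return new IntVec(r);
    }
}

final class IntVec implements BitwiseVector<Integer> {
    private final int[] lanes;
    IntVec(int... lanes) { this.lanes = lanes.clone(); }
    static IntVec broadcast(int laneCount, int value) {
        int[] r = new int[laneCount];
        java.util.Arrays.fill(r, value);
        return new IntVec(r);
    }
    public int length() { return lanes.length; }
    public Integer get(int lane) { return lanes[lane]; }
    public IntVec and(BitwiseVector<Integer> other) {
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = lanes[i] & other.get(i);
        return new IntVec(r);
    }
    // Hypothetical cast back: reinterpret each int lane as a float.
    FloatVec toFP() {
        float[] r = new float[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = Float.intBitsToFloat(lanes[i]);
        return new FloatVec(r);
    }
}
```

Usage would read `v.toBits().and(IntVec.broadcast(v.length(), 0x7fffffff)).toFP()`, which makes the domain change explicit at the cost of two casts that the JIT would need to eliminate.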

2) I was pondering masks and wondering whether Vector etc. should be parameterised by lane count rather than bits. API-wise this is more appealing when pushing/pulling from element sources, as it’s more obvious what the quantities are, and masks are more easily usable across vector types. However, optimisation-wise this may become more tricky, since a mask consisting of 8 lanes could be 8 bytes packed into 128 bits, or 8 ints packed into 512 bits (this might work on AVX512 but there are likely other examples on AVX2 where the register sizes don’t correspond). An alternative is to support a cast, which would be optimal for cases where the lane and element bit size are the same (namely transforming between Float/Int and Double/Long).
My inclination would be to explore the lane-declaring route, as long as we are confident the JIT can optimize. Note that generics help here, but HotSpot still presumably has to do checks when compiling since raw types can be used.
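
To make the lane-declaring route concrete, here is a small sketch under assumed names (LaneCount, L4, MaskSketch, FloatLanes and IntLanes are all hypothetical and not part of the draft spec). The vectors and the mask share a phantom type parameter for the lane count, so a mask produced by a float comparison can be applied to an int vector with the same number of lanes. The runtime length check stands in for the checks mentioned above, since erasure and raw types can bypass the compile-time guarantee.

```java
// Phantom marker types for the number of lanes (hypothetical).
interface LaneCount {}
final class L4 implements LaneCount {}

// A mask carries only a lane count, not an element type or bit size.
final class MaskSketch<L extends LaneCount> {
    final boolean[] bits;
    MaskSketch(boolean[] bits) { this.bits = bits.clone(); }
}

final class FloatLanes<L extends LaneCount> {
    final float[] lanes;
    FloatLanes(float... lanes) { this.lanes = lanes.clone(); }

    // Comparison yields a mask tied to the lane count L only.
    MaskSketch<L> lessThan(float bound) {
        boolean[] b = new boolean[lanes.length];
        for (int i = 0; i < b.length; i++) b[i] = lanes[i] < bound;
        return new MaskSketch<>(b);
    }
}

final class IntLanes<L extends LaneCount> {
    final int[] lanes;
    IntLanes(int... lanes) { this.lanes = lanes.clone(); }

    // Takes lanes from 'other' where the mask is set, from 'this' elsewhere.
    IntLanes<L> blend(IntLanes<L> other, MaskSketch<L> mask) {
        // Generics are erased and raw types are allowed, so check at run time too.
        if (mask.bits.length != lanes.length || other.lanes.length != lanes.length) {
            throw new IllegalArgumentException("lane count mismatch");
        }
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = mask.bits[i] ? other.lanes[i] : lanes[i];
        return new IntLanes<>(r);
    }
}
```

In a bit-size-parameterised design the same reuse would only type-check between vectors of equal bit width, for example Float/Int or Double/Long.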

Paul.

> On 11 Jul 2017, at 16:35, Graves, Ian L <ian.l.graves at intel.com> wrote:
> 
> All,
> 
> I have just pushed the latest draft spec of the Vector API.  This spec is based on discussions both on and off Panama as it relates to an idiomatic Vector API in the Java style.  This work dates back to John's straw man[1], but focuses on the "concrete" functionality of the Vector API.  The higher order aspects of the API have been handed off to the exploration in the expression language work.  As such this iteration of the Vector API is more concrete, like you'd see in the Intel Intrinsics Library, but platform independent.  This draft spec is a functional implementation, but the functionality is emulated in pure Java.  This is intended for usability studies without regard to performance.  Lane types for Byte, Short, Int, Long, Float, and Double for shapes of 128, 256, and 512 bits are supported.  The API is structured around ideas on using interfaces for richer APIs[2].  You'll notice that this spec is quite a bit bigger.  The intent is to make this as feature complete as possible for exploring with workloads and identifying pain points.
> 
> Included in the push are a few workloads to start.  Three are basic and one is more advanced.  One is a simple addition test, one is conversions between types, and another is loading and storing to arrays based on an index variable. The other is a Mandelbrot fractal generator that I adapted to this newer spec from the older one we had previously.  The Mandelbrot generator attempts to use most of the features of the Vector API (left out shuffling right now, but there is a place to do it) in the ways that cause us the most pain on the code gen side.  Particularly, we are using masking which we haven't done much of before.  We are also using vector accumulators in loops, which has probably been the biggest single sticking point where code generation is concerned.  Mandelbrot also projects boolean values by reductions of masks derived from comparisons of vectors.  It's a pretty hairy example and also one that, if you can do it well, will probably result in pretty good code generation for a lot of other examples.  Some early work on our end has shown that there exists a path to intrinsification of this type of programming model, so we are very interested in exploring this spec further.  This Vector API can also act as the back end of the Expression Language posted a couple weeks ago as that approach is based upon MethodHandles.
> 
> Am very happy to get this spec out the door for other people to play with and see what workloads they can think of to drive it with.  Would love to get your feedback!  In the meantime I will be exploring other more advanced workloads and posting them when they're ready.
> 
> --Ian
> 
> 
> [1] http://cr.openjdk.java.net/~jrose/arrays/vector/Vector.java
> [2] http://cr.openjdk.java.net/~jrose/panama/using-interfaces.html



More information about the panama-dev mailing list