Vector API Latest Draft Spec
Graves, Ian L
ian.l.graves at intel.com
Fri Jul 14 19:51:53 UTC 2017
> As an experiment i just pushed a template mechanism that can generate the
> vector code. I resisted the temptation to do some major refactoring, so the
> template generator tries to keep things as close as possible to the original
> code (with slight differences when operating on the FP domain when doing
> bitwise operations). Conservatively i have not yet replaced the existing
> implementations.
Awesome! There's a lot of boilerplate to factor out.
> 1) There is no need to have entirely separate implementations for every bit
> size of an element, there can be a common parent class for most things.
>
> 2) We can use a simple forEach abstraction to avoid the loop repetition.
I agree. Most of this code is copy/pasted. The loops need to go, but I wasn't sure what to replace them with.
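As a rough sketch of what a shared parent class plus a forEach-style helper might look like: every name below (AbstractIntVector, uOp, bOp, Int128Vector, Int256Vector) is made up for illustration and is not taken from the prototype or from the template generator. The idea is only that the lane loop is written once per operation arity, and each shape-specific class shrinks to a constructor and a factory method.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

// Hypothetical common parent: holds the lanes and owns the only loops.
abstract class AbstractIntVector {
    private final int[] lanes;

    AbstractIntVector(int[] lanes) {
        this.lanes = lanes.clone(); // defensive copy, vectors stay immutable
    }

    // Each shape-specific subclass only says how to wrap a result array.
    abstract AbstractIntVector make(int[] lanes);

    int length() { return lanes.length; }

    int[] toArray() { return lanes.clone(); }

    // The single loop shared by all unary lane-wise operations.
    AbstractIntVector uOp(IntUnaryOperator f) {
        int[] res = new int[lanes.length];
        for (int i = 0; i < res.length; i++) {
            res[i] = f.applyAsInt(lanes[i]);
        }
        return make(res);
    }

    // The single loop shared by all binary lane-wise operations.
    AbstractIntVector bOp(AbstractIntVector o, IntBinaryOperator f) {
        if (o.length() != length()) {
            throw new IllegalArgumentException("lane count mismatch");
        }
        int[] res = new int[lanes.length];
        for (int i = 0; i < res.length; i++) {
            res[i] = f.applyAsInt(lanes[i], o.lanes[i]);
        }
        return make(res);
    }

    // Operations become one-liners with no loop of their own.
    AbstractIntVector add(AbstractIntVector o) { return bOp(o, (a, b) -> a + b); }
    AbstractIntVector mul(AbstractIntVector o) { return bOp(o, (a, b) -> a * b); }
    AbstractIntVector neg() { return uOp(a -> -a); }
}

// 128-bit shape: four int lanes.
final class Int128Vector extends AbstractIntVector {
    Int128Vector(int... lanes) {
        super(lanes);
        if (lanes.length != 4) throw new IllegalArgumentException("expected 4 lanes");
    }
    @Override AbstractIntVector make(int[] lanes) { return new Int128Vector(lanes); }
}

// 256-bit shape: eight int lanes, no duplicated operation code.
final class Int256Vector extends AbstractIntVector {
    Int256Vector(int... lanes) {
        super(lanes);
        if (lanes.length != 8) throw new IllegalArgumentException("expected 8 lanes");
    }
    @Override AbstractIntVector make(int[] lanes) { return new Int256Vector(lanes); }
}
```

With this layout, new Int128Vector(1, 2, 3, 4).add(new Int128Vector(10, 20, 30, 40)) goes through bOp, and adding another shape costs a few lines. Whether the JIT sees through the lambdas well enough is a separate question this sketch does not answer.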
> If there are no objections i’ll keep plugging away at this.
Please do!
>
> This was also a useful exercise to understand the scope of the API:
>
> 1) I think it may be worth splitting the Vector interfaces into three, the top
> level Vector with common operations, FPVector with FP specific ops, and
> BitwiseVector with bitwise specific ops, thereby more clearly separating
> operations in each domain, rather than having some vectors throw UOEs.
> API-wise it may be clearer to transform from say FP to BW and back when
> performing such operations, as long as the JIT can optimize the
> transformation. That puts extra stress on the cast operation.
Would this change the way you query for a Vector? Would the user have to ask for a FP-Vector or a BW-Vector explicitly? It makes sense to break out features into multiple type bounds. You could introduce new vector features for different architectures in this manner, too.
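To make the question concrete, here is a minimal sketch of how the three-way split could look, assuming self-typed interfaces. The interface names Vector, FPVector and BitwiseVector come from the proposal above; everything else (FloatVec, IntVec, toBits, toFloats, broadcast, FpBitTricks) is hypothetical and only there to show the FP to BW and back round trip, here used to clear the sign bit.

```java
import java.util.Arrays;

// Operations common to every vector.
interface Vector<V extends Vector<V>> {
    int length();
    V add(V other);
}

// FP-only operations live here, so integer vectors never see them.
interface FPVector<V extends FPVector<V>> extends Vector<V> {
    V sqrt();
}

// Bitwise-only operations live here, so FP vectors never throw UOE for them.
interface BitwiseVector<V extends BitwiseVector<V>> extends Vector<V> {
    V and(V other);
}

// Hypothetical float vector: FP domain only.
final class FloatVec implements FPVector<FloatVec> {
    private final float[] lanes;
    FloatVec(float... lanes) { this.lanes = lanes.clone(); }
    @Override public int length() { return lanes.length; }
    @Override public FloatVec add(FloatVec o) {
        if (o.length() != length()) throw new IllegalArgumentException("lane count mismatch");
        float[] r = new float[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = lanes[i] + o.lanes[i];
        return new FloatVec(r);
    }
    @Override public FloatVec sqrt() {
        float[] r = new float[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = (float) Math.sqrt(lanes[i]);
        return new FloatVec(r);
    }
    // Reinterpreting cast into the bitwise domain (same lane count, same bits).
    IntVec toBits() {
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = Float.floatToRawIntBits(lanes[i]);
        return new IntVec(r);
    }
    float[] toArray() { return lanes.clone(); }
}

// Hypothetical int vector: bitwise domain.
final class IntVec implements BitwiseVector<IntVec> {
    private final int[] lanes;
    IntVec(int... lanes) { this.lanes = lanes.clone(); }
    static IntVec broadcast(int length, int value) {
        int[] r = new int[length];
        Arrays.fill(r, value);
        return new IntVec(r);
    }
    @Override public int length() { return lanes.length; }
    @Override public IntVec add(IntVec o) {
        if (o.length() != length()) throw new IllegalArgumentException("lane count mismatch");
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = lanes[i] + o.lanes[i];
        return new IntVec(r);
    }
    @Override public IntVec and(IntVec o) {
        if (o.length() != length()) throw new IllegalArgumentException("lane count mismatch");
        int[] r = new int[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = lanes[i] & o.lanes[i];
        return new IntVec(r);
    }
    // Reinterpreting cast back into the FP domain.
    FloatVec toFloats() {
        float[] r = new float[lanes.length];
        for (int i = 0; i < r.length; i++) r[i] = Float.intBitsToFloat(lanes[i]);
        return new FloatVec(r);
    }
}

final class FpBitTricks {
    // abs() on floats by masking off the sign bit: FP -> BW -> FP.
    static FloatVec abs(FloatVec v) {
        IntVec mask = IntVec.broadcast(v.length(), 0x7fffffff);
        return v.toBits().and(mask).toFloats();
    }
}
```

In this arrangement the user gets the FP or BW view from the concrete type they asked for, and calling and() on a FloatVec is a compile error instead of a runtime UOE. The cost is that every bitwise trick on FP data goes through two casts, which is the extra stress on the cast operation mentioned above.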
>
> 2) I was pondering about masks and wondering whether Vector etc should
> be parameterised by lane rather than bits. API-wise this is more appealing
> when pushing/pulling from element sources as it's more obvious what the
> quantities are. Masks are more easily usable across vector types. However
> optimisation-wise this may become more tricky since a mask consisting of 8
> lanes could be 8 bytes packed into 128 bits, or 8 ints packed into 512 bits (this
> might work on AVX512 but there are likely other examples on AVX2 where
> the register sizes don’t correspond). An alternative is to support a cast, which
> would be optimal for cases where the lane and element bit size are the same
> (namely transforming between Float/Int and Double/Long).
> My inclination would be to explore the lane declaring route, as long as we are
> confident the JIT can optimize. Note that generics help here but HotSpot still
> presumably has to do checks when compiling since raw types can be used.
>
All of these points are reasonable to me. When I experimented with lane vs. bit-size parameterization, I ran into the most pain around casting. Casting with a lane count parameter puts you into an undefined space. When recasting to a Vector of a specific shape, you know that the shape is unchanged by the cast. That isn't the case with lane counts: some casts will shorten or lengthen your vector's element count. If you're parameterizing by lane count, you would have to either loosen the constraint on the lane parameter in these operations or strengthen the assumptions about lane counts in the casting operations.
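The arithmetic behind the casting problem can be sketched as follows. Shape and VecType are hypothetical names used only for this illustration, and the set of shapes (64 to 512 bits) is an assumption. A cast that keeps the shape always succeeds but changes the lane count; a cast that keeps the lane count has to find a different shape, and sometimes there is none.

```java
// Hypothetical set of register widths, in bits.
enum Shape {
    S64(64), S128(128), S256(256), S512(512);

    final int bits;

    Shape(int bits) { this.bits = bits; }

    static Shape ofBits(int bits) {
        for (Shape s : values()) {
            if (s.bits == bits) return s;
        }
        throw new IllegalArgumentException("no shape with " + bits + " bits");
    }
}

// Hypothetical description of a vector type: element width plus shape.
final class VecType {
    final int elemBits;
    final Shape shape;

    VecType(int elemBits, Shape shape) {
        if (elemBits <= 0 || shape.bits % elemBits != 0) {
            throw new IllegalArgumentException("element size does not divide shape");
        }
        this.elemBits = elemBits;
        this.shape = shape;
    }

    // Lane count is derived from the shape, never stored independently.
    int lanes() { return shape.bits / elemBits; }

    // Shape-parameterized cast: the shape is invariant, the lane count moves.
    VecType castKeepingShape(int newElemBits) {
        return new VecType(newElemBits, shape);
    }

    // Lane-parameterized cast: the lane count is invariant, the shape moves,
    // and the cast fails when no shape of the required width exists.
    VecType castKeepingLanes(int newElemBits) {
        return new VecType(newElemBits, Shape.ofBits(lanes() * newElemBits));
    }
}
```

For example, a 128-bit float type (4 lanes) cast to double keeps its shape and drops to 2 lanes, or keeps its 4 lanes and needs a 256-bit shape. A 512-bit int type (16 lanes) cast to long while keeping lanes would need 1024 bits, which this sketch rejects with an exception.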
In any case, I think there's plenty one can do to cut down the bloat in this example. I expanded on my first template class to give a prototype working spec for playing around with, but it needs to be pared down so the code is manageable. Templating and aggressive refactoring should get the bloat under control.
--Ian