[vector api] Operations on Mask-s
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Jan 4 01:02:02 UTC 2019
Tomasz,
Yes, I've had a similar experience with complex use cases involving masks.
The API works well for lane-wise cases, but falls apart when cross-lane
operations are needed, requiring the user to go through intermediate
representations which aren't optimized well.
There are different ways to close that gap at the API level, e.g.:
* declare more operations on Mask
* operate on long: Mask <=> long
* operate on Vector: Mask <=> Vector or Mask <: BoolVector <: Vector
But IMO the main factor limiting progress is implementation
considerations.
On x86 there are 2 ways to represent masks: (1) as high bits in vector
registers (pre-AVX512) and (2) as opmask registers (k0-k7). (I believe
it's similar on ARM with NEON and SVE.)
C2 doesn't have full AVX512 support yet (e.g., no opmask register
support in the register allocator) and it was decided to focus on the
pre-AVX512 model first. So, the current implementation represents masks as
vectors uniformly across pre-AVX512 and AVX512-capable CPUs. And that's
probably the main reason why Mask hasn't got enough attention yet.
Regarding workarounds to your immediate problem: with the API in its
current state, either Mask.toLong() or Mask.toVector() can be used, but
considering the current state of the implementation (where neither has
been intrinsified yet), I'd suggest avoiding them altogether in hot code
for now and trying the following workaround instead (pseudo-code follows):
mask0x55 = ... < fromLong(0x55 << 0) > ... // mask constant
mask0xAA = ... < fromLong(0x55 << 1) > ... // mask constant
compMask0 = vector.lessThan(input)
// lessThan preserves zero in 1st element
compMask1 = vector.shiftEL(1).lessThan(input.shiftEL(1))
// equivalent of "(m & 0x55) | ((m << 1) & (0x55 << 1))"
compMask0.and(mask0x55).or(compMask1.and(mask0xAA))
The idea is to shift vectors and recompute masks instead.
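To make the equivalence concrete, here is a minimal scalar sketch in plain Java. It does not use the Vector API at all: lanes are modelled as an int[] of 8 elements and a mask as the low bits of a long, and the class and method names (MaskShiftSketch, lessThanMask, shiftLanesUp, etc.) are made up for illustration. It only checks that recomputing the comparison on shifted lanes gives the same bits as shifting the mask itself.

```java
public class MaskShiftSketch {
    static final int LANES = 8; // assumed lane count, for illustration only

    // Bit i of the result is set when a[i] < b[i]; models vector.lessThan(input).
    static long lessThanMask(int[] a, int[] b) {
        long m = 0;
        for (int i = 0; i < LANES; i++) {
            if (a[i] < b[i]) m |= 1L << i;
        }
        return m;
    }

    // Moves every lane one position up and fills lane 0 with zero.
    static int[] shiftLanesUp(int[] v) {
        int[] r = new int[LANES];
        System.arraycopy(v, 0, r, 1, LANES - 1);
        return r;
    }

    // What the original question asks for, computed directly on the mask bits.
    static long viaMaskShift(int[] a, int[] b) {
        long m = lessThanMask(a, b);
        return (m & 0x55) | ((m & 0x55) << 1);
    }

    // The workaround: shift the lanes, recompute the comparison, then combine.
    static long viaShiftedLanes(int[] a, int[] b) {
        long m0 = lessThanMask(a, b);
        long m1 = lessThanMask(shiftLanesUp(a), shiftLanesUp(b));
        return (m0 & 0x55) | (m1 & 0xAA);
    }

    public static void main(String[] args) {
        int[] a = {1, 9, 3, 9, 5, 0, 7, 9};
        int[] b = {2, 0, 4, 0, 0, 9, 8, 0};
        // Both approaches should produce the same mask bits.
        System.out.println(viaMaskShift(a, b) == viaShiftedLanes(a, b));
    }
}
```

Both shifted inputs have zero in lane 0, and 0 < 0 is false, so bit 0 of the recomputed mask is always clear, which matches what a left shift of the mask would produce.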
(Unfortunately, Vector.shiftEL() hasn't been intrinsified yet either,
but you can implement an equivalent using Vector.rearrange(Shuffle),
which has enough support on the JVM side.)
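As a rough illustration of the rearrange-based shift, here is a scalar model in plain Java. It is a sketch under assumptions, not Vector API code: rearrange is modelled as "result lane i takes source lane shuffle[i]", and the names RearrangeShiftSketch, rearrange and shiftUpByOne are hypothetical.

```java
import java.util.Arrays;

public class RearrangeShiftSketch {
    // Scalar model of a shuffle-driven rearrange: lane i takes source lane shuffle[i].
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    // Shift lanes up by one: rotate with a shuffle, then clear the lane that wrapped around.
    static int[] shiftUpByOne(int[] v) {
        int n = v.length;
        int[] shuffle = new int[n];
        for (int i = 0; i < n; i++) {
            shuffle[i] = (i + n - 1) % n; // lane i reads lane i-1; lane 0 reads the last lane
        }
        int[] r = rearrange(v, shuffle);
        r[0] = 0; // in vector code this would be a blend with zero under a constant mask
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shiftUpByOne(new int[] {1, 2, 3, 4})));
        // prints [0, 1, 2, 3]
    }
}
```

For the workaround above, clearing lane 0 may not even be necessary, since bit 0 of the recomputed mask is discarded by the and with mask0xAA anyway; a plain rotation would then be enough.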
Hope that helps.
Best regards,
Vladimir Ivanov
On 02/01/2019 01:34, Tomasz Kowalczewski wrote:
> Hi,
>
> I was working on implementing a simple sorting algorithm[1] using VectorAPI
> and stumbled on something that might be a gap in Mask API. It might be
> intentional for this stage of API maturity - please advise.
>
> I need to modify bits of the mask I get from _mm512_cmp_X_mask (e.g.
> vector.lessThan(input)) by doing and-s and or-s which can be simulated by
> preparing appropriate Mask objects. What I found missing is shift
> operations. Example:
>
> ( compMask & 0x55 ) | ( ( compMask & 0x55 ) << 1)
>
> I cannot do this using Mask object. Most natural alternative would be to
> convert the mask into long, perform these operations and convert it back.
> Unfortunately I was unable to find how to convert long back to Mask. Of
> course this can be done (via boolean[] array) but that does not seem
> efficient :).
>
> Excuse me if I missed some API call.
>
> 1. https://hal.inria.fr/hal-01512970v1/document (page 14, Code 1).
>