[vector api] Operations on Mask-s

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Jan 4 01:02:02 UTC 2019


Tomasz,

Yes, I had a similar experience with complex use cases involving masks. 
The API works well for lane-wise cases, but falls apart when cross-lane 
operations are needed, requiring the user to go through intermediate 
representations which aren't optimized well.

There are different ways to close that gap at the API level, e.g.:
   * declare more operations on Mask
   * operate on long: Mask <=> long
   * operate on Vector: Mask <=> Vector or Mask <: BoolVector <: Vector

But IMO the main factor limiting progress is the implementation, not 
the API shape.

On x86 there are 2 ways to represent masks: (1) as high bits in vector 
registers (pre-AVX512) and (2) as opmask registers (k0-k7). (I believe 
it's similar on ARM with NEON and SVE.)

C2 doesn't have full AVX512 support yet (e.g., no opmask register 
support in the register allocator) and it was decided to focus on the 
pre-AVX512 model first. So, the current implementation represents masks 
as vectors uniformly across pre-AVX512 and AVX512-capable CPUs. And 
that's probably the main reason why Mask hasn't received enough 
attention yet.

Regarding workarounds for your immediate problem: with the API in its 
current state, either Mask.toLong() or Mask.toVector() can be used, but 
considering the current state of the implementation (where neither has 
been intrinsified yet), I'd suggest avoiding them altogether in hot code 
for now and trying the following workaround instead (pseudo-code follows):

   mask0x55 = ... < fromLong(0x55 << 0) > ... // mask constant
   mask0xAA = ... < fromLong(0x55 << 1) > ... // mask constant

   compMask0 = vector.lessThan(input)
   // lessThan preserves zero in 1st element
   compMask1 = vector.shiftEL(1).lessThan(input.shiftEL(1))

   // equivalent of "(m & 0x55) | ((m << 1) & (0x55 << 1))"
   compMask0.and(mask0x55).or(compMask1.and(mask0xAA))

The idea is to shift vectors and recompute masks instead.
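
As a sanity check of that idea, here is a scalar model in plain Java 
(int[] for vectors, long for masks, bit i = lane i). It is only a sketch 
of the semantics, not Vector API code: ShiftRecomputeSketch and its 
methods are hypothetical names, and it assumes that shiftEL(1) moves 
lane i to lane i+1 and fills lane 0 with zero.

```java
final class ShiftRecomputeSketch {
    // Lane-wise a[i] < b[i], packed into a bit mask (bit i = lane i).
    static long lessThan(int[] a, int[] b) {
        long bits = 0L;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Assumed model of shiftEL(1): lane i moves to lane i + 1, lane 0 becomes 0.
    static int[] shiftUp(int[] v) {
        int[] r = new int[v.length];
        System.arraycopy(v, 0, r, 1, v.length - 1);
        return r;
    }

    // The workaround: shift the vectors, recompute the mask, then combine.
    static long viaRecompute(int[] vector, int[] input) {
        long compMask0 = lessThan(vector, input);
        long compMask1 = lessThan(shiftUp(vector), shiftUp(input));
        return (compMask0 & 0x55L) | (compMask1 & 0xAAL);
    }

    // The original expression, computed by shifting the mask bits directly.
    static long viaBitShift(int[] vector, int[] input) {
        long m = lessThan(vector, input);
        return (m & 0x55L) | ((m & 0x55L) << 1);
    }

    public static void main(String[] args) {
        int[] vector = {1, 5, 3, 9, 2, 8, 0, 7};
        int[] input = {2, 4, 6, 1, 3, 3, 5, 5};
        System.out.println(viaRecompute(vector, input) == viaBitShift(vector, input));
    }
}
```

Both paths agree because comparing the shifted vectors yields the 
original mask shifted by one bit, with bit 0 cleared (0 < 0 is false).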

(Unfortunately, Vector.shiftEL() hasn't been intrinsified yet either, 
but you can implement an equivalent using Vector.rearrange(Shuffle), 
which has enough support on the JVM side.)
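
A scalar sketch of what the rearrange-based replacement could look like, 
again in plain Java rather than Vector API code. RearrangeShiftSketch and 
its methods are hypothetical names, and the model assumes rearrange 
selects result[i] = v[indexes[i]]. A pure permutation cannot produce the 
zero in lane 0, so the sketch rotates instead; for this particular 
workaround that should not matter, because the 0xAA mask clears bit 0 
anyway.

```java
final class RearrangeShiftSketch {
    // Assumed model of rearrange: result[i] = v[indexes[i]].
    static int[] rearrange(int[] v, int[] indexes) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[indexes[i]];
        }
        return r;
    }

    // Shuffle indexes for a rotation by one lane: lane i takes lane i - 1,
    // and lane 0 wraps around to the last lane.
    static int[] rotateUpIndexes(int lanes) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = (i - 1 + lanes) % lanes;
        }
        return idx;
    }

    // Lane-wise a[i] < b[i], packed into a bit mask (bit i = lane i).
    static long lessThan(int[] a, int[] b) {
        long bits = 0L;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Workaround using rotation; bit 0 of compMask1 is discarded by 0xAA.
    static long combine(int[] vector, int[] input) {
        int[] idx = rotateUpIndexes(vector.length);
        long compMask0 = lessThan(vector, input);
        long compMask1 = lessThan(rearrange(vector, idx), rearrange(input, idx));
        return (compMask0 & 0x55L) | (compMask1 & 0xAAL);
    }
}
```

The shuffle is a loop-invariant constant, so in real code it would be 
created once outside the hot loop, like the two mask constants.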

Hope that helps.

Best regards,
Vladimir Ivanov

On 02/01/2019 01:34, Tomasz Kowalczewski wrote:
> Hi,
> 
> I was working on implementing a simple sorting algorithm[1] using VectorAPI
> and stumbled on something that might be a gap in Mask API. It might be
> intentional for this stage of API maturity - please advise.
> 
> I need to modify bits of the mask I get from _mm512_cmp_X_mask (e.g.
> vector.lessThan(input)) by doing and-s and or-s which can be simulated by
> preparing appropriate Mask objects. What I found missing is shift
> operations. Example:
> 
> ( compMask & 0x55 ) | ( ( compMask & 0x55 ) << 1)
> 
> I cannot do this using Mask object. Most natural alternative would be to
> convert the mask into long, perform these operations and convert it back.
> Unfortunately I was unable to find how to convert long back to Mask. Of
> course this can be done (via boolean[] array) but that does not seem
> efficient :).
> 
> Excuse me if I missed some API call.
> 
> 1. https://hal.inria.fr/hal-01512970v1/document (page 14, Code 1).
> 

