[vector api] Operations on Mask-s

Fri Jan 4 19:23:03 UTC 2019

Thank you for the thorough answer. I will test the solutions you proposed.

What puzzled me initially is that there is actually no "fromLong" method,
so the API seems asymmetric in this regard (there is Mask.toLong() but no
way back), though it can easily be worked around.
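For reference, the boolean[] round trip mentioned in the original question can be sketched in plain Java. This is a scalar sketch only: it does not call the Vector API, the helper names `toLanes`, `fromLanes` and `spreadEvenLanes` are hypothetical, and it assumes lane i of the mask maps to bit i of the long (lane 0 being the least significant bit). The resulting boolean[] is what would then be passed to whichever Mask construction method accepts a boolean array.

```java
import java.util.Arrays;

// Scalar sketch: move a mask between a packed long and a boolean[] of lanes.
// Assumption: lane i corresponds to bit i of the long (lane 0 = least significant bit).
class MaskBits {

    // Hypothetical helper: unpack the low `lanes` bits of `bits` into one boolean per lane.
    static boolean[] toLanes(long bits, int lanes) {
        boolean[] out = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = ((bits >>> i) & 1L) != 0;
        }
        return out;
    }

    // Hypothetical helper: pack one boolean per lane back into a long (the missing "fromLong" direction).
    static long fromLanes(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // The bit manipulation from the original question,
    // (compMask & 0x55) | ((compMask & 0x55) << 1), truncated to `lanes` bits.
    static long spreadEvenLanes(long compMask, int lanes) {
        long laneBits = (lanes == 64) ? -1L : (1L << lanes) - 1; // 1L << 64 would wrap to 1L
        return ((compMask & 0x55L) | ((compMask & 0x55L) << 1)) & laneBits;
    }

    public static void main(String[] args) {
        int lanes = 8;
        long compMask = 0b1011_0110L;               // e.g. the result of a lessThan comparison
        long result = spreadEvenLanes(compMask, lanes);
        boolean[] asArray = toLanes(result, lanes); // array that would be turned back into a Mask
        System.out.println(Long.toBinaryString(result));
        System.out.println(Arrays.toString(asArray));
        System.out.println(fromLanes(asArray) == result);
    }
}
```

With `compMask = 0b1011_0110` and 8 lanes, each even lane is copied into the odd lane above it, giving `111100`. This round trip is exactly the per-lane loop Tomasz expected to be slow, which is why a direct long-to-Mask conversion would be useful.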

Thanks,
Tomasz

On Fri, Jan 4, 2019 at 2:02 AM Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

> Tomasz,
>
> Yes, I had a similar experience with complex use cases involving masks.
> The API works well for lane-wise cases, but falls apart when cross-lane
> operations are needed, requiring the user to go through intermediate
> representations that aren't optimized well.
>
> There are different ways to close that gap at the API level, e.g.:
>    * declare more operations on Mask
>    * operate on long: Mask <=> long
>    * operate on Vector: Mask <=> Vector or Mask <: BoolVector <: Vector
>
> But IMO the main factor limiting progress here is implementation
> considerations.
>
> On x86 there are 2 ways to represent masks: (1) as high bits in vector
> registers (pre-AVX512) and (2) as opmask registers (k0-k7). (I believe
> it's similar on ARM with NEON and SVE.)
>
> C2 doesn't have full AVX512 support yet (e.g., there is no opmask register
> support in the register allocator), and it was decided to focus on the
> pre-AVX512 model first. So the current implementation represents masks as
> vectors uniformly across pre-AVX512 and AVX512-capable CPUs. That's probably
> the main reason why Mask hasn't received enough attention yet.
>
> Regarding workarounds to your immediate problem: with the API in its
> current state, either Mask.toLong() or Mask.toVector() can be used, but
> considering the current state of the implementation (neither has been
> intrinsified yet), I'd suggest avoiding them altogether in hot code for
> now and trying the following workaround instead (pseudo-code follows):
>
>    mask0x55 = ... < fromLong(0x55 << 0) > ... // mask constant
>    mask0xAA = ... < fromLong(0x55 << 1) > ... // mask constant
>
>    compMask0 = vector.lessThan(input)
>    // lessThan preserves zero in 1st element
>    compMask1 = vector.shiftEL(1).lessThan(input.shiftEL(1))
>
>    // equivalent of "(m & 0x55) | ((m << 1) & (0x55 << 1))"
>    compMask0.and(mask0x55).or(compMask1.and(mask0xAA))
>
> The idea is to shift the vectors and recompute the masks, instead of
> shifting the masks themselves.
>
> (Unfortunately, Vector.shiftEL() hasn't been intrinsified yet either,
> but you can implement an equivalent using Vector.rearrange(Shuffle),
> which has sufficient support on the JVM side.)
>
> Hope that helps.
>
> Best regards,
> Vladimir Ivanov
>
> On 02/01/2019 01:34, Tomasz Kowalczewski wrote:
> > Hi,
> >
> > I was working on implementing a simple sorting algorithm[1] using the
> > Vector API and stumbled on something that might be a gap in the Mask API.
> > It might be intentional at this stage of API maturity - please advise.
> >
> > I need to modify bits of the mask I get from _mm512_cmp_X_mask (e.g.
> > vector.lessThan(input)) by doing ANDs and ORs, which can be simulated by
> > preparing appropriate Mask objects. What I found missing are shift
> > operations. Example:
> >
> > ( compMask & 0x55 ) | ( ( compMask & 0x55 ) << 1)
> >
> > I cannot do this using a Mask object. The most natural alternative would
> > be to convert the mask into a long, perform these operations and convert
> > it back. Unfortunately I was unable to find a way to convert a long back
> > to a Mask. Of course this can be done (via a boolean[] array), but that
> > does not seem efficient :).
> >
> > Excuse me if I missed some API call.
> >
> > 1. https://hal.inria.fr/hal-01512970v1/document (page 14, Code 1).
> >
>
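Vladimir's workaround can be checked with a scalar model in plain Java. The sketch below does not use the Vector API: `int[]` arrays stand in for vectors and a `long` stands in for a Mask (lane i at bit i, an assumption). `rearrange` models Vector.rearrange(Shuffle) as out[i] = v[shuffle[i]], and the hypothetical `shiftLanesUp` models shiftEL(1) as moving lane i to lane i + 1 and zero-filling lane 0, which is the direction the bit shift `m << 1` requires. The program compares the shift-and-recompute result against shifting the mask bits directly.

```java
// Scalar model of "shift the vectors and recompute the mask" (no Vector API calls).
// Assumptions: int[] lanes model a vector; a long models a mask with lane i at bit i.
class ShiftRecompute {

    // Model of rearrange(Shuffle): out[i] = v[shuffle[i]].
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[shuffle[i]];
        }
        return out;
    }

    // Hypothetical shiftEL-like helper: lane i moves to lane i + n, vacated low lanes become 0.
    // Built as a rotation through rearrange, followed by a zero blend of the vacated lanes.
    static int[] shiftLanesUp(int[] v, int n) {
        int len = v.length;
        int[] shuffle = new int[len];
        for (int i = 0; i < len; i++) {
            shuffle[i] = Math.floorMod(i - n, len);
        }
        int[] out = rearrange(v, shuffle);
        for (int i = 0; i < Math.min(n, len); i++) {
            out[i] = 0;
        }
        return out;
    }

    // Model of lessThan: bit i is set iff a[i] < b[i].
    static long lessThan(int[] a, int[] b) {
        long m = 0L;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) {
                m |= 1L << i;
            }
        }
        return m;
    }

    // Workaround: compMask0.and(mask0x55).or(compMask1.and(mask0xAA)), no mask shift needed.
    static long viaShiftedVectors(int[] vector, int[] input) {
        long compMask0 = lessThan(vector, input);
        long compMask1 = lessThan(shiftLanesUp(vector, 1), shiftLanesUp(input, 1));
        return (compMask0 & 0x55L) | (compMask1 & 0xAAL);
    }

    // Reference: (m & 0x55) | ((m << 1) & (0x55 << 1)) computed on the mask bits directly.
    static long viaMaskShift(int[] vector, int[] input) {
        long m = lessThan(vector, input);
        return (m & 0x55L) | ((m << 1) & (0x55L << 1));
    }

    public static void main(String[] args) {
        int[] vector = {5, 1, 7, 2, 9, 3, 4, 8};
        int[] input  = {6, 0, 8, 1, 2, 4, 5, 7};
        System.out.println(Long.toBinaryString(viaShiftedVectors(vector, input)));
        System.out.println(viaShiftedVectors(vector, input) == viaMaskShift(vector, input));
    }
}
```

Because mask0xAA clears lane 0, the zero blend in `shiftLanesUp` does not affect this particular combination; a plain rotation through `rearrange` would yield the same final mask here. The 0x55/0xAA constants cover 8 lanes, so wider species would need correspondingly wider constants.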

-- 
Tomasz Kowalczewski