RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs
Jatin Bhateja
jbhateja at openjdk.java.net
Tue May 11 06:16:53 UTC 2021
On Fri, 7 May 2021 19:04:01 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:
> > Hi @PaulSandoz, that's a nice suggestion. Instead of a reduction, which may emit a bulky sequence, I think VectorMask.toLong() + Long.bitCount() could have been used for trueCount. But since toLong may not work for ARM SVE, intrinsifying at the API level looked reasonable in the meantime.
>
> Do you mean that reusing `VectorSupport.reductionCoerced` as the intrinsic entry point may emit a bulky sequence?
Hi @PaulSandoz, semantically reductionCoerced could be used as the entry point for trueCount (VECTOR_OP_BITCOUNT), since we iterate over each lane element (of boolean type in this case) and return the final count of set bits. The lastTrue and firstTrue operations, however, are more like iterative operations along the lines of Vector.lane and Vector.withLane, for which we have explicit entry points.
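To make the semantics of the three queries concrete, here is a minimal scalar sketch. It assumes a mask of at most 64 lanes modelled as a plain `boolean[]`, with lane i mapped to bit i of a long; the class `MaskQuerySketch` and its `toLong` helper are hypothetical stand-ins for illustration, not the JDK implementation. It follows the documented VectorMask behaviour: firstTrue returns the lane count when no lane is set, and lastTrue returns -1.

```java
public final class MaskQuerySketch {
    // Hypothetical helper: pack lanes into a long, lane i -> bit i.
    // Assumes lanes.length <= 64.
    public static long toLong(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Number of set lanes: a bit count over the packed mask.
    public static int trueCount(boolean[] lanes) {
        return Long.bitCount(toLong(lanes));
    }

    // Index of the first set lane, or the lane count if none is set.
    public static int firstTrue(boolean[] lanes) {
        long bits = toLong(lanes);
        return bits == 0L ? lanes.length : Long.numberOfTrailingZeros(bits);
    }

    // Index of the last set lane, or -1 if none is set
    // (numberOfLeadingZeros(0) is 64, so 63 - 64 gives -1).
    public static int lastTrue(boolean[] lanes) {
        return 63 - Long.numberOfLeadingZeros(toLong(lanes));
    }
}
```

Only trueCount is a pure count over all lanes; firstTrue and lastTrue are position queries, which is why they fit less naturally under a reduction entry point.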
Also, VectorSupport.reductionCoerced constrains the type parameter V to have Vector as its upper bound, and VectorMask is not in the Vector class hierarchy. We could relax that constraint, though. In addition, we may need to bypass some portions of LibraryCallKit::inline_vector_reduction for the mask query APIs. Given all this, does it sound reasonable to add one separate entry point (maskOp) for all the mask query APIs? Looking forward to your feedback.
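A rough sketch of the typing issue, using simplified hypothetical types rather than the real VectorSupport signatures: `Vec`, `Mask`, `reduction`, the opcode constants, and the reduced parameter list of `maskOp` are all assumptions made for illustration. Because `Mask` does not extend `Vec`, a mask cannot be passed to the bounded `reduction` entry point, whereas a separate `maskOp` entry point accepts it directly.

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

public final class EntryPointSketch {
    // Hypothetical stand-ins; Mask deliberately does not extend Vec.
    public static class Vec {
        final int[] lanes;
        public Vec(int... lanes) { this.lanes = lanes; }
    }

    public static class Mask {
        final boolean[] lanes;
        public Mask(boolean... lanes) { this.lanes = lanes; }
    }

    // Hypothetical operation ids, for illustration only.
    public static final int OP_ADD = 0;
    public static final int OP_TRUECOUNT = 1;

    // Reduction-style entry point: V is bounded by Vec, so a Mask is rejected
    // at compile time. A compiler intrinsic would replace the fallback call.
    public static <V extends Vec> long reduction(int oprId, V v, Function<V, Long> defaultImpl) {
        return defaultImpl.apply(v);
    }

    // Separate mask entry point: bounded by Mask instead.
    public static <M extends Mask> int maskOp(int oprId, M m, ToIntFunction<M> defaultImpl) {
        return defaultImpl.applyAsInt(m);
    }

    // Scalar fallback for an add reduction.
    public static long sum(Vec v) {
        long s = 0;
        for (int x : v.lanes) {
            s += x;
        }
        return s;
    }

    // Scalar fallback for counting set lanes.
    public static int trueCount(Mask m) {
        int n = 0;
        for (boolean b : m.lanes) {
            if (b) {
                n++;
            }
        }
        return n;
    }
}
```

With these definitions, `reduction(OP_ADD, new Vec(1, 2, 3), EntryPointSketch::sum)` and `maskOp(OP_TRUECOUNT, new Mask(true, false, true), EntryPointSketch::trueCount)` both compile, while passing a `Mask` to `reduction` does not unless its bound is relaxed.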
>
> Note that I was not suggesting to reuse `Long.bitCount()` etc. I was just using that as an example that the bit-wise reduction operations on masks can also apply to integral vectors, suggesting there might be some sharing in C2, just as is done for binary-wise operations such as logical AND.
>
> For example:
>
> ```
> @Override
> @ForceInline
> public Int256Mask and(VectorMask<Integer> mask) {
>     Objects.requireNonNull(mask);
>     Int256Mask m = (Int256Mask)mask;
>     return VectorSupport.binaryOp(VECTOR_OP_AND, Int256Mask.class, int.class, VLENGTH,
>                                   this, m,
>                                   (m1, m2) -> m1.bOp(m2, (i, a, b) -> a & b));
> }
> ```
>
> And notice that `VECTOR_OP_AND` is reused for vector lane-wise binary and reduction operations on `IntVector` etc. Can we do the same for other bitwise reduction-like operations, first implementing optimal support for masks, then later expanding for integral vectors?
>
> So rather than introducing specific constants, such as `VECTOR_OP_MASK_TRUECOUNT` etc, we can generalize to `VECTOR_OP_BITCOUNT` etc that can apply to both masks and integral vectors, where for masks we interpret `BIT` appropriately to mean `boolean` true value.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3916
More information about the hotspot-compiler-dev mailing list