[vectorIntrinsics] RFR: 8283413: Add C2 mid-end and x86 back-end implementation for bit REVERSE operation [v2]

Tue Mar 22 10:09:00 UTC 2022

On Mon, 21 Mar 2022 05:03:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> Patch includes following changes:-
>> - New C2 IR nodes to support VectorOperations.REVERSE operation.
>> - X86 backend implementation for targets supporting AVX2, AVX512 and GFNI features.
>> 
>> Please find below the performance data of Vector API JMH micros:-
>> 
>> System Configuration:
>> ICX: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>> CLX: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S)
>> 
>> ![image](https://user-images.githubusercontent.com/59989778/159196997-fd1ae2ad-37ee-4294-9928-5764707bb456.png)
>> 
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8283413: Adding Ideal transform for (ReverseV (ReverseV VEC)) => VEC and (ReverseV (ReverseV VEC MASK) MASK) => VEC

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4397:

> 4395: #endif
> 4396: 
> 4397: void C2_MacroAssembler::vector_reverse_bit_avx(BasicType bt, XMMRegister dst, XMMRegister src, XMMRegister xtmp1,

You can do this bit reverse using a lookup table on each nibble and ORing the results. The pseudocode would look something like this:

    lut = broadcasti128(0b0000, 0b1000, 0b0100, 0b1100, 0b0010, 0b1010, 0b0110, 0b1110, 0b0001, ...)
    mask = pbroadcastd(0x0f0f0f0f)
    hi = pand(src, mask)
    hi = pshufb(lut, hi)
    hi = pslld(hi, 4)
    lo = psrld(src, 4)
    lo = pand(lo, mask)
    lo = pshufb(lut, lo)
    dst = por(lo, hi)
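
To make the nibble-lookup idea concrete, here is a minimal scalar sketch in Java, where each call to `reverseByte` stands in for what one byte lane would see. The class and method names are hypothetical and not part of the patch. The table is computed rather than spelled out, and `reverseInt` assumes that a full 32-bit reversal is obtained by reversing the bits inside each byte and then reversing the byte order.

```java
// Scalar model of the nibble-lookup bit reverse (illustrative only, not HotSpot code).
final class NibbleLutBitReverse {
    // LUT[n] holds the 4-bit reversal of n, e.g. 0b0001 -> 0b1000.
    private static final int[] LUT = new int[16];
    static {
        for (int n = 0; n < 16; n++) {
            LUT[n] = ((n & 1) << 3) | ((n & 2) << 1) | ((n & 4) >>> 1) | ((n & 8) >>> 3);
        }
    }

    // Reverses the 8 bits of one byte, mirroring the pand/pshufb/shift/por steps.
    static byte reverseByte(byte b) {
        int v = b & 0xFF;
        int hi = LUT[v & 0x0F] << 4;      // reversed low nibble moves to the high half
        int lo = LUT[(v >>> 4) & 0x0F];   // reversed high nibble moves to the low half
        return (byte) (hi | lo);
    }

    // Reverses all 32 bits: per-byte bit reversal followed by a byte-order swap.
    static int reverseInt(int x) {
        int r = 0;
        for (int i = 0; i < 4; i++) {
            int b = reverseByte((byte) (x >>> (8 * i))) & 0xFF;
            r |= b << (8 * (3 - i));      // byte i lands in the mirrored byte position
        }
        return r;
    }
}
```

For any `int` value `x`, `NibbleLutBitReverse.reverseInt(x)` should agree with `Integer.reverse(x)`, which gives a quick way to check the table.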

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4498:

> 4496: #endif
> 4497: 
> 4498: void C2_MacroAssembler::vector_reverse_byte_avx(BasicType bt, XMMRegister dst, XMMRegister src,

Since this is an in-lane shuffle, can we just use `vpshufb` for this?
Thanks.
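
As a rough illustration of the `vpshufb` suggestion, the following Java sketch models a byte shuffle on a single 128-bit lane and builds a control vector that reverses the bytes of each element. The class and method names are hypothetical. It assumes the element size divides 16, so no element straddles a lane boundary.

```java
// Scalar model of an in-lane byte shuffle used for byte reversal (illustrative only).
final class InLaneByteReverse {
    // Builds a 16-byte shuffle control for elements of elemBytes bytes (1, 2, 4 or 8):
    // output byte j reads the mirrored byte position inside the same element.
    static byte[] reverseControl(int elemBytes) {
        byte[] ctl = new byte[16];
        for (int j = 0; j < 16; j++) {
            int offset = j % elemBytes;   // position of byte j within its element
            int base = j - offset;        // first byte of that element
            ctl[j] = (byte) (base + elemBytes - 1 - offset);
        }
        return ctl;
    }

    // Models pshufb on one 128-bit lane: dst[j] = src[ctl[j] & 0x0F],
    // or zero when the high bit of ctl[j] is set.
    static byte[] pshufb(byte[] src, byte[] ctl) {
        byte[] dst = new byte[16];
        for (int j = 0; j < 16; j++) {
            dst[j] = (ctl[j] & 0x80) != 0 ? 0 : src[ctl[j] & 0x0F];
        }
        return dst;
    }
}
```

With 4-byte elements the control starts 3, 2, 1, 0, 7, 6, 5, 4, so a single shuffle per lane reverses the bytes of every `int` element.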

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/182