[vectorIntrinsics] RFR: 8283413: Add C2 mid-end and x86 back-end implementation for bit REVERSE operation [v2]
Quan Anh Mai
duke at openjdk.java.net
Tue Mar 22 10:09:00 UTC 2022
On Mon, 21 Mar 2022 05:03:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> The patch includes the following changes:
>> - New C2 IR nodes to support VectorOperations.REVERSE operation.
>> - X86 backend implementation for targets supporting AVX2, AVX512 and GFNI features.
>>
>> Please find below the performance data of Vector API JMH micros:-
>>
>> System Configuration:
>> ICX: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>> CLX: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S)
>>
>> Kindly review and share your feedback.
>>
>> Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8283413: Adding Ideal transform for (ReverseV (ReverseV VEC)) => VEC and (ReverseV (ReverseV VEC MASK) MASK)) => VEC
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4397:
> 4395: #endif
> 4396:
> 4397: void C2_MacroAssembler::vector_reverse_bit_avx(BasicType bt, XMMRegister dst, XMMRegister src, XMMRegister xtmp1,
You can do this bit reverse using a lookup table on each nibble and ORing the results. The pseudocode would look something like this:
lut = broadcasti128(0b0000, 0b1000, 0b0100, 0b1100, 0b0010, 0b1010, 0b0110, 0b1110, 0b0001, ...)
mask = pbroadcastd(0x0f0f0f0f)
hi = pand(src, mask)
hi = pshufb(lut, hi)
hi = pslld(hi, 4)
lo = psrld(src, 4)
lo = pand(lo, mask)
lo = pshufb(lut, lo)
dst = por(lo, hi)
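To make the nibble lookup concrete, here is a minimal scalar sketch in Java that models the same steps one byte at a time. The class and method names are hypothetical and are not part of the patch; the table is simply the 4-bit reversal of each index, consistent with the first entries of `lut` above. Note that this reverses bits within each byte only, so wider element types would presumably still need a byte reverse afterwards.

```java
// Scalar model of the nibble-LUT bit reverse (illustrative only, not HotSpot code).
final class NibbleLutBitReverse {
    // LUT[n] is the 4-bit value n with its bits reversed.
    private static final byte[] LUT = {
        0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
        0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
    };

    // Reverses the bits of a single byte using two table lookups.
    static byte reverseBitsInByte(byte b) {
        int hi = LUT[b & 0x0F] << 4;      // low nibble, reversed, moves to the high half
        int lo = LUT[(b >>> 4) & 0x0F];   // high nibble, reversed, moves to the low half
        return (byte) (hi | lo);
    }

    // Applies the per-byte reverse to every byte, as the vector sequence does per lane.
    static byte[] reverseBitsInBytes(byte[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = reverseBitsInByte(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] out = reverseBitsInBytes(new byte[] {0x01, 0x02, 0x03, 0x04});
        for (byte b : out) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

In the vector version each `LUT[...]` lookup corresponds to one `pshufb` against the broadcast table, so the whole byte-wise reverse costs two shuffles plus a few logical operations and shifts.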
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4498:
> 4496: #endif
> 4497:
> 4498: void C2_MacroAssembler::vector_reverse_byte_avx(BasicType bt, XMMRegister dst, XMMRegister src,
Since this is an in-lane shuffle, can we just use `vpshufb` for this?
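As a rough illustration of what I mean, the sketch below models a byte shuffle within one 16-byte lane in scalar Java and builds the index pattern that reverses the bytes of each element. The names are hypothetical and the model only approximates `pshufb` semantics (index taken from the low 4 bits, output zeroed when the index's high bit is set); it is not the code in the patch.

```java
// Scalar model of an in-lane byte shuffle used for byte reversal (illustrative only).
final class InLaneByteReverse {
    // Builds shuffle indices for one 16-byte lane that reverse the bytes
    // inside each element of elemBytes bytes (2, 4 or 8).
    static byte[] reverseByteIndices(int elemBytes) {
        byte[] idx = new byte[16];
        for (int i = 0; i < 16; i++) {
            int offset = i % elemBytes;          // position inside the element
            int base = i - offset;               // first byte of the element
            idx[i] = (byte) (base + (elemBytes - 1 - offset));
        }
        return idx;
    }

    // Models a pshufb-style shuffle on one 16-byte lane:
    // dst[i] = 0 if the index's high bit is set, else src[idx[i] & 0x0F].
    static byte[] shuffleLane(byte[] src, byte[] idx) {
        byte[] dst = new byte[16];
        for (int i = 0; i < 16; i++) {
            dst[i] = (idx[i] & 0x80) != 0 ? 0 : src[idx[i] & 0x0F];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        for (int i = 0; i < 16; i++) {
            src[i] = (byte) i;
        }
        // Reverse the bytes of each 4-byte (int) element in the lane.
        byte[] out = shuffleLane(src, reverseByteIndices(4));
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Because every source index stays inside its own 16-byte lane, a single constant index vector per element size should be enough, which is why one shuffle looks sufficient here.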
Thanks.
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/182