[vectorIntrinsics] RFR: 8283413: Add C2 mid-end and x86 back-end implementation for bit REVERSE and REVERSE_BYTES operation [v6]

Wed Apr 6 23:32:12 UTC 2022

On Wed, 6 Apr 2022 18:54:44 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> The patch includes the following changes:
>> - New C2 IR nodes to support the VectorOperators.REVERSE operation.
>> - x86 back-end implementation for targets supporting the AVX2, AVX512 and GFNI features.
>> 
>> Please find below the performance data for the Vector API JMH micro-benchmarks:
>> 
>> System Configuration:
>> ICX: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>> CLX: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S)
>> 
>> ![image](https://user-images.githubusercontent.com/59989778/159196997-fd1ae2ad-37ee-4294-9928-5764707bb456.png)
>> 
>> Kindly review and share your feedback.
>> 
>> Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8283413: Review comments resolution.

The rest of the patch looks good to me.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4581:

> 4579:     vpandq(dst, dst, xtmp1, vec_enc);
> 4580:     vpsllq(dst, dst, 2, vec_enc);
> 4581:     vpandq(xtmp2, xtmp2, xtmp1, vec_enc);

This could also be a vpandn. You don't need vpternlogd to swap here either; careful register movement plus vpandn would work:
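    movl(rtmp, 0x33333333);                 // rtmp = 2-bit-pair mask
    evpbroadcastd(xtmp2, rtmp, vec_enc);    // xtmp2 = mask in every dword lane
    vpandq(dst, xtmp2, xtmp1, vec_enc);     // dst = x & 0x33..., assuming x is in xtmp1
    vpsllq(dst, dst, 2, vec_enc);           // move the low pair of each nibble up
    vpandn(xtmp2, xtmp2, xtmp1, vec_enc);   // xtmp2 = ~mask & x = x & 0xCC...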
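
The equivalence behind this suggestion can be checked with a scalar model of one 64-bit lane. This is a minimal sketch, not HotSpot code: `andn`, `swap_groups`, `reverse_bits_in_bytes`, `reverse_bytes`, `reverse_bits` and `reverse_bits_naive` are hypothetical helpers, and the nibble/pair/bit ladder followed by a byte swap is an assumed decomposition of REVERSE, not necessarily the exact sequence in the patch. `andn(m, x)` mirrors the `~nds & src` semantics of vpandn, so the single 0x33333333 mask serves both halves of the 2-bit swap.

```cpp
#include <cstdint>

using std::uint64_t;

// Hypothetical scalar model of ONE 64-bit lane; none of these are HotSpot APIs.

// Mirrors vpandn(dst, nds, src): dst = ~nds & src.
constexpr uint64_t andn(uint64_t mask, uint64_t x) { return ~mask & x; }

// Swaps adjacent s-bit groups with a single mask m that selects the low group
// of every 2s-bit field. (x & m) << s moves the low groups up, and
// andn(m, x) >> s moves the high groups down, so no inverted mask is needed.
constexpr uint64_t swap_groups(uint64_t x, uint64_t m, int s) {
  return ((x & m) << s) | (andn(m, x) >> s);
}

// Reverses the bits inside every byte: swap nibbles, 2-bit pairs, single bits.
constexpr uint64_t reverse_bits_in_bytes(uint64_t x) {
  x = swap_groups(x, 0x0F0F0F0F0F0F0F0FULL, 4);
  x = swap_groups(x, 0x3333333333333333ULL, 2);  // step discussed in the review
  x = swap_groups(x, 0x5555555555555555ULL, 1);
  return x;
}

// REVERSE_BYTES on a 64-bit lane: the lowest byte ends up highest.
constexpr uint64_t reverse_bytes(uint64_t x) {
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (x & 0xFF);
    x >>= 8;
  }
  return r;
}

// REVERSE on a 64-bit lane, assuming the in-byte ladder plus a byte swap.
constexpr uint64_t reverse_bits(uint64_t x) {
  return reverse_bytes(reverse_bits_in_bytes(x));
}

// Bit-at-a-time reference used to cross-check the mask ladder.
constexpr uint64_t reverse_bits_naive(uint64_t x) {
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i) {
    r = (r << 1) | (x & 1);
    x >>= 1;
  }
  return r;
}

static_assert(reverse_bits(1ULL) == 0x8000000000000000ULL,
              "lowest bit moves to the top");
static_assert(reverse_bits(0x0123456789ABCDEFULL) ==
                  reverse_bits_naive(0x0123456789ABCDEFULL),
              "mask ladder agrees with the bit-at-a-time reference");
```

The helpers are `constexpr` (C++14), so the checks run at compile time. The andn form `((x & m) << 2) | (andn(m, x) >> 2)` gives the same result as the shift-then-mask form `((x & m) << 2) | ((x >> 2) & m)` because, for m = 0x33...33, `m << 2 == ~m`.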

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/182