[vectorIntrinsics] RFR: 8283413: Add C2 mid-end and x86 back-end implementation for bit REVERSE and REVERSE_BYTES operation [v5]
Sandhya Viswanathan <sviswanathan at openjdk.java.net>
Wed Apr  6 00:14:13 UTC 2022

On Mon, 4 Apr 2022 19:13:38 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>> 
>> Patch includes following changes:-
>> - New C2 IR nodes to support VectorOperations.REVERSE operation.
>> - X86 backend implementation for targets supporting AVX2, AVX512 and GFNI features.
>> 
>> Please find below the performance data of Vector API JMH micros:-
>> 
>> System Configuration:
>> ICX: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>> CLX: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S)
>> 
>> Kindly review and share your feedback.
>> 
>> Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8283413: Review comments resolutions.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4553:

> 4551:     // Get the reverse bit sequence of upper nibble of each byte.
> 4552:     vpternlogd(xtmp2, 0x1, xtmp2, xtmp2, vec_enc);
> 4553:     vpandq(xtmp2, src, xtmp2, vec_enc);

You could replace these two instructions with a single vpandn.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4573:

> 4571:     vpandq(dst, xtmp1, src, vec_enc);
> 4572:     vpsllq(dst, dst, 4, vec_enc);
> 4573:     vpandq(xtmp2, xtmp2, src, vec_enc);

This could be a vpandn with xtmp1. We then don't need the vpternlog above to complement xtmp1 into xtmp2. All the vpternlog usage below can be removed in a similar fashion.
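
To illustrate the suggestion, here is a minimal scalar sketch of one 64-bit lane. It assumes xtmp1 holds a low-nibble mask (0x0F in every byte), which the quoted lines do not show. The names andn, kLowNibbles, high_nibbles_with_not, high_nibbles_with_andn and swap_nibbles are hypothetical and are not part of the patch. The only hardware fact relied on is that x86 ANDN-style instructions complement their first source operand.

```cpp
#include <cstdint>

// Scalar stand-in for one 64-bit lane of an ANDN instruction:
// the FIRST operand is complemented, then ANDed with the second.
constexpr uint64_t andn(uint64_t a, uint64_t b) { return ~a & b; }

// Assumed mask: selects the low nibble of every byte (role of xtmp1).
constexpr uint64_t kLowNibbles = 0x0F0F0F0F0F0F0F0FULL;

// Shape being reviewed: build the complemented mask first, then AND.
// The complement costs an extra instruction and a temporary.
constexpr uint64_t high_nibbles_with_not(uint64_t src) {
    return (~kLowNibbles) & src;
}

// Suggested shape: a single ANDN against the mask that already exists.
constexpr uint64_t high_nibbles_with_andn(uint64_t src) {
    return andn(kLowNibbles, src);
}

// One mask-and-shift step of a bit reversal: swap the two nibbles of each byte.
constexpr uint64_t swap_nibbles(uint64_t src) {
    return ((kLowNibbles & src) << 4) | (andn(kLowNibbles, src) >> 4);
}

// Both shapes select the same bits.
static_assert(high_nibbles_with_not(0x123456789ABCDEF0ULL) ==
              high_nibbles_with_andn(0x123456789ABCDEF0ULL),
              "NOT followed by AND equals ANDN");
```

Because only one mask register is needed, the same mask can serve both halves of each swap step, which is why the separate complement becomes redundant.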

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4613:

> 4611:     vpcmpeqd(xtmp3, xtmp3, xtmp3, vec_enc);
> 4612:     vpxor(xtmp3, xtmp2, xtmp3, vec_enc);
> 4613:     vpand(xtmp2, src, xtmp3, vec_enc);

This sequence can be replaced by a single vpandn.
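
As a rough check of why this holds, the sketch below models one 32-bit lane of the quoted sequence: a self-compare yields all ones, XOR with all ones is a bitwise NOT, and the final AND then matches an ANDN. The function names cmpeq, three_step and one_step are hypothetical, and the input values are arbitrary test data rather than anything taken from the patch.

```cpp
#include <cstdint>

// Lane-wise model of a packed compare-equal: all ones if equal, else zero.
constexpr uint32_t cmpeq(uint32_t a, uint32_t b) {
    return a == b ? 0xFFFFFFFFu : 0u;
}

// Quoted sequence: all-ones via self-compare, XOR to invert the mask, then AND.
constexpr uint32_t three_step(uint32_t src, uint32_t mask) {
    return src & (mask ^ cmpeq(mask, mask));
}

// Suggested form: ANDN complements its first operand and ANDs with the second.
constexpr uint32_t one_step(uint32_t src, uint32_t mask) {
    return ~mask & src;
}

// The two forms agree on this sample input.
static_assert(three_step(0xDEADBEEFu, 0x0F0F0F0Fu) ==
              one_step(0xDEADBEEFu, 0x0F0F0F0Fu),
              "compare + xor + and equals andn");
```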

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/182