[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING_ZEROS_COUNT operation [v2]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Fri Apr 15 01:44:19 UTC 2022


On Wed, 13 Apr 2022 08:39:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Extends the SLP vectorizer to auto-vectorize the Integer.numberOfLeadingZeros() and Long.numberOfLeadingZeros() APIs.
>> - Adds an optimized, target-feature-specific x86 back-end implementation of the LEADING_ZEROS_COUNT operation for all integral types.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
> 
>  - 8284459: Adding an exponent-based leading zero count algorithm for integer vectors; it's showing around 10-15% gain.
>  - Merge branch 'vectorIntrinsics' of http://github.com/openjdk/panama-vector into JDK-8284459
>  - 8284459: Add x86 back-end implementation for LEADING_ZERO_COUNT operation
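The exponent-based idea mentioned in the first commit can be illustrated with a scalar sketch. This is not the patch's vector sequence. It assumes a conversion to double, which represents every 32-bit unsigned value exactly, and reads the position of the leading one bit from the biased exponent. A single-precision conversion, which a vector implementation might prefer, would round values with more than 24 significant bits (for example 0x01FFFFFF rounds up to 2^25), so such an implementation has to guard against that. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reference count: scan from the most significant bit down (returns 32 for 0).
int clz32_reference(uint32_t x) {
  int n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// Exponent-based count (sketch): every uint32_t is exact as a double, so the
// unbiased exponent is the index of the most significant set bit.
int clz32_via_exponent(uint32_t x) {
  if (x == 0) {
    return 32;  // zero has no set bit, so it is handled separately
  }
  double d = static_cast<double>(x);
  uint64_t bits = 0;
  std::memcpy(&bits, &d, sizeof bits);                  // reinterpret the IEEE 754 bits
  int biased_exponent = static_cast<int>((bits >> 52) & 0x7FF);
  int msb_index = biased_exponent - 1023;               // remove the double-precision bias
  return 31 - msb_index;
}
```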

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4766:

> 4764:       evpcmpeqb(ktmp, xtmp1, xtmp3, vec_enc);
> 4765:       vpaddb(dst, dst, xtmp2, vec_enc);
> 4766:       evpblendmb(dst, ktmp, xtmp2, dst, true, vec_enc);

Instead of doing a vpaddb followed by an evpblendmb, we could directly do a vpaddb with ktmp as the mask and merge set to true.
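A scalar emulation can help confirm the equivalence this suggestion relies on. The sketch assumes the blend keeps one of the two addends in lanes whose mask bit is clear, and that the merge-masked add targets that same addend's register. The 16-lane width and the function names are illustrative assumptions, not HotSpot APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;       // one 128-bit vector of byte lanes (illustrative)
using ByteVec = std::array<uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;   // one opmask bit per byte lane

// Current shape: unconditional byte add, then a blend that takes the sum in
// lanes whose mask bit is set and `keep` elsewhere.
ByteVec add_then_blend(const ByteVec& keep, const ByteVec& other, const Mask& k) {
  ByteVec sum{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    sum[i] = static_cast<uint8_t>(keep[i] + other[i]);  // wraps modulo 256 like vpaddb
  }
  ByteVec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    out[i] = k[i] ? sum[i] : keep[i];
  }
  return out;
}

// Suggested shape: one merge-masked add into `keep`; lanes whose mask bit is
// clear are left untouched, as with EVEX merge masking.
ByteVec masked_add_merge(ByteVec keep, const ByteVec& other, const Mask& k) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (k[i]) {
      keep[i] = static_cast<uint8_t>(keep[i] + other[i]);
    }
  }
  return keep;
}
```

Besides saving an instruction, the masked form removes the need for a separate blend operand.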

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4798:

> 4796:    vector_count_leading_zeros_byte_avx(dst, src, xtmp1, xtmp2, xtmp3, rtmp, vec_enc);
> 4797:    // Add zero counts of lower byte and upper byte of a word if
> 4798:    // upper byte holds a zero value.

Need to specify here, in a comment, that xtmp1 is set to all zeros by vector_count_leading_zeros_byte_avx.
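The code comment above describes how a 16-bit count is assembled from the two per-byte counts. A scalar sketch of that identity follows; the helper names are hypothetical. The upper-byte zero test is presumably where the all-zero register mentioned in the review comment is used.

```cpp
#include <cassert>
#include <cstdint>

// Leading zero count of one byte (returns 8 for zero).
int clz8(uint8_t b) {
  int n = 0;
  for (int bit = 7; bit >= 0 && ((b >> bit) & 1) == 0; --bit) {
    ++n;
  }
  return n;
}

// Reference leading zero count of one 16-bit word (returns 16 for zero).
int clz16_reference(uint16_t w) {
  int n = 0;
  for (int bit = 15; bit >= 0 && ((w >> bit) & 1) == 0; --bit) {
    ++n;
  }
  return n;
}

// Word count assembled from byte counts: the lower byte only contributes
// when the upper byte is zero (whose own count is then 8).
int clz16_from_bytes(uint16_t w) {
  uint8_t hi = static_cast<uint8_t>(w >> 8);
  uint8_t lo = static_cast<uint8_t>(w & 0xFF);
  return hi == 0 ? clz8(hi) + clz8(lo) : clz8(hi);
}
```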

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4803:

> 4801:    vpsllw(xtmp2, dst, 8, vec_enc);
> 4802:    vpaddw(xtmp2, xtmp2, dst, vec_enc);
> 4803:    vpblendvb(dst, dst, xtmp2, xtmp3, vec_enc);

The mask is generated using a word operation, but the blend is a byte operation?
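One way to look at this question is to emulate a single 16-bit lane of the quoted sequence. Because a word compare writes either 0x0000 or 0xFFFF to each word, both bytes of every mask word agree, so a byte-granularity blend ends up selecting whole words. The sketch assumes the per-byte counts are already in dst, and models vpblendvb as picking each byte by the most significant bit of the matching mask byte; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Leading zero count of one byte (returns 8 for zero).
int clz8(uint8_t b) {
  int n = 0;
  for (int bit = 7; bit >= 0 && ((b >> bit) & 1) == 0; --bit) ++n;
  return n;
}

// Reference leading zero count of one 16-bit word (returns 16 for zero).
int clz16_reference(uint16_t w) {
  int n = 0;
  for (int bit = 15; bit >= 0 && ((w >> bit) & 1) == 0; --bit) ++n;
  return n;
}

// Byte j of a little-endian 16-bit lane (j = 0 is the low byte).
uint8_t byte_at(uint16_t w, int j) { return static_cast<uint8_t>(w >> (8 * j)); }

// Emulates one 16-bit lane of: word mask, vpsllw + vpaddw, vpblendvb, vpsrlw.
uint16_t clz16_lane(uint16_t src) {
  // Per-byte counts, as assumed to be left in dst by the byte routine.
  uint16_t dst = static_cast<uint16_t>((clz8(byte_at(src, 1)) << 8) | clz8(byte_at(src, 0)));
  // Word-granularity mask: all ones when the upper byte of src is zero.
  uint16_t mask = (src >> 8) == 0 ? 0xFFFF : 0x0000;
  // Shift left by 8 and add: the upper byte becomes hi_count + lo_count.
  uint16_t sum = static_cast<uint16_t>((dst << 8) + dst);
  // Byte-granularity blend driven by the MSB of each mask byte.
  uint16_t blended = 0;
  for (int j = 0; j < 2; ++j) {
    bool take_sum = (byte_at(mask, j) & 0x80) != 0;
    uint8_t b = take_sum ? byte_at(sum, j) : byte_at(dst, j);
    blended = static_cast<uint16_t>(blended | (b << (8 * j)));
  }
  // Shift right by 8: the answer sits in the upper byte.
  return static_cast<uint16_t>(blended >> 8);
}
```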

src/hotspot/cpu/x86/x86.ad line 1261:

> 1259: };
> 1260: 
> 1261: 

Extra empty line.

src/hotspot/share/opto/loopTransform.cpp line 979:

> 977:       } break;
> 978:       case Op_CountLeadingZerosV:
> 979:       case Op_ReverseV: {

These two cases could be merged with the Op_PopCountVL case, since the body is the same.
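The suggested merge is ordinary C++ case fall-through, where several case labels share one body. The enum and the cost function below are hypothetical stand-ins, since the actual body in loopTransform.cpp is not quoted here.

```cpp
#include <cassert>

// Hypothetical subset of vector opcodes, for illustration only.
enum class Op { AddVI, PopCountVI, PopCountVL, CountLeadingZerosV, ReverseV };

// Hypothetical extra body-size estimate: every opcode with a costly expansion
// shares one case body via fall-through labels.
int extra_body_size(Op op, int vector_length) {
  switch (op) {
    case Op::PopCountVI:
    case Op::PopCountVL:
    case Op::CountLeadingZerosV:
    case Op::ReverseV:
      return vector_length;  // placeholder estimate, not HotSpot's formula
    default:
      return 0;
  }
}
```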

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/189

