[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING_ZEROS_COUNT operation [v2]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Fri Apr 15 01:44:19 UTC 2022
On Wed, 13 Apr 2022 08:39:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Summary of changes:
>> - The patch extends the SLP vectorizer to auto-vectorize the Integer.numberOfLeadingZeros() and Long.numberOfLeadingZeros() APIs.
>> - It adds an optimized, target-feature-specific x86 back-end implementation of the LEADING_ZEROS_COUNT operation for all integral types.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>
> - 8284459: Adding an exponent-based leading zero count algorithm for integer vectors; it shows around a 10-15% gain.
> - Merge branch 'vectorIntrinsics' of http://github.com/openjdk/panama-vector into JDK-8284459
> - 8284459: Add x86 back-end implementation for LEADING_ZERO_COUNT operation
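For readers following the thread, here is a minimal scalar sketch of what an exponent-based leading zero count can look like. It is not taken from the patch: the function name is hypothetical, IEEE-754 binary32 floats and round-to-nearest conversion are assumed, and the vector code in the PR may handle the corner cases differently. The idea is that converting an integer to float places the position of its highest set bit in the exponent field, so the count is 31 minus the unbiased exponent.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model, assuming IEEE-754 binary32 floats.
int clz32_via_float_exponent(std::uint32_t x) {
  if (x == 0) return 32;                 // no set bit at all
  if (x & 0x80000000u) return 0;         // top bit set; also avoids a negative int
  // Clear the bit directly below every set bit. The highest set bit survives,
  // and the conversion can no longer round up into the next power of two.
  std::uint32_t y = x & ~(x >> 1);
  float f = static_cast<float>(static_cast<std::int32_t>(y));
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);   // reinterpret the float as raw bits
  int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;  // remove the bias
  return 31 - exponent;
}
```

Without the masking step, a value such as 0x7FFFFFFF would round to 2^31 during conversion and report 0 instead of 1.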
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4766:
> 4764: evpcmpeqb(ktmp, xtmp1, xtmp3, vec_enc);
> 4765: vpaddb(dst, dst, xtmp2, vec_enc);
> 4766: evpblendmb(dst, ktmp, xtmp2, dst, true, vec_enc);
Instead of a vpaddb followed by an evpblendmb, we could issue the vpaddb directly with ktmp as the mask and merge set to true.
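To illustrate the suggestion, here is a scalar model of the two forms. This is only a sketch: the lane count, type aliases, and function names are hypothetical, and it does not use the HotSpot assembler API. It assumes the lanes that should pass through unchanged are already in the destination register, which is what lets a merge-masked add replace the add plus blend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;  // hypothetical lane count for the model
using Bytes = std::array<std::uint8_t, kLanes>;
using Mask  = std::array<bool, kLanes>;

// Two-step form: unconditional byte add, then a masked blend that takes the
// sum where the mask bit is set and the pass-through value elsewhere.
Bytes add_then_blend(const Bytes& a, const Bytes& b,
                     const Bytes& passthru, const Mask& k) {
  Bytes out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    std::uint8_t sum = static_cast<std::uint8_t>(a[i] + b[i]);  // wraps mod 256
    out[i] = k[i] ? sum : passthru[i];
  }
  return out;
}

// One-step form: merge-masked add. Lanes with a clear mask bit keep whatever
// the destination already held.
Bytes masked_add_merge(Bytes dst, const Bytes& a, const Bytes& b,
                       const Mask& k) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (k[i]) dst[i] = static_cast<std::uint8_t>(a[i] + b[i]);
  }
  return dst;
}
```

The two functions agree whenever the initial dst of the masked add equals the pass-through operand of the blend.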
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4798:
> 4796: vector_count_leading_zeros_byte_avx(dst, src, xtmp1, xtmp2, xtmp3, rtmp, vec_enc);
> 4797: // Add zero counts of lower byte and upper byte of a word if
> 4798: // upper byte holds a zero value.
The comment here should also state that xtmp1 is set to all zeros by vector_count_leading_zeros_byte_avx.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4803:
> 4801: vpsllw(xtmp2, dst, 8, vec_enc);
> 4802: vpaddw(xtmp2, xtmp2, dst, vec_enc);
> 4803: vpblendvb(dst, dst, xtmp2, xtmp3, vec_enc);
The mask is generated using a word operation, but the blend is a byte operation?
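For context, here is a scalar sketch of what the quoted sequence appears to compute for one 16-bit lane. The helper names are hypothetical, and the final right shift by 8 is an assumption because it is not part of the quoted context. A word compare yields 0xFFFF or 0x0000, so both bytes of a word carry the same selector, and a byte-wise blend keyed on each byte's top bit then picks whole words.

```cpp
#include <cstdint>

// Reference leading zero count of a single byte (result in 0..8).
int clz8(std::uint8_t b) {
  int n = 0;
  for (int bit = 7; bit >= 0 && !((b >> bit) & 1); --bit) ++n;
  return n;
}

// Hypothetical model of one 16-bit lane built from per-byte counts.
int clz16_from_bytes(std::uint16_t w) {
  std::uint8_t hi = static_cast<std::uint8_t>(w >> 8);
  std::uint8_t lo = static_cast<std::uint8_t>(w & 0xFF);
  // Per-byte counts packed into the word, as a byte-wise pass would leave them.
  std::uint16_t counts = static_cast<std::uint16_t>((clz8(hi) << 8) | clz8(lo));
  // Word-wide compare of the upper byte against zero: all ones or all zeros.
  std::uint16_t mask = (hi == 0) ? 0xFFFF : 0x0000;
  // (counts << 8) + counts: the high byte becomes clz8(lo) + clz8(hi).
  std::uint16_t summed = static_cast<std::uint16_t>((counts << 8) + counts);
  // Byte-wise blend: each byte is selected by the top bit of its mask byte.
  std::uint16_t blended = 0;
  for (int byte = 0; byte < 2; ++byte) {
    std::uint16_t lane = static_cast<std::uint16_t>(0xFF << (8 * byte));
    bool take_sum = (mask >> (8 * byte + 7)) & 1;
    blended |= static_cast<std::uint16_t>((take_sum ? summed : counts) & lane);
  }
  return blended >> 8;  // assumed final step: keep the high byte as the result
}
```

In this model the byte blend behaves exactly like a word blend, since the two mask bytes of a lane never disagree.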
src/hotspot/cpu/x86/x86.ad line 1261:
> 1259: };
> 1260:
> 1261:
Extra empty line.
src/hotspot/share/opto/loopTransform.cpp line 979:
> 977: } break;
> 978: case Op_CountLeadingZerosV:
> 979: case Op_ReverseV: {
These two cases could be merged with the Op_PopCountVL case; the body is the same.
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/189