[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING and TRAILING ZEROS COUNT operations [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Tue Apr 19 00:21:43 UTC 2022


On Fri, 15 Apr 2022 21:44:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Patch extends the auto-vectorizer to vectorize the following Java SE APIs:
>>      1) Integer.numberOfLeadingZeros()
>>      2) Long.numberOfLeadingZeros()
>>      3) Integer.numberOfTrailingZeros()
>>      4) Long.numberOfTrailingZeros()
>> 
>> - Adds an optimized x86 backend implementation for VectorOperators.LEADING_ZEROS_COUNT and VectorOperators.TRAILING_ZEROS_COUNT for AVX512 and legacy targets.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8284459: Adding auto-vectorizer and x86 backend support for TRAILING_ZERO_COUNT, also some code re-organization.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4441:

> 4439:   if ((is_LP64 || lane_size < 8) &&
> 4440:       ((is_non_subword_integral_type(bt) && VM_Version::supports_avx512vl()) ||
> 4441:        (is_subword_type(bt) && VM_Version::supports_avx512bw()))) {

The AVX512VL check is needed for all lane sizes when the vector width is less than 64 bytes. The condition above doesn't seem to capture that, since the subword branch only tests supports_avx512bw().
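
One possible shape for the corrected predicate, as a standalone sketch rather than HotSpot code. The `CpuFeatures` struct and `can_use_evex` helper are hypothetical stand-ins for the `VM_Version` queries, AVX512F is assumed to be present, and the `is_LP64 || lane_size < 8` part of the original condition is left out for brevity:

```cpp
// Hypothetical stand-in for VM_Version::supports_avx512vl()/supports_avx512bw().
struct CpuFeatures {
  bool avx512vl;
  bool avx512bw;
};

// Returns true when the EVEX path may be taken for a vector of
// vector_len_in_bytes bytes whose lanes are lane_size_in_bytes bytes wide.
// Assumes AVX512F is available.
inline bool can_use_evex(const CpuFeatures& cpu,
                         int lane_size_in_bytes,
                         int vector_len_in_bytes) {
  const bool is_subword = lane_size_in_bytes < 4;  // byte or short lanes

  // Subword lanes need AVX512BW at any vector width.
  if (is_subword && !cpu.avx512bw) {
    return false;
  }
  // Vectors narrower than 64 bytes need AVX512VL for every lane size.
  if (vector_len_in_bytes < 64 && !cpu.avx512vl) {
    return false;
  }
  return true;
}
```

With this shape, a 32-byte vector of byte lanes on a CPU with AVX512BW but without AVX512VL is rejected, which is the case the quoted condition appears to let through.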

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4981:

> 4979:   vpor(xtmp3, xtmp3, src, vec_enc);
> 4980:   vector_popcount_integral(bt, dst, xtmp3, xtmp1, xtmp2, rtmp, vec_enc);
> 4981:   vbroadcast(bt, xtmp1, bcast_value[type2aelembytes(bt) - 1], rtmp, vec_enc);

The bcast_value table lookup could be replaced by a computed value, (0x8 * type2aelembytes(bt)).
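
A scalar model of why the lane width in bits is enough here. The surrounding assembler code is not shown in full, so this sketch assumes the sequence computes the trailing zero count as `bits - popcount(x | -x)`. The names `lane_bits`, `popcount64` and `trailing_zeros` are illustrative only and do not appear in the patch:

```cpp
#include <cstdint>

// Lane width in bits, derived from the element size in bytes.
// Mirrors the suggested expression 0x8 * type2aelembytes(bt).
constexpr int lane_bits(int elem_bytes) { return 8 * elem_bytes; }

// Portable population count (clears the lowest set bit each iteration).
inline int popcount64(std::uint64_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// Trailing zero count of the low elem_bytes bytes of x.
// x | -x sets the lowest set bit and every bit above it, so its popcount
// is (bits - tzcnt); for x == 0 the popcount is 0 and the result is bits.
inline int trailing_zeros(std::uint64_t x, int elem_bytes) {
  const int bits = lane_bits(elem_bytes);
  const std::uint64_t mask = (bits == 64) ? ~0ULL : ((1ULL << bits) - 1);
  x &= mask;
  const std::uint64_t smeared = (x | (0 - x)) & mask;
  return bits - popcount64(smeared);
}
```

For example, `trailing_zeros(8, 1)` models a byte lane holding 0b1000 and yields 3, while a zero lane yields the full lane width, matching the behaviour of `Integer.numberOfTrailingZeros(0)` and `Long.numberOfTrailingZeros(0)`.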

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/189

