[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING and TRAILING ZEROS COUNT operations [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Tue Apr 19 03:07:44 UTC 2022


On Fri, 15 Apr 2022 21:44:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Patch extends the auto-vectorizer to vectorize the following Java SE APIs.
>>      1) Integer.numberOfLeadingZeros()
>>      2) Long.numberOfLeadingZeros()
>>      3) Integer.numberOfTrailingZeros()
>>      4) Long.numberOfTrailingZeros()
>> 
>> - Adds optimized x86 backend implementation for VectorOperators.LEADING_ZEROS_COUNT and VectorOperators.TRAILING_ZEROS_COUNT for AVX512 and legacy targets.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8284459: Adding auto-vectorizer and x86 backend support for TRAILING_ZERO_COUNT, also some code re-organization.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4750:

> 4748:       break;
> 4749:     case T_INT:
> 4750:       evplzcntd(dst, ktmp, src, merge, vec_enc);

The ktmp here should be k0. Either add an assert here or pass k0 explicitly.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4752:

> 4750:       evplzcntd(dst, ktmp, src, merge, vec_enc);
> 4751:       break;
> 4752:     case T_SHORT:

Need an assert to verify that xtmp2 is not xnoreg here.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4757:

> 4755:       evplzcntd(xtmp2, k0, xtmp2, merge, vec_enc);
> 4756:       vpunpckhwd(dst, xtmp1, src, vec_enc);
> 4757:       evplzcntd(dst, k0, dst, merge, vec_enc);

ktmp and k0 usage is mixed in this function. It could be simplified by always using k0 (meaning no mask) in vector_count_leading_zeros_evex.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4769:

> 4767:       evmovdquq(xtmp1, ExternalAddress(StubRoutines::x86::vector_count_leading_zeros_lut()), vec_enc, rtmp);
> 4768:       movl(rtmp, 0x0F0F0F0F);
> 4769:       evpbroadcastd(dst, rtmp, vec_enc);

Use the new vpbroadcast() function here.
Also add an assert to verify that rtmp is not noreg and that xtmp2 and xtmp3 are not xnoreg.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4777:

> 4775:       vpxor(xtmp1, xtmp1, xtmp1, vec_enc);
> 4776:       evpcmpeqb(ktmp, xtmp1, xtmp3, vec_enc);
> 4777:       evpaddb(dst, ktmp, dst, xtmp2, true, vec_enc);

It is possible to do this without needing xtmp3:
      // Nibble clz table in xtmp1
      evmovdquq(xtmp1, ExternalAddress(StubRoutines::x86::vector_count_leading_zeros_lut()), vec_enc, rtmp);
      // Nibble mask in xtmp2
      movl(rtmp, 0x0F0F0F0F);
      evpbroadcastd(xtmp2, rtmp, vec_enc);
      // Get upper nibble in low 4 bits of dst
      vpsrlw(dst, src, 4, vec_enc);
      vpand(dst, dst, xtmp2, vec_enc);
      // Get clz of upper nibble into dst using table in xtmp1
      vpshufb(dst, xtmp1, dst, vec_enc);
      // Get lower nibble in low 4 bits of xtmp2, overwriting the nibble mask
      vpand(xtmp2, xtmp2, src, vec_enc);
      // Get clz of lower nibble in xtmp2 using the table in xtmp1
      vpshufb(xtmp2, xtmp1, xtmp2, vec_enc);
      // Broadcast the clz of 0 into all lanes of xtmp1; note the lowest byte of the xtmp1 table holds the clz of zero
      evpbroadcastb(xtmp1, xtmp1, xtmp1, vec_enc);
      // Check if the clz of upper nibble in dst indicates that the upper nibble was all zero
      evpcmpeqb(ktmp, xtmp1, dst, vec_enc);
      // If upper nibble was zero, add the clz of lower nibble to dst
      evpaddb(dst, ktmp, dst, xtmp2, true, vec_enc);
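
A scalar sketch of the sequence above, one byte lane at a time, may make the data flow easier to check. It is only a model: the table contents in kNibbleClz are an assumption (the real stub table is not shown in this thread), and the names kNibbleClz, clz8_nibble_lut and check_all_bytes are hypothetical, not HotSpot code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed table: leading-zero count of each 4-bit value, counted within
// the nibble. Entry 0 is 4 because an all-zero nibble has four leading zeros.
static constexpr std::array<uint8_t, 16> kNibbleClz = {
    4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0};

// Scalar model of one byte lane of the suggested sequence.
inline uint8_t clz8_nibble_lut(uint8_t x) {
  // Upper nibble lookup (models vpsrlw + vpand + vpshufb).
  const uint8_t hi = kNibbleClz[(x >> 4) & 0x0F];
  // Lower nibble lookup (models vpand + vpshufb).
  const uint8_t lo = kNibbleClz[x & 0x0F];
  // Table entry 0 is the clz of a zero nibble. If the upper nibble produced
  // that value it was all zero, so the lower nibble count is added
  // (models the compare into a mask followed by the masked add).
  return hi == kNibbleClz[0] ? static_cast<uint8_t>(hi + lo) : hi;
}

// Exhaustive self-check against a bit-by-bit reference count.
inline void check_all_bytes() {
  for (int v = 0; v < 256; ++v) {
    int expected = 0;
    for (int bit = 7; bit >= 0 && ((v >> bit) & 1) == 0; --bit) {
      ++expected;
    }
    assert(clz8_nibble_lut(static_cast<uint8_t>(v)) == expected);
  }
}
```

The compare against table entry 0 works because 4 appears only at index 0 of the table, so no separate copy of the original upper nibble is needed to detect the all-zero case.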

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4964:

> 4962:   vpternlogd(xtmp4, 0x40, xtmp4, src, vec_enc);
> 4963:   vector_count_leading_zeros_evex(bt, dst, xtmp4, xtmp1, xtmp2, xtmp3, ktmp, rtmp, true, vec_enc);
> 4964:   vbroadcast(bt, xtmp4, bcast_value[type2aelembytes(bt) - 1], rtmp, vec_enc);

No need for bcast_value. It is simply 8 * type2aelembytes(bt), i.e. the element width in bits.
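
A scalar sketch of why the broadcast constant is just the element width in bits. It assumes xtmp4 holds src - 1 before the vpternlogd (those lines are not quoted here), so that after the ternary-logic step with immediate 0x40 it holds (src - 1) & ~src, and that the final result is the width minus the leading-zero count of that value. The names clz_in_width and ctz_via_clz are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference leading-zero count of the low width_bits bits of x.
inline int clz_in_width(uint64_t x, int width_bits) {
  int n = 0;
  for (int bit = width_bits - 1; bit >= 0 && ((x >> bit) & 1u) == 0; --bit) {
    ++n;
  }
  return n;
}

// Trailing-zero count of one lane, derived from a leading-zero count.
// elem_bytes plays the role of type2aelembytes(bt).
inline int ctz_via_clz(uint64_t x, int elem_bytes) {
  assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4 ||
         elem_bytes == 8);
  const int width_bits = 8 * elem_bytes;  // the value to broadcast
  const uint64_t lane_mask =
      width_bits == 64 ? ~0ULL : ((1ULL << width_bits) - 1);
  x &= lane_mask;
  // Ones exactly in the positions below the lowest set bit of x.
  const uint64_t below = (x - 1) & ~x & lane_mask;
  // The number of those ones is the trailing-zero count of x.
  return width_bits - clz_in_width(below, width_bits);
}
```

For a zero lane, the intermediate value is all ones, its leading-zero count is 0, and the result is the full element width, which matches the Java semantics of numberOfTrailingZeros for a zero input.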

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/189

