[vectorIntrinsics] RFR: 8284459: Add x86 back-end implementation for LEADING and TRAILING ZEROS COUNT operations [v3]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Wed Apr 20 00:12:54 UTC 2022


On Fri, 15 Apr 2022 21:44:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Patch extends the auto-vectorizer to vectorize the following Java SE APIs.
>>      1) Integer.numberOfLeadingZeros()
>>      2) Long.numberOfLeadingZeros()
>>      3) Integer.numberOfTrailingZeros()
>>      4) Long.numberOfTrailingZeros()
>> 
>> - Adds optimized x86 backend implementation for VectorOperators.LEADING_ZEROS_COUNT and VectorOperators.TRAILING_ZEROS_COUNT for AVX512 and legacy targets.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8284459: Adding auto-vectorizer and x86 backend support for TRAILING_ZERO_COUNT, also some code re-organization.
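
For context, the loop shapes that the auto-vectorizer part of the summary refers to would look roughly like the sketch below. The class and method names are hypothetical and not taken from the patch, and whether C2 actually vectorizes a given loop depends on the target CPU and JVM flags.

```java
public class ZeroCountLoops {
    // Hypothetical kernel: element-wise Integer.numberOfLeadingZeros over an array.
    static void leadingZeros(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.numberOfLeadingZeros(src[i]);
        }
    }

    // Hypothetical kernel: element-wise Integer.numberOfTrailingZeros over an array.
    static void trailingZeros(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.numberOfTrailingZeros(src[i]);
        }
    }

    public static void main(String[] args) {
        int[] src = {0, 1, 8, 16, -1, Integer.MIN_VALUE};
        int[] lz = new int[src.length];
        int[] tz = new int[src.length];
        leadingZeros(src, lz);
        trailingZeros(src, tz);
        System.out.println(java.util.Arrays.toString(lz));
        System.out.println(java.util.Arrays.toString(tz));
    }
}
```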

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4789:

> 4787:   movl(rtmp, 0x0F0F0F0F);
> 4788:   movdl(xtmp2, rtmp);
> 4789:   vpbroadcastd(xtmp2, xtmp2, vec_enc);

Use the new vbroadcast() method here, e.g. vbroadcast(T_INT, xtmp2, 0x0F0F0F0F, rtmp, vec_enc).

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4803:

> 4801:   vpcmpeqb(xtmp3, xtmp1, xtmp3, vec_enc);
> 4802:   vpaddb(dst, dst, xtmp2, vec_enc);
> 4803:   vpblendvb(dst, xtmp2, dst, xtmp3, vec_enc);

vector_count_leading_zeros_byte_avx can be implemented without xtmp3 as follows:
  // Nibble clz table in xtmp1
  vmovdqu(xtmp1, ExternalAddress(StubRoutines::x86::vector_count_leading_zeros_lut()), rtmp);

  // Nibble mask in xtmp2
  vbroadcast(T_INT, xtmp2, 0x0F0F0F0F, rtmp, vec_enc);

  // dst = leading zero counts of the 4 MSB bits of each byte, obtained by
  // accessing the lookup table.
  vpsrlw(dst, src, 4, vec_enc);
  vpand(dst, dst, xtmp2, vec_enc);
  vpshufb(dst, xtmp1, dst, vec_enc);

  // xtmp2 = leading zero counts of the 4 LSB bits of each byte, obtained by
  // accessing the lookup table.
  vpand(xtmp2, xtmp2, src, vec_enc);
  vpshufb(xtmp2, xtmp1, xtmp2, vec_enc);

  // Add xtmp2 to dst if the 4 MSB bits of the byte are all zeros, i.e. if dst
  // equals the clz of a zero nibble (the first table entry, broadcast into xtmp1).
  vpbroadcastb(xtmp1, xtmp1, vec_enc);
  vpcmpeqb(xtmp1, xtmp1, dst, vec_enc);
  vpaddb(xtmp2, xtmp2, dst, vec_enc);
  vpblendvb(dst, dst, xtmp2, xtmp1, vec_enc);
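
To sanity-check the per-byte logic of the sequence above, here is a minimal scalar model in Java. It is only a sketch: the table contents in CLZ_LUT are my assumption of what vector_count_leading_zeros_lut() holds (the leading zero count of each 4-bit value), and the class and method names are made up for illustration.

```java
public class NibbleClz {
    // Assumed table contents: CLZ_LUT[n] = leading zeros of the 4-bit value n.
    static final int[] CLZ_LUT = {4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0};

    // Scalar model of a single byte lane of the proposed sequence.
    static int clzByte(int b) {
        b &= 0xFF;
        int hi = CLZ_LUT[(b >>> 4) & 0x0F]; // vpsrlw + vpand + vpshufb
        int lo = CLZ_LUT[b & 0x0F];         // vpand + vpshufb
        // vpbroadcastb + vpcmpeqb build the mask "hi == table entry 0";
        // vpaddb + vpblendvb then select hi + lo where the mask is set.
        return (hi == CLZ_LUT[0]) ? hi + lo : hi;
    }

    public static void main(String[] args) {
        // Compare against the JDK intrinsic, adjusted from 32-bit to 8-bit width.
        for (int b = 0; b < 256; b++) {
            int expected = Integer.numberOfLeadingZeros(b) - 24;
            if (clzByte(b) != expected) {
                throw new AssertionError("mismatch for byte value " + b);
            }
        }
        System.out.println("all 256 byte values match");
    }
}
```

The blend works because only a zero high nibble maps to the first table entry, so comparing dst against that broadcast value is equivalent to testing the high nibble for zero, which is what frees up xtmp3.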

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/189

