RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target. [v6]

Jatin Bhateja jbhateja at openjdk.java.net
Tue Mar 1 04:43:00 UTC 2022


On Tue, 1 Mar 2022 01:52:59 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8281375: Fix a typo.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4375:
> 
>> 4373:     evpunpckldq(xtmp2, k0, xtmp3, xtmp1, true, vec_enc);
>> 4374:     evpsadbw(xtmp2, k0, xtmp2, xtmp1, true, vec_enc);
>> 4375:     vpackuswb(dst, xtmp2, dst, vec_enc);
> 
> This doesn't look correct for, say, a 512-bit vector length. At this point, xtmp2 has the 64-bit popcount results for the lower 8 integers and dst has the 64-bit popcount results for the upper 8 integers. The vpackuswb interleaves xtmp2 and dst at 128-bit lanes, so the result is not in the correct order.

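Both vpunpckldq/vpunpckhdq and vpackuswb operate within each 128-bit lane, so xtmp2 holds the results for the low two integers of every lane and dst holds those for the high two, not the lower and upper 8 integers of the whole vector. The order therefore comes out right for every vector length. For one lane:

original vector of integers = [a3, a2, a1, a0]
unpackldq (with zero)       = [ 0, a1,  0, a0]
unpackhdq (with zero)       = [ 0, a3,  0, a2]

Perform a sum of absolute differences against zero, which stores each quadword's byte sum in the low 16 bits of that quadword.
packuswb packs at lane granularity, i.e. 128 bits at a time:

       128 bit             128 bit
[0, sa3, 0, sa2]    [0, sa1, 0, sa0]   =>   [sa3, sa2, sa1, sa0]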
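As a rough cross-check of the lane argument, here is a scalar C++ model of the sequence. It is a sketch, not HotSpot code: all helper names are hypothetical, it assumes xtmp1 is all zeros (as the zero dwords in the diagram suggest), and it assumes the input bytes already hold per-byte popcounts from an earlier step of the routine that is not shown here, with std::bitset standing in for that step.

```cpp
// Scalar model of the per-lane sequence (hypothetical helper names, not
// HotSpot code): unpack dwords with zero, vpsadbw against zero, vpackuswb.
// Assumes xtmp1 is all zeros and the input bytes already hold per-byte
// popcounts; std::bitset stands in for that earlier step.
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<uint8_t>;  // register as little-endian bytes (16/32/64)

static uint32_t get_d(const Vec& v, size_t i) {
  return uint32_t(v[4 * i]) | uint32_t(v[4 * i + 1]) << 8 |
         uint32_t(v[4 * i + 2]) << 16 | uint32_t(v[4 * i + 3]) << 24;
}
static void set_d(Vec& v, size_t i, uint32_t d) {
  for (size_t b = 0; b < 4; ++b) v[4 * i + b] = uint8_t(d >> (8 * b));
}
static uint16_t get_w(const Vec& v, size_t i) {
  return uint16_t(v[2 * i] | v[2 * i + 1] << 8);
}
static void set_w(Vec& v, size_t i, uint16_t w) {
  v[2 * i] = uint8_t(w);
  v[2 * i + 1] = uint8_t(w >> 8);
}

// vpunpckldq (hi = false) / vpunpckhdq (hi = true) with a zero second source.
// Works inside each 128-bit lane: result = [0, src.d(k+1), 0, src.d(k)].
static Vec unpack_dq_zero(const Vec& src, bool hi) {
  Vec out(src.size(), 0);
  for (size_t lane = 0; lane < src.size() / 16; ++lane) {
    size_t k = lane * 4 + (hi ? 2 : 0);
    set_d(out, lane * 4 + 0, get_d(src, k));
    set_d(out, lane * 4 + 2, get_d(src, k + 1));
  }
  return out;
}

// vpsadbw against zero: sum of the 8 bytes of each quadword, stored in the
// low 16 bits of that quadword; the upper 48 bits are zero.
static Vec psadbw_zero(const Vec& src) {
  Vec out(src.size(), 0);
  for (size_t q = 0; q < src.size() / 8; ++q) {
    uint16_t sum = 0;
    for (size_t b = 0; b < 8; ++b) sum = uint16_t(sum + src[8 * q + b]);
    set_w(out, 4 * q, sum);
  }
  return out;
}

// Signed word -> unsigned byte with saturation, as vpackuswb does.
static uint8_t sat_u8(uint16_t w) {
  if (w & 0x8000) return 0;
  return w > 255 ? 255 : uint8_t(w);
}

// vpackuswb(a, b): per 128-bit lane, bytes 0-7 come from a, bytes 8-15 from b.
static Vec packuswb(const Vec& a, const Vec& b) {
  Vec out(a.size(), 0);
  for (size_t lane = 0; lane < a.size() / 16; ++lane)
    for (size_t i = 0; i < 8; ++i) {
      out[16 * lane + i] = sat_u8(get_w(a, 8 * lane + i));
      out[16 * lane + 8 + i] = sat_u8(get_w(b, 8 * lane + i));
    }
  return out;
}

// Runs the sequence; dword i of the result should equal popcount(x[i]).
static Vec bitcount_int_lanes(const std::vector<uint32_t>& x) {
  Vec v(x.size() * 4);
  for (size_t i = 0; i < x.size(); ++i) set_d(v, i, x[i]);
  for (auto& byte : v) byte = uint8_t(std::bitset<8>(byte).count());
  Vec lo = psadbw_zero(unpack_dq_zero(v, false));  // plays the role of xtmp2
  Vec hi = psadbw_zero(unpack_dq_zero(v, true));   // plays the role of dst
  return packuswb(lo, hi);
}

// ints = 4, 8 or 16 models 128-, 256- or 512-bit vectors. Every element gets
// a distinct popcount (2 * i), so any reordering would be detected.
static bool lanes_in_order(size_t ints) {
  std::vector<uint32_t> x(ints);
  for (size_t i = 0; i < ints; ++i) x[i] = (1u << (2 * i)) - 1u;
  Vec r = bitcount_int_lanes(x);
  for (size_t i = 0; i < ints; ++i)
    if (get_d(r, i) != std::bitset<32>(x[i]).count()) return false;
  return true;
}
```

In this model lanes_in_order(16), the 512-bit case raised in the review, holds because each integer only ever moves within its own 128-bit lane through all three steps.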

-------------

PR: https://git.openjdk.java.net/jdk/pull/7373


More information about the hotspot-compiler-dev mailing list