RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target. [v6]

Sandhya Viswanathan sviswanathan at openjdk.java.net
Fri Mar 4 00:17:09 UTC 2022


On Wed, 2 Mar 2022 05:02:02 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> The problem is at the 512-bit level.
>
> original 512-bit vector holding 16 integers = [ a15 a14 a13 a12, a11 a10 a9 a8, a7 a6 a5 a4, a3 a2 a1 a0 ]
>  unpackdq lower:
>                 512 bit (vec1) 
>    128           128        128      128
>  0 a13 0 a12  0 a9 0 a8  0 a5 0 a4  0 a1 0 a0
>  unpackdq higher:
>                 512 bit (vec2)
>    128           128        128      128
>  0 a15 0 a14  0 a11 0 a10  0 a7 0 a6  0 a3 0 a2
>  Next, the sum-of-absolute-differences operation followed by a pack will squeeze each 128-bit lane of the two participating vectors
>  and interleave them in the resultant vector.
>          VEC1_L3     VEC1_L2     VEC1_L1   VEC1_L0              VEC2_L3     VEC2_L2      VEC2_L1    VEC2_L0           
>  [ 0 sa13 0 sa12  0 sa9 0 sa8  0 sa5 0 sa4  0 sa1 0 sa0 ] [ 0 sa15 0 sa14  0 sa11 0 sa10  0 sa7 0 sa6  0 sa3 0 sa2 ]
>  [ sa15 sa14 sa13 sa12   sa11 sa10 sa9 sa8  sa7 sa6 sa5 sa4  sa3 sa2 sa1 sa0]

Thanks for the clarification; this looks good for 512-bit. I missed the reordering done by punpck.
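To double-check the lane reordering in the quoted diagram, here is a small Python model of the unpack / sum-of-absolute-differences / pack sequence on a 512-bit vector of 16 ints. It is a sketch under stated assumptions, not the code in the PR: vectors are modelled as lists of unsigned 32-bit dwords with element 0 at the lowest position, the pack is modelled as a vpackusdw-style per-lane dword-to-word pack, and all helper names (popcount_bytes, unpack_dq, sad_vs_zero, pack_usdw, bitcount_16_ints) are hypothetical.

```python
# Hypothetical model of the per-128-bit-lane reordering discussed above.
# A 512-bit vector is a list of 16 unsigned dwords, element 0 = lowest position.

LANES = 4         # 512-bit vector = 4 x 128-bit lanes
DW_PER_LANE = 4   # 4 dwords per 128-bit lane


def popcount_bytes(d):
    """Replace each byte of a 32-bit dword by the popcount of that byte."""
    out = 0
    for i in range(4):
        b = (d >> (8 * i)) & 0xFF
        out |= bin(b).count("1") << (8 * i)
    return out


def unpack_dq(v, high):
    """punpck{l,h}dq-style unpack with a zero vector: within each 128-bit lane,
    interleave the low (or high) two dwords of v with zeros."""
    out = []
    for lane in range(LANES):
        base = lane * DW_PER_LANE + (2 if high else 0)
        out += [v[base], 0, v[base + 1], 0]
    return out


def sad_vs_zero(v):
    """psadbw-style sum of absolute differences against zero: add up the 8 bytes
    of each qword and leave the sum in the qword's low dword."""
    out = []
    for q in range(0, len(v), 2):  # v[q], v[q + 1] form one qword
        total = sum((v[q + k] >> (8 * i)) & 0xFF for k in range(2) for i in range(4))
        out += [total, 0]
    return out


def pack_usdw(a, b):
    """vpackusdw-style pack, per 128-bit lane: a's 4 dwords, then b's 4 dwords,
    each saturated to 16 bits (values here are small and non-negative)."""
    words = []
    for lane in range(LANES):
        for src in (a, b):
            words += [min(d, 0xFFFF) for d in src[lane * 4:(lane + 1) * 4]]
    # Reinterpret consecutive 16-bit word pairs as little-endian 32-bit dwords.
    return [words[i] | (words[i + 1] << 16) for i in range(0, len(words), 2)]


def bitcount_16_ints(a):
    """Per-element bit count of 16 ints via the lane-wise unpack/sad/pack steps."""
    pc = [popcount_bytes(x) for x in a]
    vec1 = sad_vs_zero(unpack_dq(pc, high=False))  # holds sa0 sa1 sa4 sa5 ...
    vec2 = sad_vs_zero(unpack_dq(pc, high=True))   # holds sa2 sa3 sa6 sa7 ...
    return pack_usdw(vec1, vec2)


if __name__ == "__main__":
    print(bitcount_16_ints([0xFFFFFFFF] * 16))  # sixteen 32s
```

Because both the unpacks and the pack operate within 128-bit lanes, the split of each lane into vec1/vec2 is undone lane by lane by the pack, so the counts come out in the original element order at 512 bits.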

-------------

PR: https://git.openjdk.java.net/jdk/pull/7373


More information about the hotspot-compiler-dev mailing list