RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target. [v6]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Fri Mar 4 00:17:09 UTC 2022
On Wed, 2 Mar 2022 05:02:02 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> The problem is at 512 bit level.
>
> Original 512 bit vector holding 16 integers = [ a15 a14 a13 a12, a11 a10 a9 a8, a7 a6 a5 a4, a3 a2 a1 a0 ]
> unpackdq lower:
> 512 bit (vec1)
> 128 128 128 128
> 0 a13 0 a12 0 a9 0 a8 0 a5 0 a4 0 a1 0 a0
> unpackdq higher:
> 512 bit (vec2)
> 128 128 128 128
> 0 a15 0 a14 0 a11 0 a10 0 a7 0 a6 0 a3 0 a2
> Next, the sum of absolute differences operation followed by a pack will squeeze each 128 bit lane of the two participant vectors
> and interleave them in the resultant vector.
> VEC1_L3 VEC1_L2 VEC1_L1 VEC1_L0 VEC2_L3 VEC2_L2 VEC2_L1 VEC2_L0
> [ 0 sa13 0 sa12 0 sa9 0 sa8 0 sa5 0 sa4 0 sa1 0 sa0 ] [ 0 sa15 0 sa14 0 sa11 0 sa10 0 sa7 0 sa6 0 sa3 0 sa2 ]
> [ sa15 sa14 sa13 sa12 sa11 sa10 sa9 sa8 sa7 sa6 sa5 sa4 sa3 sa2 sa1 sa0]
Thanks for the clarification. This looks good for 512 bit; I missed the reordering done by punpck.
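
For anyone who wants to check the lane ordering without AVX-512 hardware, here is a minimal scalar sketch in Python of the unpack, sum of absolute differences, and pack steps described above. It is only a model, not the HotSpot code: the function names are hypothetical, vectors are plain lists of 32 bit values with index 0 as the least significant element, and it assumes the input to the unpack step already holds a per-byte popcount in every byte (that earlier step is not spelled out in this thread).

```python
def byte_popcounts(x):
    """Assumed earlier step: replace each byte of a 32 bit value with its popcount."""
    out = 0
    for i in range(4):
        b = (x >> (8 * i)) & 0xFF
        out |= bin(b).count("1") << (8 * i)
    return out

def unpack_dq(src, high):
    """Model punpckldq/punpckhdq against a zero vector, per 128 bit lane (4 dwords)."""
    out = []
    for lane in range(0, len(src), 4):
        lo = lane + (2 if high else 0)
        out += [src[lo], 0, src[lo + 1], 0]
    return out

def sad_bw(dwords):
    """Model psadbw against zero: sum the 8 bytes of each 64 bit element."""
    qwords = []
    for i in range(0, len(dwords), 2):
        q = dwords[i] | (dwords[i + 1] << 32)
        qwords.append(sum((q >> (8 * k)) & 0xFF for k in range(8)))
    return qwords

def pack_qword(q):
    """Narrow the four signed 16 bit words of a qword to bytes with unsigned saturation."""
    out = 0
    for k in range(4):
        w = (q >> (16 * k)) & 0xFFFF
        if w & 0x8000:          # negative as a signed word, saturates to 0
            w = 0
        out |= min(w, 0xFF) << (8 * k)
    return out

def pack_uswb(a, b):
    """Model packuswb: in each 128 bit lane, low half comes from a, high half from b."""
    out = []
    for lane in range(0, len(a), 2):
        out += [pack_qword(a[lane]), pack_qword(a[lane + 1]),
                pack_qword(b[lane]), pack_qword(b[lane + 1])]
    return out

def popcount_int_lanes(ints):
    """Per-int popcount for a list whose length is a multiple of 4 (one lane = 4 ints)."""
    counts = [byte_popcounts(x) for x in ints]
    lo = sad_bw(unpack_dq(counts, high=False))   # sums for a0 a1 | a4 a5 | ...
    hi = sad_bw(unpack_dq(counts, high=True))    # sums for a2 a3 | a6 a7 | ...
    return pack_uswb(lo, hi)                     # back in a0 a1 a2 a3 | ... order
```

Running popcount_int_lanes on 16 values returns the 16 bit counts in the original element order, because the per-lane split done by the unpack is undone by the per-lane interleave of the pack.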
-------------
PR: https://git.openjdk.java.net/jdk/pull/7373
More information about the hotspot-compiler-dev mailing list