RFR: 8290322: Optimize Vector.rearrange over byte vectors for AVX512BW targets. [v2]
Quan Anh Mai
duke at openjdk.org
Thu Jul 28 15:49:59 UTC 2022
On Thu, 28 Jul 2022 11:04:18 GMT, Quan Anh Mai <duke at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>>
>> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8290322
>> - 8290322: Optimize Vector.rearrange over byte vectors for AVX512BW targets.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5568:
>
>> 5566: #endif
>> 5567:
>> 5568: void C2_MacroAssembler::rearrange_bytes(XMMRegister dst, XMMRegister shuffle, XMMRegister src, XMMRegister xtmp1,
>
> Can we use the same approach as the one used for 256-bit vectors? Something similar to:
>
> vpshufb(xtmp1, src, shuffle); // All elements are at the correct place modulo 16
> vpxor(dst, dst, dst);
> vpslld(xtmp2, shuffle, 3); // Push the bit signifying the parity of the 128-bit lane to the sign bit
> vpcmpb(ktmp, xtmp2, dst, lt);
> vshufi32x4(xtmp2, xtmp1, xtmp1, 0b10110001); // Shuffle the 128-bit lanes to get 1 - 0 - 3 - 2
> vpblendmb(xtmp1, ktmp, xtmp1, xtmp2); // All elements are at the correct place modulo 32
> vpslld(xtmp2, shuffle, 2); // Push the bit signifying the parity of the 256-bit lane to the sign bit
> vpcmpb(ktmp, xtmp2, dst, lt);
> vshufi32x4(xtmp2, xtmp1, xtmp1, 0b01001110); // Shuffle the 128-bit lanes to get 2 - 3 - 0 - 1
> vpblendmb(dst, ktmp, xtmp1, xtmp2); // All elements are at the correct place modulo 64
Actually, my mistake, this should not work. Sorry for the noise.
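
For reference when checking any such sequence, here is a minimal scalar sketch in C++ of the two semantics involved: a full 64-byte rearrange, and the per-128-bit-lane selection that a 512-bit vpshufb performs. The type and function names (Bytes64, rearrange_reference, pshufb_in_lane) are hypothetical and do not come from the patch, and shuffle indices are assumed to be already wrapped into [0, 63]. It only models what each operation computes; it is not the HotSpot implementation and makes no claim about which step of the sequence above fails.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: one 512-bit vector viewed as 64 bytes.
using Bytes64 = std::array<std::uint8_t, 64>;

// Scalar reference for a full cross-lane byte rearrange:
// dst[i] = src[shuffle[i]], with indices assumed to be in [0, 63].
Bytes64 rearrange_reference(const Bytes64& src, const Bytes64& shuffle) {
  Bytes64 dst{};
  for (std::size_t i = 0; i < dst.size(); ++i) {
    dst[i] = src[shuffle[i] & 63];
  }
  return dst;
}

// Scalar model of a 512-bit vpshufb: every destination byte selects from
// its OWN 128-bit lane of src, using only the low 4 bits of the index.
// A set bit 7 in the index byte zeroes the destination byte.
Bytes64 pshufb_in_lane(const Bytes64& src, const Bytes64& shuffle) {
  Bytes64 dst{};
  for (std::size_t i = 0; i < dst.size(); ++i) {
    const std::size_t lane_base = i & ~std::size_t{15};  // first byte of i's lane
    dst[i] = (shuffle[i] & 0x80) ? std::uint8_t{0}
                                 : src[lane_base + (shuffle[i] & 0x0F)];
  }
  return dst;
}
```

Comparing the output of a candidate instruction sequence, modelled the same way, against rearrange_reference for shuffles that cross 128-bit lanes is one way to test it before writing the assembler code.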
-------------
PR: https://git.openjdk.org/jdk/pull/9498