RFR: 8318650: Optimized subword gather for x86 targets. [v5]
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Nov 10 03:36:59 UTC 2023
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1648:
>
>> 1646: vpermd(xtmp3, xtmp1, xtmp3, vlen_enc == Assembler::AVX_512bit ? vlen_enc : Assembler::AVX_256bit);
>> 1647: vpsubd(xtmp1, xtmp1, xtmp2, vlen_enc);
>> 1648: vpor(dst, dst, xtmp3, vlen_enc);
>
> xtmp1 starts out as 0, 1,...
> so vpermd will place the lower 64 bit from xtmp3 to lower 64 bit of dst
> why vpsubd and not vpaddd? It looks to me that vpaddd is more intutive to understand.
> if vpadd, xtmp1 will become 2,3 in next iteration
> so vpermd will place the lower 64 bit from xtmp3 to 127:64 of dst
> and so on so forth
>
> Another point, for avx512 it looks to me that vpermd and vpor could be merged into one single instruction vpermd having dst as destination and merge bit set to true.
Please ignore the last bit about avx512 vpermd merge as we are not using mask registers here.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1388837311
More information about the hotspot-compiler-dev
mailing list