RFR: 8318650: Optimized subword gather for x86 targets. [v16]
Emanuel Peter
epeter at openjdk.org
Tue Feb 27 10:48:06 UTC 2024
On Tue, 27 Feb 2024 10:25:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comment resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1672:
>
>> 1670: Label GATHER8_LOOP;
>> 1671: XMMRegister iota = xtmp1;
>> 1672: XMMRegister two_vec = xtmp2;
>
> I'm sorry to bother you so much with this. I think adding aliases that don't mention what register they alias to makes things worse. Now I can't even see if two names might alias to the same register.
As I said: you can also comment / document the use of registers. You don't have to use better names if that is problematic.
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1681:
>
>> 1679: vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
>> 1680: vpslld(two_vec, xtmp2, 1, vlen_enc);
>> 1681: load_iota_indices(iota, vector_len * type2aelembytes(elem_ty), T_INT);
>
> Suggestion:
>
> vpxor(xtmp1, xtmp1, xtmp1, vlen_enc); // xtmp1 = {0, ...}
> vpxor(dst, dst, dst, vlen_enc); // dst = {0, ...}
> vallones(xtmp2, vlen_enc);
> vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
> vpslld(xtmp2, xtmp2, 1, vlen_enc); // xtmp2 = {2, 2, ...}
> load_iota_indices(xtmp1, vector_len * type2aelembytes(elem_ty), T_INT); // xtmp1 = {0, 1, 2, ...}
vallones(xtmp2, vlen_enc);
vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
vpslld(xtmp2, xtmp2, 1, vlen_enc);
This is all to set up `xtmp2 = {2, 2, ...}` ?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1504000796
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1504022599
More information about the hotspot-compiler-dev
mailing list