RFR: 8318650: Optimized subword gather for x86 targets. [v16]

Emanuel Peter epeter at openjdk.org
Tue Feb 27 10:48:06 UTC 2024


On Tue, 27 Feb 2024 10:25:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Review comment resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1672:
> 
>> 1670:   Label GATHER8_LOOP;
>> 1671:   XMMRegister iota = xtmp1;
>> 1672:   XMMRegister two_vec = xtmp2;
> 
> I'm sorry to bother you so much with this. I think adding aliases that don't mention what register they alias to makes things worse. Now I can't even see if two names might alias to the same register.

As I said: you can also comment / document the use of registers. You don't have to use better names if that is problematic.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1681:
> 
>> 1679:   vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
>> 1680:   vpslld(two_vec, xtmp2, 1, vlen_enc);
>> 1681:   load_iota_indices(iota, vector_len * type2aelembytes(elem_ty), T_INT);
> 
> Suggestion:
> 
>   vpxor(xtmp1, xtmp1, xtmp1, vlen_enc); // xtmp1 = {0, ...}
>   vpxor(dst, dst, dst, vlen_enc); // dst = {0, ...}
>   vallones(xtmp2, vlen_enc);
>   vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
>   vpslld(xtmp2, xtmp2, 1, vlen_enc); // xtmp2 = {2, 2, ...}
>   load_iota_indices(xtmp1, vector_len * type2aelembytes(elem_ty), T_INT); // xtmp1 = {0, 1, 2, ...}

  vallones(xtmp2, vlen_enc);
  vpsubd(xtmp2, xtmp1, xtmp2, vlen_enc);
  vpslld(xtmp2, xtmp2, 1, vlen_enc);

Is all of this just to set up `xtmp2 = {2, 2, ...}`?
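For reference, here is how I read the lane arithmetic of those three instructions, written as a scalar sketch. The helper names (`all_ones`, `sub`, `shl`, `make_twos`) and the fixed width of 8 int lanes are made up for illustration and are not part of the patch. The sketch also assumes `xtmp1` is still all zeros at this point (from the preceding `vpxor`) and that `vpsubd(dst, a, b)` computes `dst = a - b` per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: 8 int lanes standing in for one vector register.
constexpr std::size_t kLanes = 8;
using Lanes = std::array<std::int32_t, kLanes>;

// Models vallones: every bit set, so each int lane reads as -1.
Lanes all_ones() {
  Lanes v;
  v.fill(-1);
  return v;
}

// Models vpsubd(dst, a, b): per-lane dst[i] = a[i] - b[i].
Lanes sub(const Lanes& a, const Lanes& b) {
  Lanes r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    r[i] = a[i] - b[i];
  }
  return r;
}

// Models vpslld(dst, a, shift): per-lane logical left shift.
Lanes shl(const Lanes& a, int shift) {
  Lanes r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    r[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) << shift);
  }
  return r;
}

// The sequence under discussion, with xtmp1 assumed to be zero.
Lanes make_twos() {
  Lanes xtmp1{};              // {0, 0, ...}   (assumed: set by vpxor earlier)
  Lanes xtmp2 = all_ones();   // {-1, -1, ...}
  xtmp2 = sub(xtmp1, xtmp2);  // 0 - (-1) = 1  -> {1, 1, ...}
  xtmp2 = shl(xtmp2, 1);      // 1 << 1 = 2    -> {2, 2, ...}
  return xtmp2;
}
```

If that reading is right, a one-line comment such as `// xtmp2 = {2, 2, ...}` on the last instruction would already make the intent clear.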

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1504000796
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1504022599

