RFR: 8318650: Optimized subword gather for x86 targets. [v3]
    Jatin Bhateja 
    jbhateja at openjdk.org
       
    Sun Nov  5 13:17:13 UTC 2023
    
    
  
On Fri, 3 Nov 2023 23:20:49 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Restricting masked sub-word gather to AVX512 target to align with integral gather support.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1576:
> 
>> 1574:     Label* larr[] = { &case0, &case1, &case2, &case3, &case4, &case5, &case6, &case7 };
>> 1575:     for (int i = 0; i < 8; i++) {
>> 1576:       bt(mask, midx);
> 
> Could we not use shorter bt and inc instructions (e.g. the 32-bit ones) here, since we know we don't need all 64 bits of the mask? That way we would get a smaller instruction encoding.

I get your point: it may save a prefix byte for short vectors in one case. However, a REX prefix may still be unavoidable if the allocator picks a register from the upper register bank (r8-r15), and the mask corresponding to Byte64 does need 64 bits.
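
To make the encoding argument concrete, here is a minimal sketch of the byte count for the register-register form of bt (opcode 0F A3 /r). The helper names (needs_rex, bt_reg_reg_length, mask_bits) are hypothetical and are not part of the HotSpot assembler; register numbers 0..15 stand for rax..r15, and APX registers are out of scope.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not HotSpot code.

// A REX prefix is required when the operand size is 64 bits (REX.W)
// or when either register is one of r8-r15 (REX.B / REX.R).
constexpr bool needs_rex(int base_reg, int index_reg, bool is_64bit) {
  return is_64bit || base_reg >= 8 || index_reg >= 8;
}

// bt r/m, r: two opcode bytes (0F A3) plus one ModRM byte,
// plus one REX byte when needs_rex() holds.
constexpr int bt_reg_reg_length(int base_reg, int index_reg, bool is_64bit) {
  return 3 + (needs_rex(base_reg, index_reg, is_64bit) ? 1 : 0);
}

// One mask bit per lane: a 64-byte vector of byte lanes needs 64 bits,
// a 64-byte vector of short lanes needs 32.
constexpr int mask_bits(int vector_bytes, int lane_bytes) {
  return vector_bytes / lane_bytes;
}

// 32-bit bt on low registers: no prefix, 3 bytes.
static_assert(bt_reg_reg_length(/*rax*/ 0, /*rcx*/ 1, false) == 3, "no REX");
// 64-bit bt always carries REX.W: 4 bytes.
static_assert(bt_reg_reg_length(0, 1, true) == 4, "REX.W");
// 32-bit bt with the mask in r8: REX is back, so nothing is saved.
static_assert(bt_reg_reg_length(/*r8*/ 8, 1, false) == 4, "REX.B");
// Byte lanes in a 64-byte vector need the full 64-bit mask.
static_assert(mask_bits(64, 1) == 64, "Byte64");
static_assert(mask_bits(64, 2) == 32, "Short32");
```

Under these assumptions the 32-bit form saves one byte per bt only when both the mask and the index land in rax..rdi, and it is not applicable at all to the Byte64 case.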
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382573870