RFR: 8318650: Optimized subword gather for x86 targets. [v10]

Emanuel Peter epeter at openjdk.org
Mon Jan 15 14:42:32 UTC 2024


On Mon, 15 Jan 2024 14:25:28 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>> 
>>  - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
>>  - Removing JDK-8321648 related changes.
>>  - Refined AVX3 implementation with integral gather.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
>>  - Fix incorrect comment
>>  - Review comments resolutions.
>>  - Review comments resolutions.
>>  - Review comments resolutions.
>>  - Restricting masked sub-word gather to AVX512 target to align with integral gather support.
>>  - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1776:
> 
>> 1774:     for (int i = 0; i < 4; i++) {
>> 1775:       movl(rtmp, Address(idx_base, i * 4));
>> 1776:       addl(rtmp, offset);
> 
> Can the `offset` not be added to `idx_base` before the loop?

Or would that require too many registers?
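
To make the question concrete, here is a minimal scalar sketch of the two variants. It is not the HotSpot code: the function names, the `base` and `dst` parameters, and the assumption that the adjusted index is used to load a byte from `base` are all hypothetical, since the quoted hunk does not show what follows line 1776. In this model the hoisted add lands on the data pointer, because adding to the index pointer itself would change which indices get read. The two forms agree only while `idx + offset` does not wrap in 32 bits, which `addl` would do silently.

```cpp
#include <cstdint>

// Hypothetical scalar model of the quoted loop; names are illustrative only.
// Per-iteration form: the offset is added to every loaded index (as in the hunk).
inline void gather_add_in_loop(const int8_t* base, const int32_t* idx_base,
                               int32_t offset, int8_t* dst) {
  for (int i = 0; i < 4; i++) {
    int32_t rtmp = idx_base[i];   // movl(rtmp, Address(idx_base, i * 4))
    rtmp += offset;               // addl(rtmp, offset)
    dst[i] = base[rtmp];          // assumed use of the adjusted index
  }
}

// Hoisted form: fold the offset into the data pointer once, before the loop.
// This costs one extra live pointer, which is the register-pressure concern.
inline void gather_add_hoisted(const int8_t* base, const int32_t* idx_base,
                               int32_t offset, int8_t* dst) {
  const int8_t* shifted = base + offset;  // single add outside the loop
  for (int i = 0; i < 4; i++) {
    dst[i] = shifted[idx_base[i]];
  }
}
```

If the base register is still needed unmodified after the loop, the hoisted form needs either a spare register or a subtract to restore it.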

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452453827

