RFR: 8318650: Optimized subword gather for x86 targets. [v14]

Emanuel Peter epeter at openjdk.org
Mon Feb 26 09:54:00 UTC 2024


On Sun, 25 Feb 2024 06:27:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially partially unrolls scalar loop to accumulates values from gather indices into a quadword(64bit) slice followed by vector permutation to place the slice into appropriate vector lanes, it prevents code bloating and generates compact JIT sequence. This coupled with savings from expansive array allocation in existing java implementation translates into significant performance of 1.5-10x gains with included micro.
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by replacing temporary array allocation with zero initialized vector and a scalar loops which inserts gathered values into vector. But, vector insert operation in higher vector lanes is a three step process which first extracts the upper vector 128 bit lane, updates it with gather subword value and then inserts the lane back to its original position. This makes inserts into higher order lanes costly w.r.t to proposed solution. In addition generated JIT code for modified fallback implementation was very bulky. This may impact in-lining decisions into caller contexts.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolutions.

Update looks better already!
I have not yet reviewed `C2_MacroAssembler::vgather_subword` because of the variable names.

Increasing the reviewer count, because I'd like to have someone else review that is more familiar with the Vector API (java) code.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:

> 1582:   if (elem_bt == T_SHORT) {
> 1583:     Label case0, case1, case2, case3;
> 1584:     Label* larr[] = {&case0, &case1, &case2, &case3};

Not sure if I asked this already: why define them all here, rather than locally in the loop?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651:

> 1649:  * permutation to place the slice into appropriate vector lanes
> 1650:  * location in destination vector. Following pseudo code describes the
> 1651:  * algorithm in detail:

Suggestion:

 * Gather using hybrid algorithm which initially partially unrolls scalar loop
 * to accumulate values from gather indices into a quad-word(64bit) slice, a
 * slice may hold 8 bytes or 4 short values. This is followed by a vector
 * permutation to place the slice into appropriate vector lane
 * locations in destination vector. Following pseudo code describes the
 * algorithm in detail:

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1663:

> 1661:  *
> 1662:  * With each iteration, doubleword permute indices (0,1) corresponding
> 1663:  * to gathered quadword gets right shifted by two lane position.

Suggestion:

 * to gathered quadword gets right shifted by two lane positions.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1674:

> 1672:                                         int vector_len, int vlen_enc) {
> 1673:   assert(is_subword_type(elem_ty), "");
> 1674:   Label GATHER8_LOOP;

Why not define it righ before the first use?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 4840:

> 4838: 
> 4839:         // Check indices are within array bounds.
> 4840:         // FIXME: Check index under mask controlling.

This is a new FIXME in the template. I see that there are others around from before. Still, if you add a new one, I think you should at least add a bug-number that addresses this. That way, we can track the task.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1900380735
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502301147
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502303881
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502304352
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502314411
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502313404


More information about the core-libs-dev mailing list