RFR: 8318650: Optimized subword gather for x86 targets. [v3]

Sandhya Viswanathan sviswanathan at openjdk.org
Fri Nov 3 00:28:07 UTC 2023


On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather with high performance backend implementation based on hybrid algorithm which initially partially unrolls scalar loop to accumulates values from gather indices into a quadword(64bit) slice followed by vector permutation to place the slice into appropriate vector lanes, it prevents code bloating and generates compact
>> JIT sequence. This coupled with savings from expansive array allocation in existing java implementation translates into significant performance of 1.3-5x gains with included micro.
>> 
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by replacing temporary array allocation with zero initialized vector and a scalar loops which inserts gathered values into vector. But, vector insert operation in higher vector lanes is a three step process which first extracts the upper vector 128 bit lane, updates it with gather subword value and then inserts the lane back to its original position. This makes inserts into higher order lanes costly w.r.t to proposed solution. In addition generated JIT code for modified fallback implementation was very bulky. This may impact in-lining decisions into caller contexts.
>> 
>> 3) Some minor adjustments in existing gather instruction pattens for double/quad words.
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Restricting masked sub-word gather to AVX512 target to align with integral gather support.

@jatin-bhateja I have gone through part of the code. Please see my comments below.

src/hotspot/share/opto/vectornode.hpp line 886:

> 884:     init_class_id(Class_LoadVectorGather);
> 885:     add_req(indices);
> 886:     assert(req() == MemNode::ValueIn + 1, "match_edge expects that index input is in MemNode::ValueIn");

The assert message in "" is questionable. For subword types match_edge expects last input in MemNode::ValueIn+1. I think the assert needs to move in if/else below with correct message.

src/hotspot/share/opto/vectornode.hpp line 890:

> 888:       assert(is_subword_type(vect_type()->element_basic_type()), "");
> 889:       add_req(offset);
> 890:     }

It will be good if we retain the other assert on the else path for non subword:
assert(indices->bottom_type()->is_vect(), "indices must be in vector");

src/hotspot/share/opto/vectornode.hpp line 999:

> 997:     add_req(indices);
> 998:     add_req(mask);
> 999:     assert(req() == MemNode::ValueIn + 2, "match_edge expects that last input is in MemNode::ValueIn+1");

The assert message in "" is questionable. For subword types match_edge expects last input in MemNode::ValueIn+2.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3818:

> 3816:         Class<? extends ByteVector> vectorType = vsp.vectorType();
> 3817: 
> 3818:         int loopIncr = 0;

loopIncr is not being used in the method below, could be removed.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3830:

> 3828:         // Check indices are within array bounds.
> 3829:         // FIXME: Check index under mask controlling.
> 3830:         for (int i = 0; i < vsp.length(); i += lsp.length()) {

The check (i  < vsp.length() )  could instead be (i + lsp.length() <= vsp.length()).

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 3657:

> 3655:     }
> 3656: #else[byteOrShort]
> 3657: 

Extra blank line on else path could be removed. This will also reduce the number of files changed.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1711404057
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380918410
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380912896
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380917754
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380901120
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380900894
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380886441


More information about the hotspot-compiler-dev mailing list