RFR: 8338023: Support two vector selectFrom API [v7]

Emanuel Peter epeter at openjdk.org
Fri Aug 30 15:02:26 UTC 2024


On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>> 
>> 
>> Declaration:-
>>     Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>> 
>> 
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types.  The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown.  
>> 
>> Summary of changes:
>> -  Java side implementation of new selectFrom API.
>> -  C2 compiler IR and inline expander changes.
>> -  In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. 
>> -  Optimized x86 backend implementation for AVX512 and legacy target.
>> -  Function tests covering new API.
>> 
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>> 
>> 
>>   Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding descriptive comments

I left a few comments, hopefully I can spend some more time on this next week

src/hotspot/cpu/x86/matcher_x86.hpp line 215:

> 213:   }
> 214: 
> 215:   static bool vector_indexes_needs_massaging(BasicType ety, int vlen) {

The name "massaging" sounds quite vague. Can we have something more expressive / descriptive? Is it the vector that "needs" massaging or the indices that "need" massaging?

Why `ety` and not `bt`? Is that not the name we use most often?

src/hotspot/cpu/x86/x86.ad line 10490:

> 10488: 
> 10489: 
> 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2)

You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2716:

> 2714:   C->set_max_vector_size(MAX2(C->max_vector_size(), (uint)(num_elem * type2aelembytes(elem_bt))));
> 2715:   return true;
> 2716: }

The code in these methods are extremely duplicated. Unboxing and boxing in every method around here. Maybe not your problem in this PR.

BTW: your error logging used `v1` in all 3 cases `op1-3`, you probably want to give them useful names. `v1-3` probably?

All this copy-pasting makes it easy to miss updating some cases... like it happenend here.

src/hotspot/share/opto/vectornode.cpp line 2090:

> 2088:   int num_elem = vect_type()->length();
> 2089:   BasicType elem_bt = vect_type()->element_basic_type();
> 2090:   if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, elem_bt)) {

Suggestion:

  // Keep the node if it is supported, else lower it to other nodes.
  if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, elem_bt)) {

src/hotspot/share/opto/vectornode.cpp line 2095:

> 2093:   Node* index_vec = in(1);
> 2094:   Node* src1  = in(2);
> 2095:   Node* src2  = in(3);

Suggestion:

  Node* src1 = in(2);
  Node* src2 = in(3);

unnecessary spaces

src/hotspot/share/opto/vectornode.cpp line 2101:

> 2099:   //     (VectorBlend
> 2100:   //         (VectorRearrange SRC1, INDEX)
> 2101:   //         (VectorRearrange SRC2, NORM_INDEX)

Suggestion:

  //         (VectorRearrange SRC1 INDEX)
  //         (VectorRearrange SRC2 NORM_INDEX)

Either consistently use commas or none at all ;)

src/hotspot/share/opto/vectornode.cpp line 2104:

> 2102:   //         MASK)
> 2103:   // This shall prevent an intrinsification failure and associated argument
> 2104:   // boxing penalties.

A quick comment about how the mask is computed could be nice.
`MASK = INDEX < num_elem`

src/hotspot/share/opto/vectornode.cpp line 2126:

> 2124:       case T_FLOAT:
> 2125:         return phase->transform(new VectorCastF2XNode(index_vec, TypeVect::make(T_INT, num_elem)));
> 2126:       break;

`break` after `return`?

src/hotspot/share/opto/vectornode.cpp line 2141:

> 2139:       default: return elem_bt;
> 2140:     }
> 2141:   };

This is definitely a style question. But it might be nice to make these functions member functions. They now kinda disrupt the flow of the `::Ideal` method. And in some cases you use the captured variables, and in other cases you pass them in explicitly, even though they already exist in the captured scope... consistency would be nice.

src/hotspot/share/opto/vectornode.cpp line 2148:

> 2146: 
> 2147:   BoolTest::mask pred = BoolTest::lt;
> 2148:   ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred));

Would `as_ConI()` be a better alternative to the `(ConINode*)` cast?

src/hotspot/share/opto/vectornode.cpp line 2149:

> 2147:   BoolTest::mask pred = BoolTest::lt;
> 2148:   ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred));
> 2149:   Node* lane_cnt = phase->makecon(lane_count_type());

Hmm. I don't like to have different names for the same thing. `num_elem` and `lane_count` and `lane_cnt`.

What about a method `make_num_elem_node`, returns a `ConNode*`.
Then you pass it around as `num_elem_scalar`, and broadcast it to `num_elem_vector`.

src/hotspot/share/opto/vectornode.cpp line 2159:

> 2157: 
> 2158:   vmask_type = TypeVect::makemask(elem_bt, num_elem);
> 2159:   mask = phase->transform(new VectorMaskCastNode(mask, vmask_type));

I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables?

src/hotspot/share/opto/vectornode.cpp line 2181:

> 2179:         default: return elem_bt;
> 2180:       }
> 2181:     };

You are now using this twice. Is there not some method that already does this?

src/hotspot/share/opto/vectornode.cpp line 2183:

> 2181:     };
> 2182:     // Targets emulating unsupported permutation for certain vector types
> 2183:     // may need to message the indexes to match the users intent.

Suggestion:

    // may need to massage the indexes to match the users intent.

src/hotspot/share/opto/vectornode.hpp line 1272:

> 1270: };
> 1271: 
> 1272: 

spurious newline

src/hotspot/share/opto/vectornode.hpp line 1621:

> 1619: public:
> 1620:   SelectFromTwoVectorNode(Node* in1, Node* in2, Node* in3, const TypeVect* vt)
> 1621:   : VectorNode(in1, in2, in3, vt) {}

I would prefer more expressive variable names and a short specification what the node does. Otherwise one always has to reverse-engineer what inputs are acceptable etc. I mean you could even require `VectorNode*` as inputs.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2272308274
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738648483
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738738172
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738759799
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738767466
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738768205
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738765017
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738823939
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738808199
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738781635
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738787762
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738806978
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738814420
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738838073
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738835911
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738729168
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738866021


More information about the core-libs-dev mailing list