RFR: 8338023: Support two vector selectFrom API [v10]
Paul Sandoz
psandoz at openjdk.org
Mon Sep 16 21:21:11 UTC 2024
On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API.
>>
>>
>> Declaration:-
>> Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>>
>>
>> Semantics:-
>> Using the index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. The first and second vectors thus serve as a table whose elements are selected by the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown.
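A scalar reference model may help pin down these semantics. The sketch below is not the JDK implementation: plain int arrays stand in for vectors, VLEN is taken to be the array length, and the class and method names are hypothetical. It follows the description above, including the exception for out-of-range indexes.

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of index.selectFrom(v1, v2) as described above:
 * lane i of the result is v1[index[i]] when index[i] < VLEN, otherwise
 * v2[index[i] - VLEN]. Plain int arrays stand in for vectors.
 */
public final class TwoVectorSelectModel {

    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Valid two-vector indexes lie in [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Indexes below VLEN select from v1; the rest select from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {7, 0, 5, 2};
        // prints [23, 10, 21, 12]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

Index 7 lands in the second half of the combined table, so it picks v2[3]; index 0 picks v1[0], and so on.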
>>
>> Summary of changes:
>> - Java-side implementation of the new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform.
>> - Optimized x86 backend implementation for AVX512 and legacy targets.
>> - Functional tests covering the new API.
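One way to picture the lowering step: when only single-vector permutes are available, permute each source with the low index bits, then blend the two partial results using a mask taken from the VLEN bit of each index. The sketch below is a hypothetical scalar model of that decomposition; its helper names are made up and only mimic, per lane, what rearrange(VectorShuffle) and blend(Vector, VectorMask) do. The patch's actual C2 lowering may differ. It assumes VLEN is a power of two and that indexes are already in [0, 2*VLEN).

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of lowering a two-vector permute into two
 * single-vector permutes plus a blend. Not the patch's C2 code.
 */
public final class TwoVectorLoweringSketch {

    // Single-vector rearrange: out[i] = src[shuffle[i]].
    static int[] rearrange(int[] src, int[] shuffle) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[shuffle[i]];
        }
        return out;
    }

    // Blend: take b[i] where mask[i] is set, else a[i].
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = mask[i] ? b[i] : a[i];
        }
        return out;
    }

    // Assumes VLEN is a power of two and every index is in [0, 2*VLEN).
    static int[] selectFromLowered(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        int[] low = new int[vlen];
        boolean[] fromV2 = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            low[i] = index[i] & (vlen - 1);     // position within one vector
            fromV2[i] = (index[i] & vlen) != 0; // VLEN bit selects the source
        }
        return blend(rearrange(v1, low), rearrange(v2, low), fromV2);
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {7, 0, 5, 2};
        // prints [23, 10, 21, 12]
        System.out.println(Arrays.toString(selectFromLowered(index, v1, v2)));
    }
}
```

Both partial permutes run unconditionally and the blend discards the unwanted lanes, which is why this shape maps onto targets without a native two-table permute instruction.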
>>
>> The JMH micro-benchmark included with this patch shows around a 10-15x gain over the existing rearrange API:
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
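The quoted speedup can be cross-checked against the rows that survive in the truncated table above (byte vectors at both sizes, int vectors at size 1024). With Cnt = 2 and no reported error, these are point estimates; the class name below is illustrative only.

```java
import java.util.Locale;

/**
 * Illustrative arithmetic: selectFrom vs. rearrange throughput ratios
 * for the table rows where both benchmarks are visible.
 */
public final class SpeedupFromTable {

    static double ratio(double selectFromOpsPerMs, double rearrangeOpsPerMs) {
        return selectFromOpsPerMs / rearrangeOpsPerMs;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator everywhere.
        System.out.printf(Locale.ROOT, "Byte, size 1024: %.1fx%n", ratio(33254.830, 2041.762)); // 16.3x
        System.out.printf(Locale.ROOT, "Byte, size 2048: %.1fx%n", ratio(17313.174, 1028.550)); // 16.8x
        System.out.printf(Locale.ROOT, "Int,  size 1024: %.1fx%n", ratio(10756.804, 962.605));  // 11.2x
    }
}
```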
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2970:
> 2968:
> 2969:
> 2970: /*package-private*/
I think we can simplify with:
    /*package-private*/
    @ForceInline
    final $abstractvectortype$ selectFromTemplate(Class<? extends Vector<$Boxbitstype$>> indexVecClass,
                                                  $abstractvectortype$ v1, $abstractvectortype$ v2) {
        int twoVectorLenMask = (length() << 1) - 1;
#if[FP]
        Vector<$Boxbitstype$> wrapped_indexes = this.convert(VectorOperators.{#if[intOrFloat]?F2I:D2L}, 0)
                                                    .lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $bitstype$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
#else[FP]
        $abstractvectortype$ wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask);
        return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $type$.class,
                                                   length(), wrapped_indexes, v1, v2,
                                                   (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3)
        );
#end[FP]
    }
(Note that's without the assert - see separate comment).
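For readers following the suggestion: the template masks every index with twoVectorLenMask = 2*VLEN - 1 (after an F2I or D2L conversion for floating-point lanes) before calling VectorSupport.selectFromTwoVectorOp. The scalar sketch below models only that preprocessing step; the class and method names are hypothetical, and it assumes VLEN is a power of two so that the AND acts as a modulo 2*VLEN, including for negative indexes.

```java
/**
 * Hypothetical scalar model of the index preprocessing in the suggested
 * selectFromTemplate: FP lanes are converted to integral indexes first,
 * then every index is ANDed with 2*VLEN - 1. Assumes VLEN is a power of two.
 */
public final class IndexWrapSketch {

    // Mirrors: int twoVectorLenMask = (length() << 1) - 1;
    static int twoVectorLenMask(int vlen) {
        return (vlen << 1) - 1;
    }

    // Integral lanes: wrapped = index & mask.
    static int wrap(int index, int vlen) {
        return index & twoVectorLenMask(vlen);
    }

    // Float lanes: Java's float-to-int cast (truncation toward zero,
    // NaN -> 0), as F2I is defined, is applied before masking.
    static int wrap(float index, int vlen) {
        return wrap((int) index, vlen);
    }

    public static void main(String[] args) {
        int vlen = 8; // e.g. an 8-lane vector; valid indexes are [0, 16)
        System.out.println(wrap(5, vlen));     // 5  (already in range)
        System.out.println(wrap(17, vlen));    // 1  (17 mod 16)
        System.out.println(wrap(-1, vlen));    // 15 (two's complement wrap)
        System.out.println(wrap(9.75f, vlen)); // 9  (truncated, then masked)
    }
}
```

As written, the masking maps every index into [0, 2*VLEN) before the intrinsic sees it.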
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761977004
More information about the hotspot-compiler-dev mailing list