RFR: 8338023: Support two vector selectFrom API
Jatin Bhateja
jbhateja at openjdk.org
Thu Aug 8 17:02:05 UTC 2024
Hi All,
As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API.
Declaration:-
Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
Semantics:-
Using the index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. The first and second vectors thus serve as a table whose elements are selected by the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN); otherwise an IndexOutOfBoundsException is thrown.
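To make the lane-wise semantics concrete, here is a minimal scalar sketch. It is not the Vector API and not code from the patch: plain int arrays stand in for vector lanes, and the class and method names (SelectFromModel, selectFrom) are made up for illustration. It only models the behaviour described above, including the range check.

```java
import java.util.Arrays;

// Scalar reference model of the two-vector selectFrom semantics described in the post.
// Hypothetical helper for illustration only; arrays stand in for vector lanes.
final class SelectFromModel {

    // "index" plays the role of "this"; v1 and v2 together form a table of 2*VLEN entries.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Indices must lie in [0, 2*VLEN), as stated in the semantics above.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Indices in [0, VLEN) pick from v1; indices in [VLEN, 2*VLEN) pick from v2.
            result[lane] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // prints [10, 21, 12, 23]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

With VLEN = 4, indices 0 and 2 pick lanes from v1, while 5 and 7 pick lanes 1 and 3 of v2.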
Summary of changes:
- Java side implementation of new selectFrom API.
- C2 compiler IR and inline expander changes.
- In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platform.
- Optimized x86 backend implementation for AVX512 and legacy targets.
- Function tests covering new API.
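As an illustration of what such a lowering could look like, the sketch below decomposes the two-vector selection into two single-vector rearranges followed by a blend. This is one plausible decomposition written as a scalar model, not the IR sequence taken from the patch. The names (SelectFromLoweringModel, rearrange, blend, loweredSelectFrom) are hypothetical, and it assumes VLEN is a power of two and that the indices have already been validated to lie in [0, 2*VLEN).

```java
import java.util.Arrays;

// Scalar model of a possible lowering: two single-vector rearranges plus a blend.
// Hypothetical illustration; assumes VLEN is a power of two and indices are in [0, 2*VLEN).
final class SelectFromLoweringModel {

    // Single-vector rearrange: result[i] = src[idx[i]].
    static int[] rearrange(int[] src, int[] idx) {
        int[] result = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            result[i] = src[idx[i]];
        }
        return result;
    }

    // Blend: take the lane from b where the mask is set, otherwise from a.
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = mask[i] ? b[i] : a[i];
        }
        return result;
    }

    static int[] loweredSelectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        int[] wrapped = new int[vlen];
        boolean[] fromSecond = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            wrapped[i] = index[i] & (vlen - 1); // index within a single vector
            fromSecond[i] = index[i] >= vlen;   // lane comes from v2
        }
        return blend(rearrange(v1, wrapped), rearrange(v2, wrapped), fromSecond);
    }

    public static void main(String[] args) {
        int[] index = {7, 6, 1, 0};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // prints [23, 22, 11, 10]
        System.out.println(Arrays.toString(loweredSelectFrom(index, v1, v2)));
    }
}
```

The point of the decomposition is that each step (single-vector rearrange, compare, blend) maps to an operation most vector ISAs already provide.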
The JMH microbenchmark included with this patch shows roughly a 10-15x gain over the existing rearrange API:
Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.244         ops/ms
SelectFromBenchmark.selectFromLongVector        1024  thrpt    2   5856.859         ops/ms
SelectFromBenchmark.selectFromLongVector        2048  thrpt    2   1513.378         ops/ms
SelectFromBenchmark.selectFromShortVector       1024  thrpt    2  17888.617         ops/ms
SelectFromBenchmark.selectFromShortVector       2048  thrpt    2   9079.565         ops/ms
Kindly review and share your feedback.
Best Regards,
Jatin
[1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
-------------
Commit messages:
- Adding Benchmark
- 8338023: Support two vector selectFrom API
Changes: https://git.openjdk.org/jdk/pull/20508/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8338023
Stats: 2737 lines in 95 files changed: 2719 ins; 17 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/20508.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508
PR: https://git.openjdk.org/jdk/pull/20508