RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector
Xiaohong Gong
xgong at openjdk.org
Mon Sep 19 08:58:46 UTC 2022
"`VectorSupport.indexVector()`" computes a vector of index values from a given vector and a scale value (i.e. `index = vec + iota * scale`). It is widely used by other APIs such as "`VectorMask.indexInRange`", which is useful for tail loop vectorization, and it can be implemented easily with vector instructions.
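To illustrate why an index vector matters for tail loops, here is a scalar sketch of the idea behind "`VectorMask.indexInRange`". It is a plain-Java model, not the JDK implementation: the class `TailMaskSketch`, its method signature, and the use of a `boolean[]` in place of a real mask are all assumptions made for illustration.

```java
import java.util.Arrays;

public class TailMaskSketch {
    // Hypothetical scalar model: lane i is kept only when offset + i
    // falls inside [0, limit). The per-lane index is an index vector
    // with vec = offset (broadcast) and scale = 1.
    public static boolean[] indexInRange(int offset, int limit, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            int index = offset + i * 1;   // index = vec + iota * scale
            mask[i] = index >= 0 && index < limit;
        }
        return mask;
    }

    public static void main(String[] args) {
        // Array of length 10, 4 lanes, last iteration starts at offset 8:
        // only the first two lanes are still in range.
        System.out.println(Arrays.toString(indexInRange(8, 10, 4)));
        // prints [true, true, false, false]
    }
}
```

With such a mask, the final partial iteration of a loop can be executed as a masked vector operation instead of a scalar cleanup loop.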
This patch adds a vector intrinsic implementation for it. The steps are:
1) Load the constant "iota" vector.
We extend the "`vector_iota_indices`" stubs from byte to the other integral types. For floating point vectors, an additional vector cast is needed to get the right iota values.
2) Compute the indexes with "`vec + iota * scale`".
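The two steps above can be modelled in scalar code. The following is a sketch of the lane-wise semantics only, not the intrinsic or the stub code from the patch: the class `IndexVectorSketch` and its method signatures are hypothetical, and arrays stand in for vector registers. The scale is assumed to be an `int` for both element types.

```java
import java.util.Arrays;

public class IndexVectorSketch {
    // Integral case: lane i receives vec[i] + i * scale.
    public static int[] indexVector(int[] vec, int scale) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            int iota = i;                        // step 1: the constant iota lane value
            result[i] = vec[i] + iota * scale;   // step 2: vec + iota * scale
        }
        return result;
    }

    // Floating point case: the iota value starts out integral and is
    // cast to float before the arithmetic (the extra vector cast).
    public static float[] indexVector(float[] vec, int scale) {
        float[] result = new float[vec.length];
        for (int i = 0; i < vec.length; i++) {
            float iota = (float) i;              // step 1 plus cast
            result[i] = vec[i] + iota * scale;   // step 2
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(indexVector(new int[] {10, 10, 10, 10}, 3)));
        // prints [10, 13, 16, 19]
        System.out.println(Arrays.toString(indexVector(new float[] {0.5f, 0.5f, 0.5f, 0.5f}, 2)));
        // prints [0.5, 2.5, 4.5, 6.5]
    }
}
```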
Here are the performance results of the newly added micro benchmark on ARM NEON:
Benchmark                                 Gain
IndexVectorBenchmark.byteIndexVector      1.477
IndexVectorBenchmark.doubleIndexVector    5.031
IndexVectorBenchmark.floatIndexVector     5.342
IndexVectorBenchmark.intIndexVector       5.529
IndexVectorBenchmark.longIndexVector      3.177
IndexVectorBenchmark.shortIndexVector     5.841
Please help review and share your feedback! Thanks in advance!
-------------
Commit messages:
- 8293409: [vectorapi] Intrinsify VectorSupport.indexVector
Changes: https://git.openjdk.org/jdk/pull/10332/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10332&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8293409
Stats: 358 lines in 14 files changed: 328 ins; 6 del; 24 mod
Patch: https://git.openjdk.org/jdk/pull/10332.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10332/head:pull/10332
PR: https://git.openjdk.org/jdk/pull/10332