RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction
Jatin Bhateja
jbhateja at openjdk.org
Fri Jul 25 13:50:55 UTC 2025
This patch optimizes the Vector.slice operation with a constant index using the x86 VPALIGNR instruction.
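For readers less familiar with the operation, here is a minimal scalar sketch of what a two-vector slice computes: the result holds the lanes of the first vector starting at the origin, continuing into the second vector as needed. This is conceptually the "concatenate and shift right" that a PALIGNR-style instruction performs on bytes. The class and method names are illustrative only and are not part of the patch or of the Vector API, and the bounds check reflects my understanding of the API contract rather than something stated in this mail.

```java
import java.util.Arrays;

// Scalar reference model of a two-vector slice; names are illustrative only.
final class SliceModel {
    // Returns lanes [origin, origin + a.length) of the concatenation of a and b.
    static byte[] slice(byte[] a, byte[] b, int origin) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("lane counts differ");
        }
        // Assumption: origin must lie in [0, lane count].
        if (origin < 0 || origin > a.length) {
            throw new IndexOutOfBoundsException("origin " + origin);
        }
        byte[] r = new byte[a.length];
        for (int i = 0; i < r.length; i++) {
            int j = i + origin;
            // Take from the first vector until it runs out, then from the second.
            r[i] = j < a.length ? a[j] : b[j - a.length];
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3, 4, 5, 6, 7};
        byte[] b = {8, 9, 10, 11, 12, 13, 14, 15};
        // prints [3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(Arrays.toString(slice(a, b, 3)));
    }
}
```

When the origin is a compile-time constant, the shift amount can be encoded as an immediate, which is what makes a single-instruction lowering plausible for the constant-index case.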
The patch also adds a new hybrid call generator that facilitates lazy intrinsification and otherwise performs procedural inlining, which avoids call overhead and boxing penalties when the fallback implementation operates on vectors. The existing Vector API-based slice implementation now serves as the fallback code that gets inlined if intrinsification fails.
The idea is to add infrastructure that enables intrinsification of the fast path for selected Vector API operations and otherwise inlines the fallback implementation when it is itself based on the Vector API. Existing call generators such as PredictedCallGenerator, used to handle bimorphic inlining, already combine multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is invoked during incremental inlining. It also relieves the inline expander of handling slow paths, which can easily be implemented on the library side (in Java).
Vector API jtreg tests pass at AVX level 2; remaining validation is in progress.
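The "try the intrinsic, otherwise inline the fallback" decision can be pictured with a small toy model. This is not HotSpot code: the real call generators are C++ classes inside C2, and every name below is hypothetical. The sketch also assumes that the only reason the fast path declines is a non-constant index, which simplifies the real conditions.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Toy model only: these names are hypothetical and are not HotSpot classes.
final class HybridGeneratorModel {
    // Tries the fast path first; uses the fallback only if the fast path declines.
    static String generate(Supplier<Optional<String>> intrinsic,
                           Supplier<String> inlineFallback) {
        return intrinsic.get().orElseGet(inlineFallback);
    }

    // Stand-in for the intrinsic: here it succeeds only for a constant origin (assumption).
    static Optional<String> sliceIntrinsic(boolean originIsConstant) {
        return originIsConstant ? Optional.of("intrinsic") : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(generate(() -> sliceIntrinsic(true), () -> "inlined fallback"));
        System.out.println(generate(() -> sliceIntrinsic(false), () -> "inlined fallback"));
    }
}
```

The fallback is passed as a supplier so that it is only evaluated when needed, loosely mirroring the lazy behaviour described above.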
Performance numbers:
System : 13th Gen Intel(R) Core(TM) i3-1315U
Baseline:
Benchmark                                                (size)   Mode  Cnt      Score  Error  Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444         ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319         ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926         ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825         ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378         ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489         ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334         ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784         ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808         ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394         ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894         ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024  thrpt    2   9756.573         ops/ms
With opt:
Benchmark                                                (size)   Mode  Cnt      Score  Error  Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2  34122.435         ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  33281.868         ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9345.154         ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   8283.247         ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   8510.695         ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   5626.367         ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2    960.958         ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   4155.801         ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1465.953         ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  32748.061         ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  33674.408         ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024  thrpt    2   9346.148         ops/ms
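To compare the two runs, the ratio of optimized to baseline throughput can be computed directly from the scores above. The helper below is a small illustrative sketch; the class name is made up and only a few rows are included. With Cnt = 2 and no error column, the ratios should be read as rough indications.

```java
// Illustrative helper for comparing the posted scores; not part of the patch.
final class SliceSpeedup {
    // Throughput ratio: values above 1.0 mean the optimized run was faster.
    static double speedup(double baselineOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // Scores copied from the two tables above (ops/ms).
        String[] names = {"byteConstantIndex1", "intConstantIndex1",
                          "longConstantIndex1", "longConstantIndex2"};
        double[] baseline = {9444.444, 6085.825, 1651.334, 1642.784};
        double[] optimized = {34122.435, 8283.247, 960.958, 4155.801};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-20s %.2fx%n", names[i], speedup(baseline[i], optimized[i]));
        }
    }
}
```

In the posted numbers, longVectorSliceWithConstantIndex1 is the one constant-index case whose ratio falls below 1.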
Please share your feedback.
Best Regards,
Jatin
-------------
Commit messages:
- Fixes for failing regressions
- Optimizing AVX2 backend and some re-factoring
- new benchmark
- Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8303762
- 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction
Changes: https://git.openjdk.org/jdk/pull/24104/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8303762
Stats: 747 lines in 32 files changed: 664 ins; 0 del; 83 mod
Patch: https://git.openjdk.org/jdk/pull/24104.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104
PR: https://git.openjdk.org/jdk/pull/24104