RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction

Jatin Bhateja jbhateja at openjdk.org
Fri Jul 25 13:50:55 UTC 2025


This patch optimizes the Vector.slice operation with a constant index using the x86 ALIGNR instruction.
It also adds a new hybrid call generator that attempts lazy intrinsification and otherwise performs procedural inlining of the fallback, which avoids call overhead and boxing penalties when the fallback implementation operates on vectors. The existing Vector API-based slice implementation now serves as the fallback code that is inlined when intrinsification fails.
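
For readers less familiar with slice, the lane selection it performs can be modelled in scalar Java. This is only an illustrative sketch, not code from the patch: the class and method names are made up, and it uses plain byte arrays rather than the incubating Vector API so that it runs without extra modules. For a 128-bit vector, the same selection should correspond to a byte-wise right shift of the two concatenated inputs by origin * element size, which is what PALIGNR computes.

```java
import java.util.Arrays;

public class SliceModel {
    // Scalar model of slice (hypothetical helper, not part of the patch):
    // lane i of the result is lane (origin + i) of the concatenation [a, b].
    static byte[] slice(byte[] a, byte[] b, int origin) {
        int n = a.length;
        if (b.length != n || origin < 0 || origin > n) {
            throw new IllegalArgumentException("bad lengths or origin");
        }
        byte[] r = new byte[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;
            // Lanes past the end of a continue into the following vector b.
            r[i] = (j < n) ? a[j] : b[j - n];
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        for (int i = 0; i < 16; i++) {
            a[i] = (byte) i;          // lanes 0..15
            b[i] = (byte) (16 + i);   // lanes 16..31
        }
        // With origin 4 the result holds the values 4..19.
        System.out.println(Arrays.toString(slice(a, b, 4)));
    }
}
```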

The idea is to add infrastructure that enables intrinsification of the fast path for selected vector APIs and, failing that, inlining of the fallback implementation when it is itself based on vector APIs. Existing call generators such as PredictedCallGenerator, used to handle bi-morphic inlining, already combine multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and is invoked during incremental inlining. It also relieves the inline expander of handling slow paths, which can easily be implemented on the library side (Java).
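
The "intrinsify, else inline the fallback" decision can be pictured with a small conceptual model. This is a sketch only: HotSpot's call generators are C++ classes, and every name below (Generator, hybrid, compile) is hypothetical. It models just the ordering of the two attempts, not laziness or incremental inlining.

```java
import java.util.Optional;

public class HybridGeneratorSketch {
    // Hypothetical stand-in for a call generator; an empty result means
    // "this generator cannot handle the call".
    interface Generator {
        Optional<String> generate(String method, boolean constantIndex);
    }

    // Combines two generators: prefer the intrinsic, otherwise inline the fallback.
    static Generator hybrid(Generator intrinsic, Generator inlineFallback) {
        return (method, constantIndex) -> {
            Optional<String> fast = intrinsic.generate(method, constantIndex);
            return fast.isPresent() ? fast : inlineFallback.generate(method, constantIndex);
        };
    }

    // Toy policy (an assumption for illustration): the intrinsic applies only
    // to slice with a constant index; the fallback always succeeds.
    static String compile(String method, boolean constantIndex) {
        Generator intrinsic = (m, c) ->
            (m.equals("slice") && c) ? Optional.of("intrinsic: " + m) : Optional.empty();
        Generator fallback = (m, c) -> Optional.of("inlined Java fallback: " + m);
        return hybrid(intrinsic, fallback).generate(method, constantIndex).get();
    }

    public static void main(String[] args) {
        System.out.println(compile("slice", true));
        System.out.println(compile("slice", false));
    }
}
```

In this model the slow path needs no special handling in the compiler: it is ordinary Java code that gets inlined when the fast path is rejected.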

Vector API jtreg tests pass at AVX level 2; remaining validation is in progress.

Performance numbers:


System : 13th Gen Intel(R) Core(TM) i3-1315U

Baseline:
Benchmark                                                (size)   Mode  Cnt      Score   Error   Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444          ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319          ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926          ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825          ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378          ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489          ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334          ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784          ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808          ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394          ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894          ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024  thrpt    2   9756.573          ops/ms

With opt:
Benchmark                                                (size)   Mode  Cnt      Score   Error   Units
VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2  34122.435          ops/ms
VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  33281.868          ops/ms
VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9345.154          ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   8283.247          ops/ms
VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   8510.695          ops/ms
VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   5626.367          ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2    960.958          ops/ms
VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   4155.801          ops/ms
VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1465.953          ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  32748.061          ops/ms
VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  33674.408          ops/ms
VectorSliceBenchmark.shortVectorSliceWithVariableIndex     1024  thrpt    2   9346.148          ops/ms
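
To read the two tables side by side, the ratio of optimized to baseline throughput can be computed per benchmark. The helper below is a hypothetical convenience, not part of the patch; the scores are copied from the constant-index rows above. With Cnt = 2 and no error column, the ratios are indicative at best.

```java
public class SliceSpeedup {
    // Ratio of optimized to baseline throughput; values below 1.0 mean a slowdown.
    static double speedup(double baseline, double optimized) {
        return optimized / baseline;
    }

    public static void main(String[] args) {
        String[] names = {
            "byteVectorSliceWithConstantIndex1", "byteVectorSliceWithConstantIndex2",
            "intVectorSliceWithConstantIndex1", "intVectorSliceWithConstantIndex2",
            "longVectorSliceWithConstantIndex1", "longVectorSliceWithConstantIndex2",
            "shortVectorSliceWithConstantIndex1", "shortVectorSliceWithConstantIndex2"
        };
        // Scores in ops/ms, taken from the "Baseline" and "With opt" tables.
        double[] baseline = {
            9444.444, 10009.319, 6085.825, 6505.378,
            1651.334, 1642.784, 10399.394, 10502.894
        };
        double[] optimized = {
            34122.435, 33281.868, 8283.247, 8510.695,
            960.958, 4155.801, 32748.061, 33674.408
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-36s %.2fx%n", names[i], speedup(baseline[i], optimized[i]));
        }
    }
}
```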


Please share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - Fixes for failing regressions
 -  Optimizing AVX2 backend and some re-factoring
 - new benchmark
 - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8303762
 - 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction

Changes: https://git.openjdk.org/jdk/pull/24104/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24104&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303762
  Stats: 747 lines in 32 files changed: 664 ins; 0 del; 83 mod
  Patch: https://git.openjdk.org/jdk/pull/24104.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24104/head:pull/24104

PR: https://git.openjdk.org/jdk/pull/24104

