RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v14]

Jatin Bhateja jbhateja at openjdk.org
Fri Feb 27 04:47:35 UTC 2026


On Wed, 25 Feb 2026 07:50:03 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> There are regressions for these two cases. Do you know the root cause?
> 
> ```
> Before:
> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489          ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334          ops/ms
> 
> After:
> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   5626.367          ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2    960.958          ops/ms
> ```

Hi @XiaohongGong, I observed quite a lot of run-to-run variation in these microbenchmarks, even with the stock JDK. I collected PMU events and found that on AVX512 systems there are misaligned vector memory operations in the fallback, which cause this variation.
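
To illustrate why alignment matters at this vector width, here is a minimal sketch of the address arithmetic only. It assumes a 64-byte cache line (typical on x86) and uses made-up addresses; the class and method names are hypothetical and are not part of the JDK or of this PR. A 64-byte (512-bit) access whose start address is not a multiple of 64 always spans two cache lines, and where an array lands in the heap can differ from run to run.

```java
public class AlignmentSketch {
    // Assumption: 64-byte cache lines, as on typical x86 parts.
    static final int CACHE_LINE_BYTES = 64;

    // True if the (non-negative) address is a multiple of the access size.
    static boolean isAligned(long address, int accessBytes) {
        return address % accessBytes == 0;
    }

    // True if an access of accessBytes starting at address touches
    // more than one cache line.
    static boolean crossesCacheLine(long address, int accessBytes) {
        long firstLine = address / CACHE_LINE_BYTES;
        long lastLine = (address + accessBytes - 1) / CACHE_LINE_BYTES;
        return firstLine != lastLine;
    }

    public static void main(String[] args) {
        int vectorBytes = 64; // one 512-bit vector
        // Hypothetical start addresses, chosen only for illustration.
        for (long address : new long[] {0, 16, 64, 80}) {
            System.out.println("address=" + address
                    + " aligned=" + isAligned(address, vectorBytes)
                    + " crossesLine=" + crossesCacheLine(address, vectorBytes));
        }
    }
}
```

With these inputs, addresses 0 and 64 stay within one line, while 16 and 80 each straddle two. This is only the arithmetic behind a line split, not a model of the benchmark or of the PMU events collected.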

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3970737916


More information about the core-libs-dev mailing list