RFR: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction [v14]
Jatin Bhateja
jbhateja at openjdk.org
Fri Feb 27 04:47:35 UTC 2026
On Wed, 25 Feb 2026 07:50:03 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
> There are regressions in these two cases. Do you know the root cause?
>
> ```
> Before:
> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms
>
> After:
> VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 5626.367 ops/ms
> VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 960.958 ops/ms
> ```
Hi @XiaohongGong, I observed quite a lot of run-to-run variation in these micro-benchmarks, even with a stock JDK. I collected PMU events and found that on an AVX512 system the fallback path performs misaligned vector memory operations, which cause this variation.
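
For readers following the thread, here is a minimal scalar sketch of what a slice computes, together with a small arithmetic check showing why a memory access at a lane offset is generally not vector-aligned. The class and method names (`SliceSketch`, `sliceReference`, `isVectorAligned`) are hypothetical and appear neither in the PR nor in the JDK. The sketch also assumes, purely for illustration, a fallback that accesses memory at a byte offset of `origin * elementSize`; the message above does not say how the fallback is actually implemented.

```java
import java.util.Arrays;

// Illustrative sketch only; not code from the PR or from the JDK.
class SliceSketch {

    // Scalar model of slice semantics: result lane i takes lane (i + origin)
    // from v1, continuing into v2 once v1 is exhausted.
    static long[] sliceReference(long[] v1, long[] v2, int origin) {
        int vlen = v1.length;
        if (v2.length != vlen || origin < 0 || origin > vlen) {
            throw new IllegalArgumentException("bad slice arguments");
        }
        long[] result = new long[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = i + origin;
            result[i] = (j < vlen) ? v1[j] : v2[j - vlen];
        }
        return result;
    }

    // Assumption: an access at byte offset origin * elemBytes from a
    // vector-aligned base. It stays aligned only when that offset is a
    // multiple of the vector width in bytes.
    static boolean isVectorAligned(int origin, int elemBytes, int vectorBytes) {
        return (origin * elemBytes) % vectorBytes == 0;
    }

    public static void main(String[] args) {
        long[] a = {0, 1, 2, 3};
        long[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(sliceReference(a, b, 1)));
        // 512-bit vector (64 bytes), long lanes (8 bytes), origin 1:
        // byte offset 8 is not a multiple of 64.
        System.out.println(isVectorAligned(1, Long.BYTES, 64));
    }
}
```

Under that assumption, every origin other than 0 or the full vector length gives an offset that is not a multiple of the vector width, which is consistent with the misaligned accesses reported by the PMU events but does not by itself confirm the root cause.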
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24104#issuecomment-3970737916