RFR: 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined
Xiaohong Gong
xgong at openjdk.org
Thu Feb 27 06:58:33 UTC 2025
Method `checkMaskFromIndexSize` is called by some masked vector APIs such as `fromArray/intoArray/fromMemorySegment/...`. It checks whether the index of any active lane in a mask falls outside the bounds of the given array or MemorySegment. This method should be force inlined; otherwise, when the C2 compiler does not inline the call, a VectorMask object is allocated, which hurts the performance of these APIs significantly.
This patch calls the `VectorMask.checkFromIndexSize` method directly inside these APIs instead of `checkMaskFromIndexSize`. Since that method already has the `@ForceInline` annotation, it will be inlined and intrinsified by C2, and the expected vector instructions can then be generated. With this change, the now unused `checkMaskFromIndexSize` can be removed.
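For readers less familiar with these APIs, here is a scalar sketch of what a masked bounds check of this kind does. It is only an illustration of the semantics described above, not the JDK implementation: the class `MaskedBoundsSketch`, the methods `checkActiveLanes` and `maskedLoad`, and the use of a `boolean[]` in place of a `VectorMask` are all hypothetical names chosen for this example. It also assumes that the per-lane check is only needed when the full vector would run past the end of the array.

```java
/**
 * Scalar model of a masked load with a per-lane bounds check.
 * All names here are hypothetical; this is not the JDK code.
 */
class MaskedBoundsSketch {

    /** Throws if any active lane would access an index outside [0, limit). */
    static void checkActiveLanes(int offset, boolean[] mask, int limit) {
        for (int lane = 0; lane < mask.length; lane++) {
            long index = (long) offset + lane; // long avoids int overflow
            if (mask[lane] && (index < 0 || index >= limit)) {
                throw new IndexOutOfBoundsException(
                    "active lane " + lane + " maps to index " + index
                    + ", outside [0, " + limit + ")");
            }
        }
    }

    /** Loads array[offset + lane] into each active lane; inactive lanes stay 0. */
    static int[] maskedLoad(int[] array, int offset, boolean[] mask) {
        boolean wholeVectorInRange =
            offset >= 0 && offset <= array.length - mask.length;
        if (!wholeVectorInRange) {
            // Assumed slow path: only the active lanes have to be in range.
            checkActiveLanes(offset, mask, array.length);
        }
        int[] lanes = new int[mask.length];
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                lanes[lane] = array[offset + lane];
            }
        }
        return lanes;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        // Four lanes starting at offset 4: only the first two are active,
        // so the load is legal even though the full vector would overrun.
        int[] lanes = maskedLoad(data, 4, new boolean[] {true, true, false, false});
        System.out.println(java.util.Arrays.toString(lanes));
    }
}
```

The `IOOBE` benchmarks below appear to exercise this tail case, where the vector overruns the array but the mask keeps the active lanes in range, so the cost of the check itself dominates.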
Performance of some JMH benchmarks improves by up to 14x on an NVIDIA Grace CPU (AArch64 SVE2, 128-bit vectors). We also observe a similar performance improvement on an Intel CPU that supports AVX512.
The following is the performance data on Grace:
Benchmark                                            Mode   Cnt  Units      Before      After   Gain
LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE      thrpt   30  ops/ms  31544.304  31610.598  1.002
LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE    thrpt   30  ops/ms   3896.202   3903.249  1.001
LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE     thrpt   30  ops/ms    570.415   7174.320  12.57
LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE       thrpt   30  ops/ms    566.694   7193.520  12.69
LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE      thrpt   30  ops/ms   3899.269   3878.258  0.994
LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE     thrpt   30  ops/ms   1134.301  16053.847  14.15
StoreMaskedIOOBEBenchmark.byteStoreArrayMaskIOOBE    thrpt   30  ops/ms  26449.558  28699.480  1.085
StoreMaskedIOOBEBenchmark.doubleStoreArrayMaskIOOBE  thrpt   30  ops/ms   1922.167   5781.077  3.007
StoreMaskedIOOBEBenchmark.floatStoreArrayMaskIOOBE   thrpt   30  ops/ms   3784.190  11789.276  3.115
StoreMaskedIOOBEBenchmark.intStoreArrayMaskIOOBE     thrpt   30  ops/ms   3694.082  15633.547  4.232
StoreMaskedIOOBEBenchmark.longStoreArrayMaskIOOBE    thrpt   30  ops/ms   1966.956   6049.790  3.075
StoreMaskedIOOBEBenchmark.shortStoreArrayMaskIOOBE   thrpt   30  ops/ms   7647.309  27412.387  3.584
-------------
Commit messages:
- 8350748: VectorAPI: Method "checkMaskFromIndexSize" should be force inlined
Changes: https://git.openjdk.org/jdk/pull/23817/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23817&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8350748
Stats: 213 lines in 7 files changed: 36 ins; 140 del; 37 mod
Patch: https://git.openjdk.org/jdk/pull/23817.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23817/head:pull/23817
PR: https://git.openjdk.org/jdk/pull/23817