RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v6]

Chiranmoy Bhattacharya duke at openjdk.org
Wed Nov 12 08:54:23 UTC 2025


On Wed, 12 Nov 2025 07:38:17 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> Internal tests pass (just sanity testing, did not run it on SVE). Code looks reasonable.
> 
> @XiaohongGong Thanks for all the updates and bearing with all the review comments 😊

Tested the patch on AWS Graviton4 with the benchmarks provided, and the results match the reported numbers.
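
For readers unfamiliar with what these benchmarks exercise, here is a minimal scalar sketch of the semantics of `VectorMask.toLong()` and `VectorMask.fromLong()`: lane i of the mask corresponds to bit i of the `long`, for at most the first 64 lanes. The class `MaskBitsSketch` and its `boolean[]` representation are hypothetical, used only for illustration. This is not the code from the patch and it does not use the Vector API.

```java
public final class MaskBitsSketch {
    // Scalar model of toLong(): a set lane i becomes bit i (first 64 lanes only).
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        int n = Math.min(lanes.length, 64);
        for (int i = 0; i < n; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Scalar model of fromLong(): bit i of the input decides lane i.
    static boolean[] fromLong(long bits, int laneCount) {
        boolean[] lanes = new boolean[laneCount];
        int n = Math.min(laneCount, 64);
        for (int i = 0; i < n; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    public static void main(String[] args) {
        // Four lanes, as in a 128-bit vector of int elements.
        boolean[] lanes = {true, false, true, true};
        long bits = toLong(lanes);
        System.out.println(Long.toBinaryString(bits)); // prints 1101
        System.out.println(java.util.Arrays.equals(lanes, fromLong(bits, 4))); // prints true
    }
}
```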

With VM options `-XX:UseSVE=2 --add-modules=jdk.incubator.vector`:

Benchmark                                   bits inputs Mode   Unit     Before         After         Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/s  269101754.957 1154781149.715  4.29
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/s  269106841.271 1020391639.317  3.79
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/s  269108088.073 1178242624.232  4.37
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/s  833720082.241 1183112162.420  1.41
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/s  851866517.512  905381882.385  1.06
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/s  841908544.850 1010800908.258  1.20
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/s  752714074.556 1116755995.074  1.48
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/s  733777062.242 1117923992.880  1.52
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/s  755390508.217 1125159886.042  1.48
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/s  915079922.329 1183247213.309  1.29
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/s  898902990.501 1157778493.700  1.28
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/s  913979902.412 1183483647.121  1.29


With VM options `-XX:UseSVE=1 --add-modules=jdk.incubator.vector`:

Benchmark                                   bits inputs Mode   Unit     Before         After         Gain
MaskQueryOperationsBenchmark.testToLongByte  128    1  thrpt  ops/s  578862813.032  674722742.273  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    2  thrpt  ops/s  577292103.016  671339970.996  1.16
MaskQueryOperationsBenchmark.testToLongByte  128    3  thrpt  ops/s  576827529.288  673882123.264  1.16
MaskQueryOperationsBenchmark.testToLongInt   128    1  thrpt  ops/s  792212973.997  957781054.650  1.20
MaskQueryOperationsBenchmark.testToLongInt   128    2  thrpt  ops/s  790683237.790  965247861.666  1.22
MaskQueryOperationsBenchmark.testToLongInt   128    3  thrpt  ops/s  794710366.832  981858552.787  1.23
MaskQueryOperationsBenchmark.testToLongLong  128    1  thrpt  ops/s  738425667.560  994493069.759  1.34
MaskQueryOperationsBenchmark.testToLongLong  128    2  thrpt  ops/s  736805923.837  979981983.578  1.33
MaskQueryOperationsBenchmark.testToLongLong  128    3  thrpt  ops/s  740591712.584  972150308.391  1.31
MaskQueryOperationsBenchmark.testToLongShort 128    1  thrpt  ops/s  784464050.733  994221594.464  1.26
MaskQueryOperationsBenchmark.testToLongShort 128    2  thrpt  ops/s  789528903.130  994094688.740  1.25
MaskQueryOperationsBenchmark.testToLongShort 128    3  thrpt  ops/s  779944943.316  979813192.314  1.25
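
The Gain column in both tables appears to be After divided by Before, truncated (not rounded) to two decimals. A small sketch to recompute it from the numbers above; the class and method names are hypothetical:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class GainCheck {
    // Gain = After / Before, truncated to two decimal places.
    static BigDecimal gain(String before, String after) {
        return new BigDecimal(after)
                .divide(new BigDecimal(before), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        // testToLongByte, inputs=1, UseSVE=2
        System.out.println(gain("269101754.957", "1154781149.715")); // prints 4.29
        // testToLongByte, inputs=3, UseSVE=2 (rounding would give 4.38)
        System.out.println(gain("269108088.073", "1178242624.232")); // prints 4.37
        // testToLongByte, inputs=1, UseSVE=1
        System.out.println(gain("578862813.032", "674722742.273"));  // prints 1.16
    }
}
```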

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27481#issuecomment-3520532925

