Exploring Opportunities to Speed Up Vector API Performance on AArch64

Xiaohong Gong xiaohongg at nvidia.com
Mon Nov 3 06:14:00 UTC 2025


Hi Chiranmoy,

HotSpot does not support the "VectorMask.fromLong()" API on AArch64 when the platform lacks the SVE2 bitperm CPU feature, and this significantly hurts performance.
If possible, could you please test this case on a CPU that supports SVE2? Thanks!
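
For readers unfamiliar with the API, here is a minimal scalar sketch of the bit-to-lane mapping that "VectorMask.fromLong()" and "VectorMask.toLong()" expose, as I understand the Javadoc. The class name MaskBitsModel and its methods are hypothetical and used only for illustration. This is not the HotSpot implementation and uses no Vector API types.

```java
// Scalar model only: illustrates the bit-to-lane mapping, not the HotSpot intrinsic.
// MaskBitsModel is a hypothetical name; it is not part of jdk.incubator.vector.
final class MaskBitsModel {

    // Lane i is set iff bit i of 'bits' is set; bits at or above laneCount are ignored.
    static boolean[] fromLong(int laneCount, long bits) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount && i < Long.SIZE; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Inverse direction: pack the lanes back into a long, lane 0 in the lowest bit.
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length && i < Long.SIZE; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        boolean[] lanes = fromLong(4, 0b0101L);
        System.out.println(java.util.Arrays.toString(lanes));
        System.out.println(Long.toBinaryString(toLong(lanes)));
    }
}
```

A per-lane loop like this is only a model of the semantics. The performance concern above is about whether C2 can replace the equivalent work with a short instruction sequence.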

For existing benchmarks, you can refer to the following:
- https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskFromLongBenchmark.java
- https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java

Regarding implementing these APIs with the "LDR" and "STR" instructions, I believe that is possible. However, these instructions operate on a memory address, not on a long input/output.
Therefore, a temporary memory location is needed between the long variable and the predicate register, which is currently incompatible with the IR definition in C2.

We are dedicated to enhancing the performance of the Vector API on AArch64. If you have any questions or suggestions regarding the current API, please feel free to reach out. Your input is greatly appreciated!

Thanks,
Xiaohong
