Exploring Opportunities to Speed Up Vector API Performance on AArch64
Paul Sandoz
paul.sandoz at oracle.com
Fri Oct 31 16:40:39 UTC 2025
Hi Chiranmoy,
The following PR seems directly related:
https://github.com/openjdk/jdk/pull/27481
If so, you could verify the code generation with this PR. Instead of benchmarks, the PR provides IR tests, which assert that C2 generates the correct IR nodes.
Paul.
On Oct 30, 2025, at 11:17 PM, Chiranmoy.Bhattacharya at fujitsu.com wrote:
Hi all,
This is regarding Vector API performance for AArch64 CPUs. We have recently
used the Vector API to implement bit packing and unpacking of boolean values.
For benchmarking, we've used JMH with JDK 24.
Bit-packing: We've used VectorMask.fromArray(…).toLong(…) and observed
some improvement in throughput.
Unpacking: We've used VectorMask.fromLong(…).intoArray(…), but noticed
a sharp performance degradation.
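For reference, the two operations can be described by a scalar baseline. The sketch below is not the benchmarked code; the class and method names are hypothetical, and it assumes at most 64 elements per call. It follows the bit order documented for VectorMask.toLong(), where lane i maps to bit i of the long, so it could serve as a correctness check for a vectorized version.

```java
public class BoolBitPacking {
    // Hypothetical scalar baseline: packs `count` (<= 64) booleans starting at
    // `offset` into a long. Element i maps to bit i, the same bit order that
    // VectorMask.toLong() documents for mask lanes.
    public static long pack(boolean[] src, int offset, int count) {
        long bits = 0L;
        for (int i = 0; i < count; i++) {
            if (src[offset + i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Inverse operation: writes bit i of `bits` to dst[offset + i]
    // for i in [0, count), with count <= 64.
    public static void unpack(long bits, boolean[] dst, int offset, int count) {
        for (int i = 0; i < count; i++) {
            dst[offset + i] = ((bits >>> i) & 1L) != 0;
        }
    }
}
```

A vectorized implementation would be expected to produce the same long for the same input slice, and the same boolean array on the way back.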
On inspecting the assembly with the HotSpot disassembler, we noticed that
SVE instructions such as STR-predicate [0] and LDR-predicate [1], which
match well with this use case, are not being generated. Instead, the
generated code relies on shifts, rotations, and bitwise operations.
With this mail, we’d like to explore opportunities for improving the
performance of VectorMask operations on Arm by leveraging direct predicate
instructions (STR/LDR) rather than bitwise operations.
Please let us know whether an existing JMH benchmark can be reused to
reproduce this issue, or whether we should contribute a new one to the OSS
benchmarks so that we can collaborate on this further.
[0] https://dougallj.github.io/asil/doc/str_p_bi_8.html
[1] https://dougallj.github.io/asil/doc/ldr_p_bi_8.html
Regards,
Chiranmoy