Variability of the performance of Vector<E>

Mon Jan 5 15:12:34 UTC 2026

Hi Peter,

It seems you are worried about array alignment when using the Vector API.
This is a known issue, and comes up from time to time.

See for example some detailed explanations in this PR:
https://github.com/openjdk/jdk/pull/25065
(it is about auto vectorization, but also has a section on the Vector API).

I also mentioned alignment in my JVMLS2025 talk:
https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/

I don't think you can get around explaining alignment to users. It is just the nature of vector instructions on the hardware.
Sometimes you can manage to align and get better performance. In other cases, alignment is not possible for a variety of reasons.

There are several issues with array alignment:

  * ObjectAlignmentInBytes only aligns the header of an object, not the payload of the array. In some cases, the header size is 12 bytes (for int arrays with UseCompactObjectHeaders). So even if you set ObjectAlignmentInBytes=64, your elements still start at an offset of 12 bytes from a 64-byte boundary. There has been some discussion that, at some point in the far future, we could hyper-align large arrays and make sure the payload is aligned instead of the header. But that probably won't happen any time soon.
  * GC can move your arrays, and hence their alignment can change over time.
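To make the header offset concrete, here is a small back-of-the-envelope sketch in plain Java (no Vector API needed). It assumes the object start lands on a 64-byte boundary (as with ObjectAlignmentInBytes=64), a 12-byte header and 64-byte cache lines, and counts how many consecutive vector loads straddle two cache lines. Cache-line-split loads are one common reason why misaligned vector accesses are slower. The class and method names are made up for illustration.

```java
final class SplitLoads {
    static final int CACHE_LINE = 64; // assumption: 64-byte cache lines

    // Counts how many of `loads` consecutive vector loads of `vectorBytes` each,
    // starting at byte offset `payloadOffset` from a cache-line boundary,
    // touch two different cache lines.
    static int countSplitLoads(int payloadOffset, int vectorBytes, int loads) {
        int split = 0;
        for (int k = 0; k < loads; k++) {
            long start = payloadOffset + (long) k * vectorBytes; // first byte of load k
            long end = start + vectorBytes - 1;                  // last byte of load k
            if (start / CACHE_LINE != end / CACHE_LINE) {
                split++;
            }
        }
        return split;
    }

    public static void main(String[] args) {
        int header = 12; // e.g. int[] with UseCompactObjectHeaders
        for (int vectorBytes : new int[] {16, 32, 64}) {
            System.out.println(vectorBytes + "-byte loads: "
                    + countSplitLoads(header, vectorBytes, 64) + " of 64 split");
        }
    }
}
```

With a 12-byte offset, 1 in 4 of the 16-byte loads, 1 in 2 of the 32-byte loads and every single 64-byte load are split, while an offset of 0 gives no split loads for any of these widths.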

When I asked Paul Sandoz and others more involved in the API design, they told me that users should just use native memory, where alignment can be controlled at allocation time.
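For reference, a minimal sketch of what "alignment controlled at allocation" can look like with the Foreign Function & Memory API, assuming JDK 22 or later (where java.lang.foreign is final). Arena.allocate(byteSize, byteAlignment) returns a segment whose start address honors the requested alignment, and the memory is freed when the arena is closed. The helper name allocateInts is hypothetical. The vector loop itself is left out because it needs --add-modules jdk.incubator.vector; it would read from the segment with something like IntVector.fromMemorySegment(species, segment, offset, ByteOrder.nativeOrder()).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class AlignedNative {
    // Hypothetical helper: room for `count` ints in native memory, with the
    // start address aligned to `alignmentBytes` (must be a power of two).
    static MemorySegment allocateInts(Arena arena, long count, long alignmentBytes) {
        return arena.allocate(count * ValueLayout.JAVA_INT.byteSize(), alignmentBytes);
    }

    public static void main(String[] args) {
        // The confined arena frees the native memory when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = allocateInts(arena, 1000, 64);
            System.out.println("address % 64 = " + (seg.address() % 64));
            for (long i = 0; i < 1000; i++) {
                seg.setAtIndex(ValueLayout.JAVA_INT, i, (int) i);
            }
        }
    }
}
```

Unlike a heap array, this segment is never moved by the GC, so the alignment chosen at allocation stays valid for the lifetime of the arena.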

Personally, I'm only half-satisfied with this suggestion. I could imagine that some users would want to actively set the alignment of an array, which is difficult. Others may be satisfied with querying the alignment relative to the cache-line boundary. So I suggested having something like "Arrays.alignment(arr)" that would give an alignment "hint", so that users could implement an alignment loop, followed by a vectorized loop, followed by a clean-up loop (classic pre-main-post, like the auto vectorizer generates). Others were hesitant, because that would suggest that users now have to write 3 loops, which is going to be a nuisance. I think this discussion will come up again, but as far as I gather it is not a priority.
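A rough sketch of the pre-main-post structure described above. Since no "Arrays.alignment(arr)" exists today, the hint is passed in as a plain parameter (offsetOfElement0, the hypothetical byte offset of a[0] from a vector-size boundary). The sketch assumes 32-byte vectors and an offset that is a multiple of the element size. To keep it runnable without the incubator module, the main loop body is a scalar loop over one vector's worth of lanes, standing in for a vector load/add/store.

```java
import java.util.Arrays;

final class PreMainPost {
    static final int ELEM_BYTES = Integer.BYTES;          // 4 bytes per int
    static final int VECTOR_BYTES = 32;                   // assumption: 256-bit vectors
    static final int LANES = VECTOR_BYTES / ELEM_BYTES;   // 8 ints per vector

    // Number of scalar iterations needed before a[i] sits on a VECTOR_BYTES
    // boundary, given the (hypothetical) byte offset of a[0] from such a boundary.
    static int peelCount(long offsetOfElement0, int n) {
        long bytesToBoundary = Math.floorMod(-offsetOfElement0, (long) VECTOR_BYTES);
        return (int) Math.min(bytesToBoundary / ELEM_BYTES, n);
    }

    // a[i] += b[i], structured as pre-loop, main loop and post-loop.
    // Only a (the store target) gets aligned by peeling; b keeps whatever offset it has.
    static void add(int[] a, int[] b, long offsetOfElement0) {
        int n = a.length;
        int pre = peelCount(offsetOfElement0, n);
        int i = 0;
        // Pre-loop: scalar iterations until a[i] is aligned.
        for (; i < pre; i++) {
            a[i] += b[i];
        }
        // Main loop: whole vectors only. A real version would do one vector
        // load/add/store per iteration here instead of the inner scalar loop.
        int mainEnd = pre + ((n - pre) / LANES) * LANES;
        for (; i < mainEnd; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                a[i + j] += b[i + j];
            }
        }
        // Post-loop: the leftover tail that does not fill a whole vector.
        for (; i < n; i++) {
            a[i] += b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[40];
        int[] b = new int[40];
        Arrays.fill(a, 1);
        Arrays.fill(b, 2);
        add(a, b, 12); // e.g. a 12-byte header offset: 5 peeled iterations
        System.out.println(Arrays.toString(a));
    }
}
```

Peeling can only align one of the arrays: if a and b sit at different offsets, the accesses to b stay misaligned in the main loop.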

And even if you get a performance penalty for misalignment, it is still very profitable to vectorize in most cases. A 25% regression due to misalignment only eats a little into the large speedups (e.g. 2x, 4x, 8x) you get from vectorizing in the first place.
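A quick worked version of that arithmetic, reading "a 25% regression" as the vectorized loop taking 1.25 times as long as it would when aligned, with S the speedup of the aligned vectorized loop over scalar code:

```latex
S_{\text{eff}} = \frac{T_{\text{scalar}}}{1.25\,T_{\text{vector}}} = \frac{S}{1.25},
\qquad S = 2,\,4,\,8 \;\Rightarrow\; S_{\text{eff}} = 1.6,\,3.2,\,6.4
```

So the misaligned vectorized loop still keeps most of the benefit over the scalar version.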

There are also other "variability of performance" issues, for example the availability of hardware instructions. If the vector instructions are available, you get nice speedups compared to a scalar implementation. But if some instructions are not available, the Vector API falls back to a scalar implementation, which can drastically hurt performance and can even be much slower than a plain scalar implementation of the algorithm.
See https://github.com/openjdk/jdk/pull/28639

It could be beneficial to improve the "performance notes" section of the Vector API docs:
https://download.java.net/java/early_access/jdk26/docs/api/jdk.incubator.vector/jdk/incubator/vector/package-summary.html#performance-notes-heading

Kind regards,
Emanuel
