Variability of the performance of Vector<E>
Remi Forax
forax at univ-mlv.fr
Wed Jan 7 23:22:56 UTC 2026
----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Peter Kessler OS" <peter.kessler at os.amperecomputing.com>
> Cc: hotspot-gc-dev at openjdk.org, "panama-dev" <panama-dev at openjdk.org>
> Sent: Thursday, January 8, 2026 12:03:00 AM
> Subject: Re: Variability of the performance of Vector<E>
> On 5 Jan 2026, Peter Kessler OS wrote:
>
>> I am worried about the variability of the performance of Vector<E>. Worse, I am
>> worried about how to explain to users the variability of the performance of
>> Vector<E>.
> …
>> Aligning arrays to avoid occasional 25% performance loss seems like a worthwhile
>> goal. I would like to open a discussion about how that end might be achieved.
>
> full message:
> https://mail.openjdk.org/pipermail/hotspot-gc-dev/2026-January/056951.html
>
> Much of this is a question for panama-dev, where we are very aware
> of the difficulties of building a user model for vector programming.
> I appreciate the similar thread from you here on panama-dev:
> https://mail.openjdk.org/pipermail/panama-dev/2025-September/021141.html
>
> But the final question is very important for the GC folks, and is indeed
> worth a discussion. Actually we have been discussing it one way or another
> for years, at a low level of urgency. Maybe there are enough concurrent
> factors to justify taking the plunge (in 2026?) towards hyper-aligned
> Java heap objects, at least for large arrays.
>
> Predictable vector performance requires aligned arrays, aligned either
> to cache lines or to some hardware vector size.
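>
> (For concreteness, here is a minimal sketch of the kind of loop in
> question; the species and kernel are arbitrary, not Peter's benchmark.
> Each fromArray/intoArray touches one cache line per step when &a[0]
> is line-aligned, and can straddle two when it is not.)
>
>     import jdk.incubator.vector.FloatVector;
>     import jdk.incubator.vector.VectorSpecies;
>
>     class Kernels {
>         static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;
>
>         // r = a * b + c, element-wise
>         static void fma(float[] a, float[] b, float[] c, float[] r) {
>             int i = 0;
>             for (; i < S.loopBound(a.length); i += S.length()) {
>                 FloatVector va = FloatVector.fromArray(S, a, i);
>                 FloatVector vb = FloatVector.fromArray(S, b, i);
>                 FloatVector vc = FloatVector.fromArray(S, c, i);
>                 va.fma(vb, vc).intoArray(r, i);
>             }
>             for (; i < a.length; i++) {   // scalar tail
>                 r[i] = a[i] * b[i] + c[i];
>             }
>         }
>     }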
>
> For Valhalla, if we decide to work with 128-bit value containers,
> we would need not only arrays but also class instances that are
> aligned to 128 bits. (Why 128-bit containers? Well, they are
> efficiently atomic on ARM and maybe x86, and many Valhalla types
> would use them if they could. But misaligning them spoils
> atomicity. Valhalla is limited to 64-bit flattening as long as
> the existing heap alignment schemes are present.)
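>
> (A concrete, hypothetical example in draft Valhalla syntax: the
> two-long value class below fills exactly one 128-bit container, and a
> flattened field or array element of it can be read or written with one
> 128-bit atomic access only if that container is 128-bit aligned.)
>
>     // hypothetical draft-Valhalla syntax; payload is 2 x 64 = 128 bits
>     value class Range {
>         final long lo;
>         final long hi;
>         Range(long lo, long hi) { this.lo = lo; this.hi = hi; }
>     }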
>
> Aligning an array is not exactly the same task as aligning an object,
> since for arrays you should align the address &a[0], while for an
> object o you must align some field &o.f, but you can get away with
> aligning &o._header (and putting padding after the object header).
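>
> (The arithmetic is the same in both cases, just anchored differently; a
> sketch, with headerSize standing for whatever the header plus length
> word occupy, and align a power of two:)
>
>     // bytes of padding so that (start + headerSize) % align == 0,
>     // i.e. &a[0] lands on an align-byte boundary
>     static long padForArray(long start, long headerSize, long align) {
>         return -(start + headerSize) & (align - 1);
>     }
>
>     // for an instance, align the header itself and pad after it:
>     static long padForObject(long start, long align) {
>         return -start & (align - 1);
>     }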
>
> In this space, there are lots of ways to pick out a set of
> requirements and techniques.
>
> Fundamentally, though, the GC needs to recognize that some array or
> (maybe) object is subject to hyper-alignment, and perform special-case
> allocation on it. There’s lots of bookkeeping around that fundamental,
> including sizing logic (might we need more space for inter-object padding?)
> and of course the initial contracts. (I.e., how does the user request
> hyper-alignment?)
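>
> (Nobody has designed those contracts yet; purely as a strawman, with an
> entirely hypothetical API name, a user request might look like the line
> below, and the sizing slack is bounded by the requested alignment minus
> the normal 8-byte object alignment:)
>
>     // strawman only -- no such API exists today; alignment in bytes
>     float[] a = AlignedArrays.newFloatArray(1 << 20, 64);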
>
> And there is the delicate question of optimization: How do we keep
> hot loops in the GC from acquiring an extra data-dependent test (for
> the "hyper-align bit"). Can we hide the test under another pre-existing
> test? Can be batch things so that normally aligned objects are segregated
> from the hyper-aligned ones, and version our hot loops accordingly?
>
> ("Another pre-existing test" — I’m thinking something like an object
> header test that already exists, where some rare object header bit
> configuration must already be tested for, and is expected to be
> rare. In that case, all hyper-aligned objects, whether arrays or
> not, would be put into that rare header state, and on the rarely
> taken path in the GC loop that handles the pre-existing rare state,
> we’d also handle the case of hyper-alignment. Seems likely that
> would be an option…)
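>
> (As a sketch, with hypothetical bit names and stubbed-out helpers, the
> shape would be something like:)
>
>     class Scanner {
>         static final long RARE_BIT        = 1L << 5;  // hypothetical bit
>         static final long HYPER_ALIGN_BIT = 1L << 6;  // hypothetical bit
>
>         void process(long header, long obj) {
>             if ((header & RARE_BIT) == 0) {
>                 scanFast(obj);               // hot path: unchanged
>             } else {                         // pre-existing rare path
>                 if ((header & HYPER_ALIGN_BIT) != 0) {
>                     relocateAligned(obj);    // keep the object hyper-aligned
>                 }
>                 scanSlow(obj);
>             }
>         }
>         void scanFast(long obj)        { /* normal scan */ }
>         void scanSlow(long obj)        { /* existing rare-state handling */ }
>         void relocateAligned(long obj) { /* alignment-preserving copy */ }
>     }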
>
> On top of all that is portability — we have to do this work several
> times, once for each GC. Or, if a particular GC configuration cannot
> support hyper-alignment, the user model must offer a fallback.
>
> (The fallback might look like, "If you use the vector API you should
> really enable Z or G1". And also, "You get better flattening for
> larger value objects if you run on ARM with Z or G1.")
>
> I haven’t even begun to assign header bits here. The problems are
> deeper than that!
>
> I will say one more thing about arrays: I think it would be very
> reasonable to align all arrays larger than a certain threshold size,
> fully to the platform cache line size, so that &a[0] starts at
> a cache line boundary. This goes for primitives, values, whatever.
>
> Call this particular feature "large array hyper alignment".
>
> It might be a first feature to implement in this space.
>
> I have filed an RFE here: https://bugs.openjdk.org/browse/JDK-8374748
>
> Note that many hot GC loops that process arrays are O(a.length).
> This means that doing a little extra work for long lengths is
> almost by definition a negligible overhead.
>
> Large array hyper alignment would neatly solve Peter’s problem.
>
> And it would give us a head start towards Valhalla atomics,
> as long as we didn’t paint ourselves into some corner.
> The RFE mentions some possible follow-on features.
For fields, at least on the Java side, volatile was our marker at some point in the past,
but we lost it when we allowed VarHandle on any field.
I wonder if we should not revisit that.
I think there is a possible future where we first disallow VarHandle on fields typed with a value type that are not explicitly marked as volatile
(it would help avoid having to consider fields of an abstract value class as 64 bits), and then disallow VarHandle on all fields that are not marked as volatile (using a runtime warning, as for final).
Basically, in the same way we are making final really final, volatile could be used to say "please align/pad, because we may access that field atomically".
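
To make that shape concrete in today's Java: under such a rule, the findVarHandle call below would stay legal only because x is declared volatile, while the same lookup on a non-volatile field would eventually be rejected (or warned about). A minimal sketch:

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class Holder {
        volatile long x;   // volatile = "may be accessed atomically: align/pad me"

        static final VarHandle X;
        static {
            try {
                X = MethodHandles.lookup().findVarHandle(Holder.class, "x", long.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

Calls such as X.compareAndSet(holder, 0L, 1L) then keep their usual atomicity guarantees.
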
regards,
Rémi