Java value layout constants

Tue Nov 30 13:05:36 UTC 2021

Hi all,
there has been a twist in this story. When looking at the PR [1], Paul 
and I were reminded of a failure mode which occurred last year [2], 
where accessing double elements copied into a segment backed by a byte[] 
could sometimes fail (e.g. on x86 platforms) because of misaligned access.

The moral of the story (more details below) is that enforcing alignment 
on heap segments is hit-and-miss in the current implementation, and can 
reveal sharp edges (e.g. some operations might expose alignment decisions 
which are JVM implementation dependent).

Because of this, I'd like to revise our plan, and leave Java layout 
constants as they are now (i.e. unaligned) in Java 18, while we fix 
handling of alignment and heap segments under the hood. Given the 
timeframe, this seems the most sensible choice.

If you are interested in more details, please continue reading below.

The issue in [2] revealed that, while on x64 platforms we can rely on the 
first element of an array T[], for any T, being 64-bit aligned, that is 
not the case on x86. The issue has to do with how array object headers 
are laid out. The layout of a Java array is typically defined as follows 
(see [3]):

1. 4/8-byte mark word
2. 4/8-byte class pointer
3. 4-byte length
4. optional padding
5. elements

Now, on x64 platforms, the mark word in (1) is 8 bytes and the class 
pointer in (2) is typically compressed to 4 bytes. This means that the 
header part of an array is 16 bytes in total, which in turn means that 
the first element of the array is always 8-byte aligned (because all 
heap objects are at least 8-byte aligned, regardless of platform, 
see [4]).

What about x86? Well, on x86 both the mark word and the class pointer 
are 4 bytes, which means the header is 12 bytes. This gives a 32-bit VM 
more options: if the array is an int[], a VM might just store the first 
element at offset 12, as that offset is 4-byte aligned. But if the array 
is a long[], the VM needs to insert some padding, so that the first 
element of the array will be at least 8-byte aligned (otherwise atomic 
operations will fail). This logic is reflected in [5], where the VM 
makes sure that for long[] and double[], elements are always (regardless 
of 32 vs. 64 bits) stored at offsets that are 64-bit aligned.

This obviously creates an asymmetry: we could create a memory segment 
backed by a double[], copy its elements into a segment backed by a 
byte[], and then try to retrieve 64-bit aligned double values from the 
second memory segment. This operation will succeed on 64-bit platforms 
(as byte[] and double[] have the same alignment constraints there), but 
will fail spuriously on x86 platforms. But this is not just about 32-bit 
vs. 64-bit - other VM implementations might have different opinions on 
what the alignment of array elements should be, and enhancements such as 
those proposed by Project Lilliput [6] can have profound implications in 
this area.
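
To make the arithmetic concrete, here is a small model in plain Java. It 
is only a sketch: the class and method names are made up for 
illustration, the header sizes (16 and 12 bytes) are the ones quoted 
above rather than values queried from a running VM, and the padding rule 
is a simplification of the logic in [5].

```java
// Illustrative model only. Names are hypothetical; header sizes are the
// ones quoted in the text, not values obtained from a real VM.
final class ArrayBaseOffsetModel {

    // Rounds value up to the next multiple of alignment (a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    // Simplified rule: only arrays with 8-byte elements (long[], double[])
    // get padding so that their first element is 8-byte aligned.
    static long firstElementOffset(long headerBytes, int elementBytes) {
        return elementBytes == 8 ? alignUp(headerBytes, 8) : headerBytes;
    }

    // True if an offset from an 8-byte aligned object start is itself
    // aligned to the given power-of-two alignment (at most 8).
    static boolean isAligned(long offset, long alignment) {
        return (offset & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long[] headers = {16, 12};        // x64 and x86 header sizes from the text
        String[] names = {"x64", "x86"};
        for (int i = 0; i < headers.length; i++) {
            long byteBase = firstElementOffset(headers[i], Byte.BYTES);
            long doubleBase = firstElementOffset(headers[i], Double.BYTES);
            System.out.println(names[i]
                    + ": byte[] base=" + byteBase
                    + " (8-byte aligned: " + isAligned(byteBase, 8) + ")"
                    + ", double[] base=" + doubleBase
                    + " (8-byte aligned: " + isAligned(doubleBase, 8) + ")");
        }
    }
}
```

With these inputs the model puts the first element of both byte[] and 
double[] at offset 16 on x64, while on x86 a byte[] starts at offset 12 
and a double[] at offset 16. A double read at the start of the byte[] 
payload is therefore 8-byte aligned on one platform and not on the other.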

Where does this leave us? Checking for alignment is definitely useful 
to prevent bugs - but the simple check carried out by the memory segment 
API ends up *leaking* implementation decisions as to how array elements 
are laid out. Ideally we'd like to have an API whose failures are 
predictable, so the status quo isn't great. Note that the real issue 
here is not whether layout constants should be aligned or not - or what 
their alignment (if any) should be. The real issue is that the memory 
segment API does not enforce alignment in all situations, especially 
around memory copy. It is in fact possible to copy elements from a 
segment backed by an array that has _more_ alignment constraints into 
a segment backed by an array that has _fewer_ alignment constraints, 
without errors, which is a potential source of (alignment) bugs.

We believe (thanks John!) we have a story to generalize the alignment 
checks to heap segments, in a way that no implementation-dependent 
information is leaked - the basic idea is to observe that native 
segments and heap segments are different beasts: when working with a 
native segment we can always know the alignment properties of any 
address inside that segment (the alignment is a property of the bit 
pattern of that address - e.g. how many zeros appear at the end of the 
address). But heap segment addresses are *virtualized* - so there is 
nothing for the API to check (e.g. heap segments do not have a base 
address, so to speak). In order to have reliable alignment checks which 
work on both native and heap segments, our API should assume that memory 
addresses produced by a heap segment can never be more aligned than 
the element size of the Java array backing that heap segment. This means 
that if we have, for instance, a segment backed by a short[], the 
*maximum alignment* constraint supported by this segment is 2. If we try 
to store an aligned int inside this segment, an error should occur 
(whether the store occurs as a result of dereference, or bulk-copy), as 
there is no guarantee that this operation is well-defined across all 
platforms. Conversely, a native segment has _no_ maximum alignment.
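
The rule can be modelled in a few lines of plain Java. This is a sketch 
of the idea only, not the MemorySegment implementation: the class and 
method names are hypothetical, and the additional requirement that the 
byte offset be a multiple of the requested alignment is my assumption 
about how such a check would naturally be completed.

```java
// Illustrative model of the proposed rule. This is not the MemorySegment
// implementation; class and method names are hypothetical.
final class MaxAlignmentModel {

    // Native case: alignment is a property of the address bits, i.e. the
    // largest power of two dividing the (non-zero) address.
    static long nativeAlignment(long address) {
        return Long.lowestOneBit(address);
    }

    // Heap case: never promise more than the element size of the backing array.
    static long heapMaxAlignment(Class<?> arrayType) {
        Class<?> c = arrayType.getComponentType();
        if (c == byte.class) return Byte.BYTES;
        if (c == short.class) return Short.BYTES;
        if (c == char.class) return Character.BYTES;
        if (c == int.class) return Integer.BYTES;
        if (c == float.class) return Float.BYTES;
        if (c == long.class) return Long.BYTES;
        if (c == double.class) return Double.BYTES;
        throw new IllegalArgumentException("Not a supported primitive array: " + arrayType);
    }

    // Assumed check: the requested alignment must not exceed the maximum
    // alignment of the backing array, and the byte offset (relative to the
    // first element) must be a multiple of the requested alignment.
    static boolean heapAccessAllowed(Class<?> arrayType, long byteOffset, long alignment) {
        return alignment <= heapMaxAlignment(arrayType)
                && (byteOffset & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        // Aligned int (alignment 4) into a short[]-backed segment.
        System.out.println("int in short[]:   " + heapAccessAllowed(short[].class, 0, 4));
        // Aligned short (alignment 2) into the same segment.
        System.out.println("short in short[]: " + heapAccessAllowed(short[].class, 6, 2));
        // Aligned long (alignment 8) into a long[]-backed segment at offset 4.
        System.out.println("long in long[]@4: " + heapAccessAllowed(long[].class, 4, 8));
    }
}
```

Note that the outcome depends only on the array type, the offset and 
the requested alignment, never on where the VM happens to place the 
array elements.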

This strategy allows the API to implement alignment checks on heap 
segments in a predictable fashion, so that the outcome of an alignment 
check does not depend on the assumptions of a particular architecture, 
or on the set of enabled VM features. When this underlying issue is 
fixed, we can then have a discussion as to whether layout constants in 
ValueLayout should be aligned-by-default or not. Having aligned layout 
constants might be useful to prevent bugs, but will limit the 
flexibility of the API. But that's a decision for another day.

[1] - https://git.openjdk.java.net/jdk/pull/6589
[2] - https://bugs.openjdk.java.net/browse/JDK-8255343
[3] - https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/arrayOop.hpp#L35
[4] - https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/globals.hpp#L132
[5] - https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/arrayOop.hpp#L70
[6] - https://openjdk.java.net/projects/lilliput/

Maurizio

On 25/11/2021 15:52, Maurizio Cimadamore wrote:
> Hi,
> This is a followup of the discussion that started in [1]. In the new 
> changes slated for Java 18, the Java value layout constants are all 
> byte-aligned (i.e. alignment constraints are not set). The 
> motivation for this is mostly historical (but there's also a 
> performance twist, see below): the dereference primitives in 
> MemoryAccess used to set up var handles based on non-aligned layouts. 
> So, to preserve compatibility with what we had before, we opted to 
> "relax" alignment constraints on the JAVA_XYZ layout constants in 
> ValueLayout. During the development of the new dereference API, some 
> issues arose around alignment checks and memory copy [2], which also 
> helped consolidate the feeling that Java layout constants 
> should be unaligned.
>
> Now, while it's always possible, for clients, to go back to the 
> desired alignment constraints (e.g. by defining custom layout 
> constants), from the discussion it emerged that it can be somewhat 
> confusing/surprising having a layout constant called JAVA_INT, whose 
> alignment is not the VM alignment for a Java int value.
>
> For this reason, I'd like to propose a small tweak, which would 
> essentially revert alignment constraints for Java layout constants to 
> what they were in 17. In other words, let's keep the "good" JAVA_XYZ 
> names for the _true_ Java layouts (including alignment as seen by VM). 
> If clients want to create unaligned constants they can do so, as they 
> can also create big-endian constants where needed. In the majority of 
> cases, since access will be aligned (for performance reasons), this 
> will not really change much for clients. But some of those clients 
> that need to pack data structures more (Lucene?) will need to define 
> their own packed/unaligned layout constants.
>
> Does that seem like an acceptable compromise?
>
> A patch for these changes is available here:
>
> https://github.com/mcimadamore/jdk/tree/value_layout_align
>
> While testing it, I was reminded (once more) that access with 
> alignment constraints is currently slower than access w/o alignment 
> constraints - which has to do with C2 not hoisting alignment checks in 
> cases like this:
>
> ((segmentBaseAddress + accessedOffset) & alignmentMask) == 0
>
> Here, segmentBaseAddress is a loop invariant, and accessedOffset 
> depends on the loop variable. So, it is in principle possible for the 
> VM to hoist the check for segmentBaseAddress and to eliminate the 
> alignment check for the offset (which would come from BCE analysis). 
> But this is not how things work today. The patch works around this by 
> using different var handles for when the accessed offset is provably 
> aligned (e.g. when using the getAtIndex/setAtIndex APIs). Even with 
> those workarounds, calling getAtIndex/setAtIndex on a MemoryAddress is 
> still slower than on a MemorySegment, because of the way in which we 
> try to work around the long loop optimization problem. Luckily a fix 
> for that problem [3] has been integrated in JDK 18, which means we will 
> remove these implementation workarounds, which will help make 
> performance more stable across the board.
>
> If the changes in this patch seem good, I'm happy to try and integrate 
> this into 18.
>
> Cheers
> Maurizio
>
> [1] - https://mail.openjdk.java.net/pipermail/panama-dev/2021-November/015805.html
> [2] - https://github.com/openjdk/panama-foreign/pull/555#issuecomment-865115787
> [3] - https://github.com/openjdk/jdk/pull/2045
>
>
>
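
As a footnote to the alignment check quoted above, the hoisting idea can 
be illustrated with a small sketch in plain Java. It only models the 
arithmetic and says nothing about what C2 actually does; the class and 
method names are hypothetical, and the indexed loop with a fixed stride 
is my stand-in for getAtIndex-style access.

```java
// Illustrative sketch only; this models the check, not what C2 does.
final class HoistedAlignmentCheck {

    // The per-access check as written in the quoted message.
    static boolean isAligned(long segmentBaseAddress, long accessedOffset, long alignmentMask) {
        return ((segmentBaseAddress + accessedOffset) & alignmentMask) == 0;
    }

    // Checks every access of an indexed loop; offsets are index * stride.
    static boolean allAlignedPerAccess(long base, int count, long stride, long mask) {
        for (int i = 0; i < count; i++) {
            if (!isAligned(base, i * stride, mask)) {
                return false;
            }
        }
        return true;
    }

    // Hoisted form: one check on the loop-invariant base and one on the
    // stride. If both are aligned, every base + i * stride is aligned too.
    static boolean allAlignedHoisted(long base, long stride, long mask) {
        return (base & mask) == 0 && (stride & mask) == 0;
    }

    public static void main(String[] args) {
        long mask = 7; // 8-byte alignment
        System.out.println(allAlignedPerAccess(64, 4, 8, mask) + " " + allAlignedHoisted(64, 8, mask));
        System.out.println(allAlignedPerAccess(68, 4, 8, mask) + " " + allAlignedHoisted(68, 8, mask));
    }
}
```

The hoisted form is a sufficient condition: when the base address and 
the stride are both aligned, no check is needed inside the loop.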