[lworld] RFR: 8370479: [lworld] OOP-related crashes via mvn surefire

Paul Hübner phubner at openjdk.org
Tue Nov 4 08:42:18 UTC 2025


Hi all, 

This PR fixes [JDK-8370479](https://bugs.openjdk.org/browse/JDK-8370479), as we currently don't emit barriers correctly. Tested tiers 1-3.

## Context

If we consider a value class/record that contains a single field, which is a reference to an identity object:

public static value record Element(Identity underlying) {}
public static class Identity {}

We can create a flattened array via the JDK-internal `ValueClass` API:

Object[] array = ValueClass.newNullableAtomicArray(Element.class, 16);

This will indeed be flattened when running with compressed oops. T the reference to `underlying` will be four bytes, and the null-marker an additional byte. Hence, we are below the 64-bit limit. Copying this array via `Arrays.copyOf` will trigger Valhalla-specific copying.  

When running with G1, there are various crashes and verification errors. This should not impact ZGC, as the pointers are too large to be flattened in a nullable array.

## New Barrier Emission

We do not emit a post-write barrier when copying to an uninitialized memory destination. The tables below summarize what barriers, if any, are emitted both in the old and new versions of the copy implementation. Note that Serial, Parallel and G1 have the notion of post-write barriers to track intergenerational references. G1 is the only GC requiring pre-write barriers. 

**Old G1 barrier emission during flat array copying:**
|  | oopless | contains oops |
| --- | --- | --- |
| uninitialized | | |
| initialized | | pre, post |

**New G1 barrier emission during flat array copying:**
| | oopless | contains oops |
| --- | --- | --- |
| uninitialized | | post |
| initialized | | pre, post |

As mentioned, when copying to uninitialized memory, a cross-generational could be "lost" due to the lack of a post-write barrier. We should *not* use a pre-write barrier when copying to uninitialized memory when running with G1. Doing so means we may get garbage in our SATB buffers.

## New Test Cases

I introduce a test scenario where we grow a flat array similar to how one would grow an `ArrayList`. This should generate plenty of garbage, and triggers this crash even without the whitebox GC. I test the three GCs that use `ModRefBarrierSet`: Serial, Parallel and G1. These are tweaked to be less concurrent/parallel to aid with reproducability in case of crashes.

## Fixed Oop Printing

When G1 verification fails, it tries to print diagnostic information. This will eventually end up printing oops. We handle the case of `String` oops specially, and for that we need to check the klass. However, in this failed verification state, we can't guarantee that the class isn't garbage (either through a race or literal garbage). While debugging this issue, I ran into a scenario where the klass does not pass assertion. Consequently, we crash before the helpful diagnostic error messages finish printing. I've introduced a `klass_without_asserts` version of the string check, intended to be used only for diagnostics, which will perform the `String` check even if the VM is metaphorically on fire after a failed GC. That way, G1 is able to finish printing what it wants to print.

-------------

Commit messages:
 - Rework the G1 barrier model with flat oops.
 - Improve flat array copying test.
 - Clean up test.
 - Add test.
 - Update Access API to not check uninitialized memory.
 - Fix oop error printing bug.

Changes: https://git.openjdk.org/valhalla/pull/1713/files
  Webrev: https://webrevs.openjdk.org/?repo=valhalla&pr=1713&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8370479
  Stats: 131 lines in 5 files changed: 128 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/valhalla/pull/1713.diff
  Fetch: git fetch https://git.openjdk.org/valhalla.git pull/1713/head:pull/1713

PR: https://git.openjdk.org/valhalla/pull/1713


More information about the valhalla-dev mailing list