RFR: 8293252: Shenandoah: ThreadMXBean synchronizer tests crash with IU+aggressive mode

William Kemper wkemper at openjdk.org
Fri Sep 16 18:54:48 UTC 2022


On Fri, 16 Sep 2022 01:33:31 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:

>> Hmm, `OopHandle::resolve` should have gone through the LRB, which should have evacuated the object. If the object is in the `cset` and it didn't get evacuated by the LRB, then it should be true that the gc is cancelled (out-of-memory during evac). Did this happen under low memory conditions? I wonder if the VM thread did _try_ to evacuate it and fail?
>
> If `OopHandle::resolve()` is hit, then the object gets evacuated through LRB, but the assertions happen before the code calls `resolve`.
> The assertion `assert_not_in_cset` is hit when `OopHandle` is created (see frame 16 in the stack trace mentioned [here](https://github.com/openjdk/jdk/pull/10268#issuecomment-1247282078))
> The new assertion that I mentioned in previous comment also happens before the `OopHandle` is resolved.

Had some time to look at this a little more closely. The bad access comes from here. The heap walk gives out references to objects without putting them through the load barrier:
https://github.com/openjdk/jdk/blob/6beeb8471ccf140e59864d7f983e1d9ad741309c/src/hotspot/share/memory/heapInspection.cpp#L633-L641

The assert happens when the heap inspection tries to create a pointer to an object in the collection set that was visited during this heap walk. The comment and the effort to keep the doomed object alive are concerning. I can think of two ways to address this:
1. Have the inspection/heap walk put objects through the LRB before passing them on to closures.
2. Disable the assertion if it is running on a safepoint.

Option 1 is probably the Right Thing to do, but I'm not sure what would happen if the VMThread is unable to find memory for the evacuated object. Option 2 _might_ be safe, but I don't know what it means to be `published` here, and without knowing that I can't be 100% certain the reference to the doomed object won't survive after the safepoint.
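
To make the trade-off in option 1 concrete, here is a minimal toy model in C++. It is only a sketch and not HotSpot code: `ToyObject`, `ToyHeap`, `free_slots`, `load_reference_barrier` and `count_cset_refs` are hypothetical names invented for illustration, and the assumption that a failed evacuation simply returns the original pointer is a simplification, not a statement about what Shenandoah does.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Toy model only: none of these types exist in HotSpot.
struct ToyObject {
    int id;
    bool in_cset;          // object lives in a region chosen for collection
    ToyObject* forwardee;  // to-space copy, set once the object is evacuated
};

struct ToyHeap {
    std::deque<ToyObject> from_space;  // deque keeps element addresses stable
    std::deque<ToyObject> to_space;
    std::size_t free_slots = 0;        // models memory left for evacuation

    ToyObject* add(int id, bool in_cset) {
        from_space.push_back(ToyObject{id, in_cset, nullptr});
        return &from_space.back();
    }

    // Stand-in for a load-reference barrier: returns the to-space copy,
    // evacuating on demand. ASSUMPTION: when no memory is left, the
    // original (still in-cset) pointer is returned unchanged.
    ToyObject* load_reference_barrier(ToyObject* obj) {
        if (obj == nullptr || !obj->in_cset) return obj;
        if (obj->forwardee != nullptr) return obj->forwardee;
        if (free_slots == 0) return obj;  // evacuation failed
        --free_slots;
        to_space.push_back(ToyObject{obj->id, false, nullptr});
        ToyObject& copy = to_space.back();
        assert(!copy.in_cset);
        obj->forwardee = &copy;
        return &copy;
    }

    // Heap walk. With use_barrier == false it hands out raw references,
    // which is the behaviour described above; with true it models option 1.
    void walk(bool use_barrier,
              const std::function<void(ToyObject*)>& closure) {
        for (ToyObject& obj : from_space) {
            closure(use_barrier ? load_reference_barrier(&obj) : &obj);
        }
    }
};

// Counts how many references handed to the closure still point into the cset.
std::size_t count_cset_refs(ToyHeap& heap, bool use_barrier) {
    std::size_t n = 0;
    heap.walk(use_barrier, [&n](ToyObject* o) { if (o->in_cset) ++n; });
    return n;
}
```

In this model, a walk without the barrier hands every collection-set object to the closure as-is. A walk with the barrier hands out only to-space copies as long as `free_slots` is large enough. Once it runs out, in-cset references leak through again, which is the open question about the VMThread failing to find memory.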

-------------

PR: https://git.openjdk.org/jdk/pull/10268


More information about the shenandoah-dev mailing list