RFR: 8293252: Shenandoah: ThreadMXBean synchronizer tests crash with IU+aggressive mode
William Kemper
wkemper at openjdk.org
Fri Sep 16 20:34:43 UTC 2022
On Fri, 16 Sep 2022 19:48:02 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:
>> Had some time to look at this a little more closely. The bad access comes from here. The heap walk hands out references to objects without putting them through the load reference barrier (LRB):
>> https://github.com/openjdk/jdk/blob/6beeb8471ccf140e59864d7f983e1d9ad741309c/src/hotspot/share/memory/heapInspection.cpp#L633-L641
>>
>> The assert happens when the heap inspection tries to create a pointer to an object in the collection set that was visited during this heap walk. The comment and the effort to keep the doomed object alive are concerning. I can think of two ways to address this:
>> 1. Have the inspection/heap walk put objects through the LRB before passing them on to closures.
>> 2. Disable the assertion when it is running at a safepoint.
>>
>> Option 1 is probably the Right Thing to do, but I'm not sure what would happen if the VMThread is unable to find memory for the evacuated object. Option 2 _might_ be safe, but I don't know what it means to be `published` here, and without knowing that I can't be 100% certain the reference to the doomed object won't survive past the safepoint.
>
>> The assert happens when the heap inspection tries to create a pointer to an object in the collection set that was visited during this heap walk
>
> That's right.
> I am inclined towards option 1 as it should (theoretically speaking) solve the other assertion failure as well.
>
>> Option 1 is probably the Right Thing to do, but I'm not sure what would happen if the VMThread is unable to find memory for the evacuated object.
>
> Can you please help me understand why, if we add the LRB during heap inspection, the VMThread would ever be in a situation where it fails to find memory for the evacuated object?

I just meant it could suffer from an out-of-memory error just like any other evacuating thread, but I think it'd be fine. The LRB would cancel the GC and the from-space reference would be returned (cancelling the GC would nullify the assert). If the from-space reference is still reachable after the thread dump completes, the degenerated cycle should fix it up.
-------------
PR: https://git.openjdk.org/jdk/pull/10268
More information about the hotspot-gc-dev mailing list