RFR: 8293252: Shenandoah: ThreadMXBean synchronizer tests crash with aggressive heuristics [v2]
William Kemper
wkemper at openjdk.org
Thu Sep 22 16:41:20 UTC 2022
On Thu, 22 Sep 2022 01:45:42 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:
>> Please review the fix for the assertion failure when running ThreadMXBean synchronizer code in fastdebug mode.
>>
>> Please note that the assertion failure happens in regular (satb) mode as well, not just in iu mode.
>>
>> At the point of assertion failure, the JVM is at a safepoint as the VMThread is executing VM_ThreadDump operation and the GC worker threads are paused after the mark phase.
>> Assertion happened when the VMThread performs `NativeAccess<>::oop_store()` (as part of creating an `OopHandle`) on an object which has been marked and is in the collection set. But the failing assertion in `oop_store_not_in_heap()` expects the oop is not in the collection set.
>>
>> The assertion is conditional on the statement `value != NULL && !ShenandoahHeap::heap()->cancelled_gc()`
>>
>> shenandoah_assert_not_in_cset_if(addr, value, value != NULL && !ShenandoahHeap::heap()->cancelled_gc());
>>
>> But it looks like it is missing another important condition - the GC should be in marking phase.
>> i.e. the assertion is valid only if it is protected by `ShenandoahHeap::heap()->is_concurrent_mark_in_progress()`.
>> So the correct assertion should be:
>>
>> shenandoah_assert_not_in_cset_if(addr, value, value != NULL && !ShenandoahHeap::heap()->cancelled_gc() && ShenandoahHeap::heap()->is_concurrent_mark_in_progress());
>>
>> In fact this assertion is already present in one of its caller (`oop_store_in_heap()1) in its negative form:
>>
>> shenandoah_assert_not_in_cset_except (addr, value, value == NULL || ShenandoahHeap::heap()->cancelled_gc() || !ShenandoahHeap::heap()->is_concurrent_mark_in_progress());
>>
>>
>> Also, it reads strange that `oop_store_in_heap()` would call `oop_store_not_in_heap()`.
>> So I have moved the code to a separate method `oop_store_common()` that gets called by both `oop_store_in_heap()` and `oop_store_not_in_heap()`.
>>
>> Tested it by running following tests in fastdebug mode:
>>
>> - hotspot_gc_shenandoah
>> - java/lang/management/ThreadMXBean/MyOwnSynchronizer.java (50 times with -XX:ShenandoahGCHeuristics=aggressive)
>> - java/lang/management/ThreadMXBean/LockedSynchronizers.java (50 times with -XX:ShenandoahGCHeuristics=aggressive)
>>
>> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>
> Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision:
>
> - Use HeapAccess instead of RawAccess when iterating the heap objects
>
> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
> - Make oop_store_common() as private method and remove extraneous condition
>
> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1340:
> 1338: template <class T>
> 1339: void do_oop_work(T* p) {
> 1340: oop obj = HeapAccess<AS_NO_KEEPALIVE>::oop_load(p);
After taking a closer look at the LRB code, I think we should `or` in `ON_WEAK_OOP_REF` to the access decorators. With only the `AS_NO_KEEPALIVE` decorator, the LRB will fall through all the cases here and end up evacuating a doomed weak referent (i.e., keeping the object alive).
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp#L119
I'm worried that this access here could defeat the logic which is meant to keep weak references from being resurrected. It also looks like if we add in the `ON_WEAK_OOP_REF` decorator, the LRB will handle this case for the object iterator too:
if (_heap->is_concurrent_weak_root_in_progress() && !_marking_context->is_marked(obj)) {
// There may be dead oops in weak roots in concurrent root phase, do not touch them.
return;
}
-------------
PR: https://git.openjdk.org/jdk/pull/10268
More information about the shenandoah-dev
mailing list