RFR: 8293252: Shenandoah: ThreadMXBean synchronizer tests crash with aggressive heuristics [v2]

Thu Sep 22 16:41:20 UTC 2022

On Thu, 22 Sep 2022 01:45:42 GMT, Ashutosh Mehra <duke at openjdk.org> wrote:

>> Please review the fix for the assertion failure when running ThreadMXBean synchronizer code in fastdebug mode.
>> 
>> Please note that the assertion failure happens in regular (satb) mode as well, not just in iu mode.
>> 
>> At the point of assertion failure, the JVM is at a safepoint as the VMThread is executing VM_ThreadDump operation and the GC worker threads are paused after the mark phase.
>> Assertion happened when the VMThread performs `NativeAccess<>::oop_store()` (as part of creating an `OopHandle`) on an object which has been marked and is in the collection set. But the failing assertion in `oop_store_not_in_heap()` expects the oop is not in the collection set.
>> 
>> The assertion is conditional on the statement `value != NULL && !ShenandoahHeap::heap()->cancelled_gc()`
>> 
>> 	shenandoah_assert_not_in_cset_if(addr, value, value != NULL && !ShenandoahHeap::heap()->cancelled_gc());
>> 
>> But it looks like it is missing another important condition - the GC should be in marking phase.
>> i.e. the assertion is valid only if it is protected by `ShenandoahHeap::heap()->is_concurrent_mark_in_progress()`.
>> So the correct assertion should be:
>> 
>> 	shenandoah_assert_not_in_cset_if(addr, value, value != NULL && !ShenandoahHeap::heap()->cancelled_gc() && ShenandoahHeap::heap()->is_concurrent_mark_in_progress());
>> 
>> In fact this assertion is already present in one of its caller (`oop_store_in_heap()1) in its negative form:
>> 
>> 	shenandoah_assert_not_in_cset_except    (addr, value, value == NULL || ShenandoahHeap::heap()->cancelled_gc() || !ShenandoahHeap::heap()->is_concurrent_mark_in_progress());
>> 
>> 
>> Also, it reads strange that `oop_store_in_heap()` would call `oop_store_not_in_heap()`.
>> So I have moved the code to a separate method `oop_store_common()` that gets called by both `oop_store_in_heap()` and `oop_store_not_in_heap()`.
>> 
>> Tested it by running following tests in fastdebug mode:
>> 
>> - hotspot_gc_shenandoah
>> - java/lang/management/ThreadMXBean/MyOwnSynchronizer.java (50 times with -XX:ShenandoahGCHeuristics=aggressive)
>> - java/lang/management/ThreadMXBean/LockedSynchronizers.java (50 times with -XX:ShenandoahGCHeuristics=aggressive)
>> 
>> Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>
> Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Use HeapAccess instead of RawAccess when iterating the heap objects
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>
>  - Make oop_store_common() as private method and remove extraneous condition
>    
>    Signed-off-by: Ashutosh Mehra <asmehra at redhat.com>

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1340:

> 1338:   template <class T>
> 1339:   void do_oop_work(T* p) {
> 1340:     oop obj = HeapAccess<AS_NO_KEEPALIVE>::oop_load(p);

After taking a closer look at the LRB code, I think we should `or` in `ON_WEAK_OOP_REF` to the access decorators. With only the `AS_NO_KEEPALIVE` decorator, the LRB will fall through all the cases here and end up evacuating a doomed weak referent (i.e., keeping the object alive). 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp#L119

I'm worried that this access here could defeat the logic which is meant to keep weak references from being resurrected. It also looks like if we add in the `ON_WEAK_OOP_REF` decorator, the LRB will handle this case for the object iterator too:

if (_heap->is_concurrent_weak_root_in_progress() && !_marking_context->is_marked(obj)) {
        // There may be dead oops in weak roots in concurrent root phase, do not touch them.
        return;
}

-------------

PR: https://git.openjdk.org/jdk/pull/10268