RFR (S): Fix shutdown/cancelled races

Aleksey Shipilev shade at redhat.com
Thu Dec 8 13:25:03 UTC 2016


Hi,

The recent change for early cancellation introduced/exposed a few interesting
races in the shutdown/cancellation sequence.

The first race is on shutdown, and goes like this:
 a) SHHeap::stop() is called.
 b) SHHeap::stop() sets cancelled_gc to "true"
 c) SHConcThread loop detects canceled GC, and tries to exit
 d) SHConcThread fails the assert, because neither full GC nor "terminate" is set:
    assert (_do_full_gc || should_terminate(),
       "Either exiting, or impending Full GC");
 e) SHHeap::stop() eventually calls SHConcThread::stop() to set "terminate",
    but it is too late.

Fixed by introducing the "graceful shutdown" flag.
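
To illustrate the ordering problem, here is a minimal single-threaded sketch.
It is not the actual patch: ControlState, its fields, and the helper functions
are hypothetical names, and it assumes the new flag is published before the
cancellation becomes visible to the concurrent thread.

```cpp
#include <atomic>

// Hypothetical stand-in for the flags the concurrent GC thread inspects.
// Names are illustrative only, not the real Shenandoah fields.
struct ControlState {
  std::atomic<bool> cancelled_gc{false};
  std::atomic<bool> do_full_gc{false};
  std::atomic<bool> should_terminate{false};
  std::atomic<bool> graceful_shutdown{false};  // assumed name for the new flag

  // True if the loop has a known reason for the cancellation it observed.
  bool exit_reason_known() const {
    return do_full_gc.load() || should_terminate.load() ||
           graceful_shutdown.load();
  }
};

// Racy ordering: GC is cancelled first, "terminate" would only be set later.
inline void heap_stop_begin_racy(ControlState& s) {
  s.cancelled_gc.store(true);
}

// Assumed fixed ordering: announce the shutdown, then cancel GC.
inline void heap_stop_begin_fixed(ControlState& s) {
  s.graceful_shutdown.store(true);
  s.cancelled_gc.store(true);
}

// What the concurrent thread would conclude if it ran right at this point:
// a cancelled GC must always come with a known reason.
inline bool loop_sees_consistent_state(const ControlState& s) {
  return !s.cancelled_gc.load() || s.exit_reason_known();
}
```

With the racy ordering, a loop iteration that runs between the cancellation and
the later "terminate" store sees a cancelled GC with no reason attached. With
the flag set first, every state the loop can observe is consistent.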

The second race is between canceling GC and scheduling a full GC. It goes like this:
 a) ShenandoahHeap::collect() cancels GC
 b) SHConcThread loop detects canceled GC, and tries to exit
 c) SHConcThread fails the assert, because neither full GC nor "terminate" is set:
    assert (_do_full_gc || should_terminate(),
       "Either exiting, or impending Full GC");
 d) ShenandoahHeap::collect() eventually calls into do_full_gc() to set
    _do_full_gc, but it is too late.

Solved by moving GC cancellation into the do_full_gc() method, so that GC is
canceled only after Full GC is scheduled.
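
The reordering can be sketched like this. Again, this is only a model under
assumptions: GcFlags, collect_racy and collect_fixed are hypothetical names,
and the observer callback stands in for the concurrent thread running at the
worst possible moment between the two stores.

```cpp
#include <atomic>

// Hypothetical model of the two flags involved in the second race.
struct GcFlags {
  std::atomic<bool> cancelled_gc{false};
  std::atomic<bool> do_full_gc{false};
};

// Old ordering as described above: cancel first, schedule Full GC later.
// `observe` is invoked in the window where the concurrent thread could run.
template <typename Observer>
void collect_racy(GcFlags& f, Observer observe) {
  f.cancelled_gc.store(true);
  observe(f);                  // sees cancelled GC, but no Full GC pending
  f.do_full_gc.store(true);
}

// New ordering: schedule Full GC first, cancel afterwards.
template <typename Observer>
void collect_fixed(GcFlags& f, Observer observe) {
  f.do_full_gc.store(true);
  observe(f);                  // nothing cancelled yet, nothing to explain
  f.cancelled_gc.store(true);
  observe(f);                  // cancelled, and Full GC is already pending
}
```

An observer that flags "cancelled_gc set while do_full_gc is not" fires in the
racy variant and never fires in the fixed one.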

Both fixes:
 http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/

Testing: hs_gc_shenandoah (with sleeps in critical places to exacerbate races),
jcstress (tests-all), which was failing before.

Note that in last week's code, in both races SHConcThread could have tried to
start concurrent mark, or dived into a 10 ms sleep, before it detected that it
was stopped. It would then have exited early by detecting the canceled GC. The
new code checks for that early, before doing the GC cycle, in case we slip like
that again.

Thanks,
-Aleksey


