RFR (S): Fix shutdown/cancelled races

Roman Kennke rkennke at redhat.com
Thu Dec 8 14:57:47 UTC 2016


Patch looks good.

Did you need to change any of the tests? E.g. "with sleeps in critical
places to exacerbate races" ?? I can't tell you how often we have
'fixed' this code before... having a test triggering on the bug would
be awesome!

Roman


Am Donnerstag, den 08.12.2016, 14:25 +0100 schrieb Aleksey Shipilev:
> Hi,
> 
> The recent change for early cancellation introduced/exposed a few
> interesting
> races in shutdown/cancellation sequence.
> 
> First race is on shutdown, and goes like this:
>  a) SHHeap::stop() is called.
>  b) SHHeap::stop() sets cancelled_gc to "true"
>  c) SHConcThread loop detects canceled GC, and tries to exit
>  d) SHConcThread fails, because neither full GC nor "terminate" is
> set
>     assert (_do_full_gc || should_terminate(),
>        "Either exiting, or impending Full GC");
>  e) SHHeap::stop() eventually calls SHConcThread::stop() to set
> "terminate", but
> it is too late.
> 
> Fixed by introducing the "graceful shutdown" flag.
> 
> Second race is between canceling GC and scheduling a full GC. Goes
> like this:
>  a) ShenandoahHeap::collect() cancels GC
>  b) SHConcThread loop detects canceled GC, and tries to exit
>  c) SHConcThread fails, because neither full GC nor "terminate" is
> set
>     assert (_do_full_gc || should_terminate(),
>        "Either exiting, or impending Full GC");
>  d) ShenandoahHeap::collect() eventually calls into do_full_gc() to
> set
> _do_full_gc, but it is too late.
> 
> Solved by moving GC cancellation within the do_full_gc method, and
> canceling
> after Full GC is scheduled.
> 
> Both fixes:
>  http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/
> 
> Testing: hs_gc_shenandoah (with sleeps in critical places to
> exacerbate races),
> jcstress (tests-all) that was failing before.
> 
> Note that in last week's code both races could have tried to start
> concurrent
> mark, or dived to sleep for 10ms, before SHConcThread could not
> detect it was
> stopped. It would have exited early by detecting the canceled GC. New
> code
> checks that early before doing the GC cycle, in case we slip like
> that again.
> 
> Thanks,
> -Aleksey
> 


More information about the shenandoah-dev mailing list