RFR (S): Fix shutdown/cancelled races
Roman Kennke
rkennke at redhat.com
Thu Dec 8 14:57:47 UTC 2016
Patch looks good.
Did you need to change any of the tests? E.g. "with sleeps in critical
places to exacerbate races" ?? I can't tell you how often we have
'fixed' this code before... having a test triggering on the bug would
be awesome!
Roman
Am Donnerstag, den 08.12.2016, 14:25 +0100 schrieb Aleksey Shipilev:
> Hi,
>
> The recent change for early cancellation introduced/exposed a few
> interesting
> races in shutdown/cancellation sequence.
>
> First race is on shutdown, and goes like this:
> a) SHHeap::stop() is called.
> b) SHHeap::stop() sets cancelled_gc to "true"
> c) SHConcThread loop detects canceled GC, and tries to exit
> d) SHConcThread fails, because neither full GC nor "terminate" is
> set
> assert (_do_full_gc || should_terminate(),
> "Either exiting, or impending Full GC");
> e) SHHeap::stop() eventually calls SHConcThread::stop() to set
> "terminate", but
> it is too late.
>
> Fixed by introducing the "graceful shutdown" flag.
>
> Second race is between canceling GC and scheduling a full GC. Goes
> like this:
> a) ShenandoahHeap::collect() cancels GC
> b) SHConcThread loop detects canceled GC, and tries to exit
> c) SHConcThread fails, because neither full GC nor "terminate" is
> set
> assert (_do_full_gc || should_terminate(),
> "Either exiting, or impending Full GC");
> d) ShenandoahHeap::collect() eventually calls into do_full_gc() to
> set
> _do_full_gc, but it is too late.
>
> Solved by moving GC cancellation within the do_full_gc method, and
> canceling
> after Full GC is scheduled.
>
> Both fixes:
> http://cr.openjdk.java.net/~shade/shenandoah/cancel-races/webrev.01/
>
> Testing: hs_gc_shenandoah (with sleeps in critical places to
> exacerbate races),
> jcstress (tests-all) that was failing before.
>
> Note that in last week's code both races could have tried to start
> concurrent
> mark, or dived to sleep for 10ms, before SHConcThread could not
> detect it was
> stopped. It would have exited early by detecting the canceled GC. New
> code
> checks that early before doing the GC cycle, in case we slip like
> that again.
>
> Thanks,
> -Aleksey
>
More information about the shenandoah-dev
mailing list