RFR: Safepoint suspendible worker threads

Roman Kennke rkennke at redhat.com
Thu Jun 1 10:44:58 UTC 2017


On 31.05.2017 at 22:41, Roman Kennke wrote:
> On 31.05.2017 at 22:30, Roman Kennke wrote:
>> Currently, our GC workers are unaffected by (non-GC) safepoints: they
>> happily carry on working their stuff. This is usually not a problem.
>> However, lately we found that concurrent code cache marking sometimes
>> barfs because the nmethod marking/sweeping sometimes steps in between
>> (during a non-GC safepoint). Also, heapdump seems to rely on the heap
>> holding still (naturally). We don't really know what other VM_Ops might
>> depend on the heap/GC holding still and the general assumption is that
>> nothing moves at a safepoint except the VMThread and the workers it spawns.
>>
>> This change makes Shenandoah's GC worker threads suspend at safepoints.
>>
>> It uses G1's (not G1-specific) SuspendibleThreadSet.
>>
>> It requires some extra magic to coordinate with full-gc. Specifically,
>> when we check for cancelled heap, we must not yield while being
>> cancelled. To ensure that, we first CAS _cancelled_gc to
>> NOT_CANCELLED (which prevents cancelling threads from bumping it to
>> CANCELLED while we yield()), then yield() (check for safepoint and suspend).
>>
>> I have not made ShenandoahConcurrentThread take part in this dance yet.
>> I'll probably add that later, but it does not seem important (and will
>> require some additional, complicated coordination).
>>
>> It's enabled by -XX:+ShenandoahSuspendibleWorkers and off by default.
>> Works like a charm for me on or off. We should check if/how it affects
>> performance.
>>
>> Testing: hotspot_gc_shenandoah and specjvm
>>
>> http://cr.openjdk.java.net/~rkennke/suspendibleworkers/webrev.00/
>>
>> Roman
>>
> Need to suspend (no pun!!) this RFR. Found a deadlock.
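
The cancel/yield handshake described in the quoted RFR can be sketched roughly as follows. This is a standalone model, not the code from the webrev: the RFR only names CANCELLED and NOT_CANCELLED, so the third state (CANCELLABLE), the CancelFlag type, try_cancel() and the yield callback are all assumptions made for illustration.

```cpp
#include <atomic>
#include <functional>

// Hypothetical states. Only CANCELLED and NOT_CANCELLED appear in the RFR;
// CANCELLABLE is an assumed "idle" state that both sides CAS away from.
enum CancelState { CANCELLABLE, CANCELLED, NOT_CANCELLED };

struct CancelFlag {
    std::atomic<int> state{CANCELLABLE};

    // Worker side: returns true if the GC has been cancelled. Otherwise it
    // yields while cancellation is blocked, so no thread can cancel mid-yield.
    bool check_cancelled_and_yield(const std::function<void()>& yield) {
        int expected = CANCELLABLE;
        if (state.compare_exchange_strong(expected, NOT_CANCELLED)) {
            yield();                   // stand-in for "check for safepoint and suspend"
            state.store(CANCELLABLE);  // re-open the window for cancellation
            return false;
        }
        // The CAS failed; 'expected' now holds the observed value, which is
        // CANCELLED, or NOT_CANCELLED if another worker is currently yielding.
        return expected == CANCELLED;
    }

    // Canceller side: a single attempt. Fails while a worker holds
    // NOT_CANCELLED, and also if the GC was already cancelled.
    bool try_cancel() {
        int expected = CANCELLABLE;
        return state.compare_exchange_strong(expected, CANCELLED);
    }
};
```

In this model a canceller that races with a yielding worker simply fails and would have to retry; how the real patch makes cancelling threads wait is not shown here.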

OK, found it. We need to leave the STS (SuspendibleThreadSet) before
calling offer_termination(); otherwise it will wait forever when other
GC threads go into the cancel/yield path.
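
To illustrate why leaving the set matters, here is a toy bookkeeping model of the situation. It is only a sketch under assumptions: ToySts, ScopedLeave and safepoint_proceeds() are hypothetical names, the model is single-threaded, and it does not reproduce HotSpot's SuspendibleThreadSet.

```cpp
// Toy model: counts which threads a safepoint must wait for.
struct ToySts {
    int joined = 0;     // threads the safepoint has to wait for
    int suspended = 0;  // joined threads currently parked in yield()

    void join()  { ++joined; }
    void leave() { --joined; }
    void park()  { ++suspended; }
    // The safepoint can proceed only once every joined thread is parked.
    bool safepoint_can_proceed() const { return suspended == joined; }
};

// RAII helper: leave the set around a blocking call, rejoin afterwards.
class ScopedLeave {
    ToySts& sts_;
public:
    explicit ScopedLeave(ToySts& sts) : sts_(sts) { sts_.leave(); }
    ~ScopedLeave() { sts_.join(); }
    ScopedLeave(const ScopedLeave&) = delete;
    ScopedLeave& operator=(const ScopedLeave&) = delete;
};

// Worker A parks in yield(); worker B blocks in termination waiting for A.
// Returns whether a pending safepoint can proceed in that situation.
bool safepoint_proceeds(bool leave_before_termination) {
    ToySts sts;
    sts.join();  // worker A
    sts.join();  // worker B
    sts.park();  // A takes the cancel/yield path and parks
    if (leave_before_termination) {
        ScopedLeave leaver(sts);  // B leaves the set before blocking
        return sts.safepoint_can_proceed();
    }
    // B is still counted as joined but is blocked and can never park.
    return sts.safepoint_can_proceed();
}
```

With leave_before_termination set to false, the model's safepoint never sees all joined threads parked, which mirrors the deadlock described above. A real rejoin would presumably also have to wait while a safepoint is in progress; the toy omits that.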

Tested using SPECjvm with fastdebug and release builds, plus
jcstress -m quick with fastdebug. No crashes.

Performance in SPECjvm seems unaffected.
-XX:+PrintSafepointStatistics shows a slight increase in max sync time,
though that could also be noise. This certainly needs further
investigation before turning this on by default.

http://cr.openjdk.java.net/~rkennke/suspendibleworkers/webrev.01/

Ok now?

Roman


More information about the shenandoah-dev mailing list