RFR [8]: Fix shutdown deadlock due to blocking SATB flush periodic task
Roman Kennke
rkennke at redhat.com
Tue Oct 2 14:18:27 UTC 2018
> Found the reason for rare shutdown deadlock in sh/jdk8.
>
> It goes like this. There is a DestroyVMThread that does the shutdown sequence. Before notifying
> ShHeap about termination via ShHeap::stop(), it stops WatcherThread that executes periodic tasks.
> One of those periodic tasks is from Shenandoah, which calls ShHeap::force_satb_flush_all_threads.
>
> The deadlock happens when GC cycle wants to synchronize for safepoint at the same time. To do that,
> safepoint synchronizer acquires Threads_lock for iterating the threads, and waits for threads to
> report back. At the same time, DestroyVMThread enters the shutdown sequence, and tries to stop
> WatcherThread, blocking until WatcherThread is dead, without checking for pending safepoint. But
> WatcherThread is locked on Threads_lock inside ShHeap::force_satb_flush_all_threads, which is
> already held by safepoint synchronizer, so the whole thing deadlocks. -XX:+SafepointTimeout then
> predictably fails with "DestroyVMThread did not reach safepoint".
>
> I think this is our fault: we should not really have blocking periodic tasks. In sh/jdk11 and later,
> we have Thread SMR facility that does thread iteration without locking. In sh/jdk8 we can avoid
> indefinite blocking by try_lock'ing the Threads_lock. We can give up on setting the flush flag,
> because it is a hint anyway.
>
> Goes like this:
>
> diff -r ff9961cece0a src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp
> --- a/src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Tue Oct 02 13:17:33 2018 +0200
> +++ b/src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Tue Oct 02 15:49:45 2018 +0200
> @@ -1588,18 +1588,28 @@
> void ShenandoahHeap::force_satb_flush_all_threads() {
> if (!is_concurrent_mark_in_progress()) {
> // No need to flush SATBs
> return;
> }
>
> - MutexLocker ml(Threads_lock);
> - JavaThread::set_force_satb_flush_all_threads(true);
> -
> - // The threads are not "acquiring" their thread-local data, but it does not
> - // hurt to "release" the updates here anyway.
> - OrderAccess::fence();
> + // Do not block if Threads lock is busy. This avoids the potential deadlock
> + // when this code is called from the periodic task, and something else is
> + // expecting the periodic task to complete without blocking. On the off-chance
> + // Threads lock is busy momentarily, try to acquire several times.
> + for (int t = 0; t < 10; t++) {
> + if (Threads_lock->try_lock()) {
> + JavaThread::set_force_satb_flush_all_threads(true);
> + Threads_lock->unlock();
> +
> + // The threads are not "acquiring" their thread-local data, but it does not
> + // hurt to "release" the updates here anyway.
> + OrderAccess::fence();
> + break;
> + }
> + os::naked_short_sleep(1);
> + }
> }
Very good! The fix looks reasonable.
> Testing: hotspot_tier3_gc_shenandoah, deadlock stress tests (changed config for EvilSyncBug, shall
> contribute it separately to sh/jdk first)
That'd be good.
Thanks, Roman
More information about the shenandoah-dev
mailing list