RFR [8]: Fix shutdown deadlock due to blocking SATB flush periodic task

Tue Oct 2 14:18:27 UTC 2018

> Found the reason for rare shutdown deadlock in sh/jdk8.
> 
> It goes like this. There is a DestroyVMThread that does the shutdown sequence. Before notifying
> ShHeap about termination via ShHeap::stop(), it stops WatcherThread that executes periodic tasks.
> One of those periodic tasks is from Shenandoah, which calls ShHeap::force_satb_flush_all_threads.
> 
> The deadlock happens when GC cycle wants to synchronize for safepoint at the same time. To do that,
> safepoint synchronizer acquires Threads_lock for iterating the threads, and waits for threads to
> report back. At the same time, DestroyVMThread enters the shutdown sequence, and tries to stop
> WatcherThread, blocking until WatcherThread is dead, without checking for pending safepoint. But
> WatcherThread is locked on Threads_lock inside ShHeap::force_satb_flush_all_threads, which is
> already held by safepoint synchronizer, so the whole thing deadlocks. -XX:+SafepointTimeout then
> predictably fails with "DestroyVMThread did not reach safepoint".
> 
> I think this is our fault: we should not really have blocking periodic tasks. In sh/jdk11 and later,
> we have Thread SMR facility that does thread iteration without locking. In sh/jdk8 we can avoid
> indefinite blocking by try_lock'ing the Threads_lock. We can give up on setting the flush flag,
> because it is a hint anyway.
> 
> Goes like this:
> 
> diff -r ff9961cece0a src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp
> --- a/src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp      Tue Oct 02 13:17:33 2018 +0200
> +++ b/src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp      Tue Oct 02 15:49:45 2018 +0200
> @@ -1588,18 +1588,28 @@
>  void ShenandoahHeap::force_satb_flush_all_threads() {
>    if (!is_concurrent_mark_in_progress()) {
>      // No need to flush SATBs
>      return;
>    }
> 
> -  MutexLocker ml(Threads_lock);
> -  JavaThread::set_force_satb_flush_all_threads(true);
> -
> -  // The threads are not "acquiring" their thread-local data, but it does not
> -  // hurt to "release" the updates here anyway.
> -  OrderAccess::fence();
> +  // Do not block if Threads lock is busy. This avoids the potential deadlock
> +  // when this code is called from the periodic task, and something else is
> +  // expecting the periodic task to complete without blocking. On the off-chance
> +  // Threads lock is busy momentarily, try to acquire several times.
> +  for (int t = 0; t < 10; t++) {
> +    if (Threads_lock->try_lock()) {
> +      JavaThread::set_force_satb_flush_all_threads(true);
> +      Threads_lock->unlock();
> +
> +      // The threads are not "acquiring" their thread-local data, but it does not
> +      // hurt to "release" the updates here anyway.
> +      OrderAccess::fence();
> +      break;
> +    }
> +    os::naked_short_sleep(1);
> +  }
>  }

Very good! The fix looks reasonable.

> Testing: hotspot_tier3_gc_shenandoah, deadlock stress tests (changed config for EvilSyncBug, shall
> contribute it separately to sh/jdk first)

That'd be good.

Thanks, Roman