RFR (XS) fix for a safepoint deadlock (8047720)

David Holmes david.holmes at oracle.com
Mon Jun 30 05:54:00 UTC 2014


Correction ...

On 30/06/2014 3:33 PM, David Holmes wrote:
> Hi Dan,
>
> I see this has already gone in but I think it is worth looking closer at
> this.
>
> On 28/06/2014 2:18 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have a fix ready for the following bug:
>>
>>      8047720 Xprof hangs on Solaris
>>      https://bugs.openjdk.java.net/browse/JDK-8047720
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8047720-webrev/0-jdk9-hs-rt/
>>
>> This deadlock occurred between the following threads:
>>
>>      Main thread   - Trying to stop the WatcherThread as part of
>>                      shutting down the VM; this thread is blocked
>>                      on the PeriodicTask_lock which keeps it from
>>                      reaching a safepoint.
>>      WatcherThread - Requested a VM_ForceSafepoint to complete
>>                      a JavaThread::java_suspend() call as part
>>                      of a FlatProfiler record_thread_ticks()
>>                      call; this thread owns the PeriodicTask_lock
>>                      since it is processing a periodic task.
>>      VMThread      - Trying to start a safepoint; this thread is
>>                      blocked waiting for the Main thread to reach
>>                      a safepoint.
>>
>> The PeriodicTask_lock is one of the VM internal locks and is
>> typically managed using Mutex::_no_safepoint_check_flag to
>> avoid deadlocks. Yes, the irony is dripping on the floor... :-)
>
> What was overlooked here is that the holder of a lock that is acquired
> without safepoint checks must never block at a safepoint whilst holding
> that lock. In this case the blocking is indirect, caused by the
> synchronous nature of the VM_Operation, rather than a direct result of
> "blocking for the safepoint" (which the WatcherThread does not
> participate in). I wonder if the WatcherThread should really be using
> the async variant of VM_ForceSafepoint here?
>
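
To make that rule concrete, here is a minimal sketch in plain C++. It is
not HotSpot code: ModelThread, ModelLocker and may_wait_for_safepoint_op
are hypothetical names invented for illustration, and the model only
counts locks taken without a safepoint check. It assumes the rule exactly
as stated above and nothing more.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the "safepoint check" / "no safepoint check"
// choice made when a VM-internal lock is acquired.
enum class SafepointCheck { Yes, No };

// Hypothetical model of a thread; it only tracks how many locks it
// currently holds that were acquired without a safepoint check.
struct ModelThread {
    explicit ModelThread(std::string n) : name(std::move(n)) {}
    std::string name;
    int no_check_locks_held = 0;
};

// RAII guard in the spirit of a scoped locker. It does no real locking;
// it just records the acquisition mode on the owning thread.
class ModelLocker {
public:
    ModelLocker(ModelThread& t, SafepointCheck mode) : t_(t), mode_(mode) {
        if (mode_ == SafepointCheck::No) ++t_.no_check_locks_held;
    }
    ~ModelLocker() {
        if (mode_ == SafepointCheck::No) --t_.no_check_locks_held;
    }
    ModelLocker(const ModelLocker&) = delete;
    ModelLocker& operator=(const ModelLocker&) = delete;

private:
    ModelThread& t_;
    SafepointCheck mode_;
};

// The rule: a thread must not block waiting for a safepoint (directly, or
// indirectly via a synchronous VM operation) while it holds a lock that
// was acquired without safepoint checks.
bool may_wait_for_safepoint_op(const ModelThread& t) {
    return t.no_check_locks_held == 0;
}
```

In this model, a WatcherThread that holds a ModelLocker taken with
SafepointCheck::No and then waits on a synchronous operation is the
violation; once the guard goes out of scope the wait is allowed again.
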
>> The interesting part of this deadlock is that I think that it
>> is possible for other periodic tasks to hit it. Anything that
>> causes the WatcherThread to start a safepoint while processing
>> a periodic task should be susceptible to this race. Think about
>> the -XX:+DeoptimizeALot option and how it causes VM_Deopt
>> requests on thread state transitions... Interesting...
>
> I don't think so. You need three threads involved to get the deadlock.

But that isn't the point. As you state, this deadlock at VM shutdown
could affect any synchronous safepoint operation executed by the
WatcherThread.

That aside ...

> In the current case the main thread's locking of the PeriodicTask_lock
> without a safepoint check is what causes the problem - that violates the
> rules surrounding use of "no safepoint checks". The other methods that a
> JavaThread might call that acquire the PeriodicTask_lock do perform the
> safepoint checks, so they wouldn't deadlock. Hence it seems to me that
> only WatcherThread::stop can lead to this problem. And as
> WatcherThread::stop is only called from before_exit, and that can only
> be called once, it seems to me that we could/should actually acquire the
> lock with a safepoint check.
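
The three-way wait described in the bug, and the effect of the suggested
change, can be sketched as a tiny wait-for graph. This is a standalone
C++ model, not HotSpot code: WaitsFor, in_wait_cycle and
shutdown_scenario are hypothetical names, and the model assumes that a
thread blocked on a lock acquired with a safepoint check counts as
safepoint-safe, so the VMThread does not have to wait for it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: each blocked thread waits on exactly one other thread.
using WaitsFor = std::map<std::string, std::string>;

// Follows wait-for edges from `start`; returns true only if the chain
// leads back to `start`, i.e. `start` is part of a deadlock cycle.
bool in_wait_cycle(const WaitsFor& g, const std::string& start) {
    std::set<std::string> seen;
    std::string cur = start;
    while (true) {
        auto it = g.find(cur);
        if (it == g.end()) return false;      // reached a thread that can run
        cur = it->second;
        if (cur == start) return true;        // came back to where we began
        if (!seen.insert(cur).second) return false;  // cycle elsewhere
    }
}

// Builds the shutdown scenario from the bug report.
// main_checks_safepoint == false: Main takes PeriodicTask_lock without a
// safepoint check (the reported behaviour).
// main_checks_safepoint == true: Main takes it with a safepoint check
// (the change suggested above).
WaitsFor shutdown_scenario(bool main_checks_safepoint) {
    WaitsFor g;
    // WatcherThread holds PeriodicTask_lock and waits for its synchronous
    // VM_ForceSafepoint to be completed by the VMThread.
    g["WatcherThread"] = "VMThread";
    // Main is blocked on PeriodicTask_lock, which WatcherThread owns.
    g["Main"] = "WatcherThread";
    if (!main_checks_safepoint) {
        // Main is not safepoint-safe while blocked, so the VMThread
        // waits for it to reach a safepoint.
        g["VMThread"] = "Main";
    }
    // Otherwise (assumption of this model) Main is treated as
    // safepoint-safe while blocked and the VMThread can proceed.
    return g;
}
```

With main_checks_safepoint set to false, every one of the three threads
sits on the cycle; with it set to true, the chain from Main ends at a
VMThread that is free to run, so the WatcherThread's operation can
complete and the lock can be released.
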

David
-----

> Cheers,
> David
>
>>
>> Testing:
>>      - I found a way to add delays to the right spots in the
>>        VM to make the deadlock reproduce in just about every
>>        run of the test associated with the bug. The new
>>        os::naked_short_sleep() function is your friend. Thanks
>>        to Fred for adding that! See the bug report for the
>>        debugging diffs.
>>      - 72 hours of running the test in the bug report with
>>        delays enabled for product, fastdebug and jvmg bits
>>        in parallel on my Solaris X86 server.
>>      - JPRT test run
>>      - Aurora Adhoc results are in process; we're having issues
>>        with both a broken testbase build and infra problems
>>        with results not being uploaded.
>>


More information about the serviceability-dev mailing list