RFR (XS) fix for a safepoint deadlock (8047720)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jun 27 20:27:26 UTC 2014
Thanks for the review!
Dan
On 6/27/14 2:19 PM, Coleen Phillimore wrote:
>
> This looks good. Good to track down this deadlock!
> Coleen
>
> On 6/27/14, 12:18 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have a fix ready for the following bug:
>>
>> 8047720 Xprof hangs on Solaris
>> https://bugs.openjdk.java.net/browse/JDK-8047720
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8047720-webrev/0-jdk9-hs-rt/
>>
>> This deadlock occurred between the following threads:
>>
>> Main thread   - Trying to stop the WatcherThread as part of
>>                 shutting down the VM; this thread is blocked
>>                 on the PeriodicTask_lock which keeps it from
>>                 reaching a safepoint.
>> WatcherThread - Requested a VM_ForceSafepoint to complete
>>                 a JavaThread::java_suspend() call as part
>>                 of a FlatProfiler record_thread_ticks()
>>                 call; this thread owns the PeriodicTask_lock
>>                 since it is processing a periodic task.
>> VMThread      - Trying to start a safepoint; this thread is
>>                 blocked waiting for the Main thread to reach
>>                 a safepoint.
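One way to see why none of these three threads can make progress is to treat the report as a wait-for graph: an edge A -> B means "A cannot proceed until B acts" (releases a lock, finishes a VM operation, or reaches a safepoint). The C++ sketch below is purely illustrative. `WaitForGraph` and its methods are hypothetical names, not HotSpot code, and the sketch just runs a standard three-color depth-first cycle check over that graph.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not HotSpot code: an edge waiter -> holder means the
// waiter cannot make progress until the holder does something. A deadlock
// shows up as a cycle in this wait-for graph.
class WaitForGraph {
 public:
  // Register a thread (creates an empty adjacency list if it is new).
  void add_thread(const std::string& t) { edges_[t]; }

  // Record that `waiter` is waiting on `holder`; both must be registered.
  void add_wait(const std::string& waiter, const std::string& holder) {
    assert(edges_.count(waiter) && edges_.count(holder));
    edges_[waiter].push_back(holder);
  }

  // Three-color DFS: meeting a Gray node means it is still on the current
  // DFS path, so following this edge closes a cycle.
  bool has_cycle() const {
    std::map<std::string, Color> color;
    for (const auto& kv : edges_) color[kv.first] = Color::White;
    for (const auto& kv : edges_) {
      if (color[kv.first] == Color::White && visit(kv.first, color)) {
        return true;
      }
    }
    return false;
  }

 private:
  enum class Color { White, Gray, Black };

  bool visit(const std::string& t, std::map<std::string, Color>& color) const {
    color[t] = Color::Gray;  // t is now on the current DFS path
    for (const auto& next : edges_.at(t)) {
      if (color[next] == Color::Gray) return true;
      if (color[next] == Color::White && visit(next, color)) return true;
    }
    color[t] = Color::Black;  // fully explored, no cycle through t
    return false;
  }

  std::map<std::string, std::vector<std::string>> edges_;
};
```

With the edges Main -> WatcherThread (PeriodicTask_lock) and WatcherThread -> VMThread (the VM_ForceSafepoint request), `has_cycle()` returns false. Adding VMThread -> Main (waiting for Main to reach a safepoint) closes the loop, and it returns true. In this model, removing any single edge is enough to break the cycle.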
>>
>> The PeriodicTask_lock is one of the VM internal locks and is
>> typically managed using Mutex::_no_safepoint_check_flag to
>> avoid deadlocks. Yes, the irony is dripping on the floor... :-)
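The role of `Mutex::_no_safepoint_check_flag` can be sketched the same way. The model below is a simplification and an assumption, not HotSpot's Mutex implementation. `LockMode`, `ThreadState`, `Waiter` and `safepoint_can_complete()` are hypothetical names. The model encodes only one idea. A thread waiting on a lock through the safepoint-checking path counts as safe for a safepoint. A thread waiting without the check stays "in VM", so the safepoint can only complete if that lock's owner releases it first.

```cpp
#include <cassert>
#include <vector>

// Which acquisition path a thread used for a contended VM-internal lock.
enum class LockMode { SafepointCheck, NoSafepointCheck };

// Coarse thread states, loosely named after HotSpot's JavaThreadState
// values _thread_in_vm and _thread_blocked.
enum class ThreadState { InVM, Blocked };

// While waiting for a contended lock, the safepoint-checking path is modeled
// as Blocked (safe for a safepoint to proceed); the no-safepoint-check path
// leaves the thread InVM, so a safepoint has to wait for it.
ThreadState state_while_waiting(LockMode mode) {
  return mode == LockMode::SafepointCheck ? ThreadState::Blocked
                                          : ThreadState::InVM;
}

// A thread waiting on a lock: the path it used, and whether the lock's owner
// will release the lock without first needing the safepoint to finish.
struct Waiter {
  LockMode mode;
  bool owner_releases_before_safepoint;
};

// The safepoint can be reached only if every waiter is either safe while it
// waits, or will get its lock and then reach a safepoint on its own.
bool safepoint_can_complete(const std::vector<Waiter>& waiters) {
  for (const Waiter& w : waiters) {
    bool safe = state_while_waiting(w.mode) == ThreadState::Blocked;
    if (!safe && !w.owner_releases_before_safepoint) return false;
  }
  return true;
}
```

For the Main thread in this report, the waiter uses `NoSafepointCheck`, and the owner (the WatcherThread) will not release PeriodicTask_lock until its VM_ForceSafepoint completes. In that case `safepoint_can_complete()` returns false. Flipping either field makes it return true.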
>>
>> The interesting part of this deadlock is that I think it is
>> possible for other periodic tasks to hit it. Anything that
>> causes the WatcherThread to start a safepoint while processing
>> a periodic task should be susceptible to this race. Think about
>> the -XX:+DeoptimizeALot option and how it causes VM_Deopt
>> requests on thread state transitions... Interesting...
>>
>> Testing:
>> - I found a way to add delays to the right spots in the
>> VM to make the deadlock reproduce in just about every
>> run of the test associated with the bug. The new
>> os::naked_short_sleep() function is your friend. Thanks
>> to Fred for adding that! See the bug report for the
>> debugging diffs.
>> - 72 hours of running the test in the bug report with
>> delays enabled for product, fastdebug and jvmg bits
>> in parallel on my Solaris X86 server.
>> - JPRT test run
>> - Aurora Adhoc results are in progress; we're having issues
>> with both a broken testbase build and infra problems
>> with results not being uploaded.
>>
>