RFR (XS) fix for a safepoint deadlock (8047720)
Coleen Phillimore
coleen.phillimore at oracle.com
Fri Jun 27 20:19:47 UTC 2014
This looks good. Good to track down this deadlock!
Coleen
On 6/27/14, 12:18 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have a fix ready for the following bug:
>
> 8047720 Xprof hangs on Solaris
> https://bugs.openjdk.java.net/browse/JDK-8047720
>
> Here is the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8047720-webrev/0-jdk9-hs-rt/
>
> This deadlock occurred between the following threads:
>
>     Main thread   - Trying to stop the WatcherThread as part of
>                     shutting down the VM; this thread is blocked
>                     on the PeriodicTask_lock which keeps it from
>                     reaching a safepoint.
>     WatcherThread - Requested a VM_ForceSafepoint to complete
>                     a JavaThread::java_suspend() call as part
>                     of a FlatProfiler record_thread_ticks()
>                     call; this thread owns the PeriodicTask_lock
>                     since it is processing a periodic task.
>     VMThread      - Trying to start a safepoint; this thread is
>                     blocked waiting for the Main thread to reach
>                     a safepoint.
>
> The PeriodicTask_lock is one of the VM internal locks and is
> typically managed using Mutex::_no_safepoint_check_flag to
> avoid deadlocks. Yes, the irony is dripping on the floor... :-)
>
> The interesting part of this deadlock is that I think that it
> is possible for other periodic tasks to hit it. Anything that
> causes the WatcherThread to start a safepoint while processing
> a periodic task should be susceptible to this race. Think about
> the -XX:+DeoptimizeALot option and how it causes VM_Deopt
> requests on thread state transitions... Interesting...
>
> Testing:
> - I found a way to add delays to the right spots in the
>   VM to make the deadlock reproduce in just about every
>   run of the test associated with the bug. The new
>   os::naked_short_sleep() function is your friend. Thanks
>   to Fred for adding that! See the bug report for the
>   debugging diffs.
> - 72 hours of running the test in the bug report with
>   delays enabled for product, fastdebug and jvmg bits
>   in parallel on my Solaris X86 server.
> - JPRT test run
> - Aurora Adhoc results are in process; we're having issues
>   with both a broken testbase build and infra problems
>   with results not being uploaded.
>
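
The three-way wait described in the quoted message can be pictured as a
cycle in a wait-for graph. The sketch below is only an illustrative model,
not HotSpot code: the WAITS_FOR table and the find_cycle() helper are
hypothetical names invented for this example, and each edge simply restates
one of the "blocked on" relationships from the thread list above.

```python
# Hypothetical model: each blocked thread maps to the one thread it waits on.
WAITS_FOR = {
    # Blocked on PeriodicTask_lock, which the WatcherThread owns.
    "Main thread": "WatcherThread",
    # Waiting for its VM_ForceSafepoint request to be carried out.
    "WatcherThread": "VMThread",
    # Waiting for the Main thread to reach a safepoint.
    "VMThread": "Main thread",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    position = {}  # thread name -> index in path
    current = start
    while current is not None and current not in position:
        position[current] = len(path)
        path.append(current)
        current = waits_for.get(current)
    if current is None:
        return None  # the chain ended at a thread that is not blocked
    return path[position[current]:]


cycle = find_cycle(WAITS_FOR, "Main thread")
if cycle is not None:
    print(" -> ".join(cycle + cycle[:1]))
```

In this model, dropping any single edge makes find_cycle() return None,
which is the general shape any fix for this kind of hang has to take: one
of the three waits must be able to make progress without the other two.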