RFR (XS) fix for a safepoint deadlock (8047720)

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Jun 27 17:46:00 UTC 2014


Markus,

Thanks for the fast review!

Dan


On 6/27/14 11:46 AM, Markus Grönlund wrote:
> Hi Dan,
>
> This looks good, thanks for chasing this down!
>
> Cheers
> Markus
>
>
>
> -----Original Message-----
> From: Daniel D. Daugherty
> Sent: 27 June 2014 18:18
> To: hotspot-runtime-dev at openjdk.java.net; serviceability-dev at openjdk.java.net
> Subject: RFR (XS) fix for a safepoint deadlock (8047720)
>
> Greetings,
>
> I have a fix ready for the following bug:
>
>       8047720 Xprof hangs on Solaris
>       https://bugs.openjdk.java.net/browse/JDK-8047720
>
> Here is the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8047720-webrev/0-jdk9-hs-rt/
>
> This deadlock occurred between the following threads:
>
>       Main thread   - Trying to stop the WatcherThread as part of
>                       shutting down the VM; this thread is blocked
>                       on the PeriodicTask_lock which keeps it from
>                       reaching a safepoint.
>       WatcherThread - Requested a VM_ForceSafepoint to complete
>                       a JavaThread::java_suspend() call as part
>                       of a FlatProfiler record_thread_ticks()
>                       call; this thread owns the PeriodicTask_lock
>                       since it is processing a periodic task.
>       VMThread      - Trying to start a safepoint; this thread is
>                       blocked waiting for the Main thread to reach
>                       a safepoint.
>
> The PeriodicTask_lock is one of the VM internal locks and is typically managed using Mutex::_no_safepoint_check_flag to avoid deadlocks. Yes, the irony is dripping on the floor... :-)
>
> The interesting part of this deadlock is that I think that it is possible for other periodic tasks to hit it. Anything that causes the WatcherThread to start a safepoint while processing a periodic task should be susceptible to this race. Think about the -XX:+DeoptimizeALot option and how it causes VM_Deopt requests on thread state transitions... Interesting...
>
> Testing:
>       - I found a way to add delays to the right spots in the
>         VM to make the deadlock reproduce in just about every
>         run of the test associated with the bug. The new
>         os::naked_short_sleep() function is your friend. Thanks
>         to Fred for adding that! See the bug report for the
>         debugging diffs.
>       - 72 hours of running the test in the bug report with
>         delays enabled for product, fastdebug and jvmg bits
>         in parallel on my Solaris X86 server.
>       - JPRT test run
>       - Aurora Adhoc results are in progress; we're having issues
>         with both a broken testbase build and infra problems
>         with results not being uploaded.
>
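
The three-way wait described in the quoted message can be pictured as a wait-for graph in which each thread is blocked on exactly one other thread. The sketch below is only an illustrative model, not HotSpot code: the thread names and wait reasons come from the message, while the dictionary layout and the find_cycle() helper are assumptions made for this example.

```python
# Illustrative model only. Thread names and wait reasons are taken from the
# quoted message; the graph layout and find_cycle() are hypothetical helpers.

# Each blocked thread maps to (thread it waits on, reason it is waiting).
WAITS_FOR = {
    "Main thread": ("WatcherThread",
                    "PeriodicTask_lock is owned by the WatcherThread"),
    "WatcherThread": ("VMThread",
                      "VM_ForceSafepoint has not completed"),
    "VMThread": ("Main thread",
                 "Main thread has not reached a safepoint"),
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        # A thread with no entry is not blocked, so the chain ends there.
        node = waits_for.get(node, (None, None))[0]
    if node is None:
        return None
    # node was visited before: the cycle starts at its first occurrence.
    return path[path.index(node):]


cycle = find_cycle(WAITS_FOR, "Main thread")
if cycle is not None:
    for thread in cycle:
        blocker, reason = WAITS_FOR[thread]
        print(f"{thread} waits on {blocker}: {reason}")
```

Removing any single edge from the graph breaks the cycle. For example, if the WatcherThread were not waiting on the VMThread while it owns the PeriodicTask_lock, find_cycle() would return None. The message itself does not say which edge the webrev removes.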


