RFR round 0 JDK8u backport of ObjectMonitor-JVM/TI hang fix (8028073)

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Feb 28 13:53:43 PST 2014


Greetings,

This is a code review request for the JDK7u-hs-dev backport of the
following ObjectMonitor-JVM/TI hang fix:

     8028073 race condition in ObjectMonitor implementation causing 
deadlocks
     https://bugs.openjdk.java.net/browse/JDK-8028073

Here is the JDK7u-hs-dev webrev URL:

     http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk7u-hs-dev/

This is a straight forward backport of the JDK8u-hs-dev version of the
fix modulo lines numbers and context diff anchors. The diff of the two
'hg diff --nodates' files is below.

Same testing as for the JDK9 and JDK8u versions with the same results.

I need one sanity check review from someone.

Dan

$ diff 8028073-diffs-jdk[78]u-hs.txt.00
1c1
< diff -r bdb3798edd11 src/share/vm/prims/jvm.cpp
---
 > diff -r 231051bff068 src/share/vm/prims/jvm.cpp
11c11
< @@ -527,6 +527,12 @@ JVM_ENTRY(void, JVM_MonitorWait(JNIEnv*
---
 > @@ -518,6 +518,12 @@ JVM_ENTRY(void, JVM_MonitorWait(JNIEnv*
22c22
<    ObjectSynchronizer::wait(obj, ms, THREAD);
---
 >    ObjectSynchronizer::wait(obj, ms, CHECK);
24c24
< diff -r bdb3798edd11 src/share/vm/runtime/objectMonitor.cpp
---
 > diff -r 231051bff068 src/share/vm/runtime/objectMonitor.cpp
34c34
< @@ -385,6 +385,12 @@ void ATTR ObjectMonitor::enter(TRAPS) {
---
 > @@ -382,6 +382,12 @@ void ATTR ObjectMonitor::enter(TRAPS) {
47c47
< @@ -442,6 +448,12 @@ void ATTR ObjectMonitor::enter(TRAPS) {
---
 > @@ -439,6 +445,12 @@ void ATTR ObjectMonitor::enter(TRAPS) {
60c60
< @@ -1459,6 +1471,14 @@ void ObjectMonitor::wait(jlong millis, b
---
 > @@ -1456,6 +1468,14 @@ void ObjectMonitor::wait(jlong millis, b
75c75
< @@ -1501,21 +1521,6 @@ void ObjectMonitor::wait(jlong millis, b
---
 > @@ -1498,21 +1518,6 @@ void ObjectMonitor::wait(jlong millis, b
97c97
< @@ -1598,6 +1603,33 @@ void ObjectMonitor::wait(jlong millis, b
---
 > @@ -1595,6 +1600,33 @@ void ObjectMonitor::wait(jlong millis, b


On 2/21/14 8:40 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> This is a code review request for the JDK8u-hs-dev backport of the
> following ObjectMonitor-JVM/TI hang fix:
>
>     8028073 race condition in ObjectMonitor implementation causing 
> deadlocks
>     https://bugs.openjdk.java.net/browse/JDK-8028073
>
> Here is the JDK8u-hs-dev webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk8u-hs-dev/
>
> This is _almost_ a straight forward backport of the JDK9 fix. The only
> difference to the fix was discussed at the end of the JDK9 review and
> was determined to only be needed in versions of HotSpot without the
> fix for 8028280:
>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-February/010745.html 
>
>
> 8028280 has not yet been backported to JDK8u-hs-dev.
>
> The easiest way to review the backport is to download the two patch
> files from the webrevs and compare them with something like:
>
>     jfilemerge -r -w 8028073_exp.patch 8028073_exp_for_jdk8u_hs.patch
>
> The same testing has been performed on the JDK8u-hs-dev version as
> with the JDK9-hs-runtime version.
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan
>
>
> On 2/1/14 11:38 AM, Daniel D. Daugherty wrote:
> > Greetings,
> >
> > I have a fix ready for the following bug:
> >
> >     8028073 race condition in ObjectMonitor implementation causing 
> deadlocks
> >     https://bugs.openjdk.java.net/browse/JDK-8028073
> >
> > On the surface, this is a very simple fix that relocates a few lines of
> > code, relocates and rewrites the comments associated with that code and
> > adds several new comments.
> >
> > Of course, in reality, the issue is much more complicated, but I'm
> > hoping to make it easy for anyone not acquainted with this issue to
> > understand what's going on.
> >
> > Here are the JDK9 webrev URLs:
> >
> > OpenJDK:
> > http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk9-hs-runtime/
> >
> > Oracle internal:
> > 
> http://javaweb.us.oracle.com/~ddaugher/8028073-webrev/0-jdk9-hs-runtime/
> >
> > The simple summary:
> >
> > - since Java Monitors and JVM/TI RawMonitors share a ParkEvent,
> >   it is possible for a JVM/TI monitor event handler to accidentally
> >   consume a ParkEvent.unpark() call meant for Java Monitor layer
> > - the original code fix was made on 2005.07.04 using this bug ID:
> >   https://bugs.openjdk.java.net/browse/JDK-5030359
> > - it's the right fix, but it's in the wrong place
> > - the fix needs to be after the JVMTI_EVENT_MONITOR_WAITED
> >   event handler is called because it is that event handler
> >   that can cause the hang
> >
> >
> > Testing
> > -------
> >
> > - a new StessMonitorWait test has been created that reliably
> >   reproduces the hang in JDK[6789]; see the bug's gory details
> >   for the specific versions where the hang has been reproduced
> >   - the test reliably reproduces the hang in 5 seconds on my
> >     T7600 running Solaris 10u11 X86; 1 minute runs reproduce
> >     the hang reliably on other machines
> >   - 12 hour stress run of the new test on Linux-X64, MacOS X-X64,
> >     Solaris-SPARCV9, Solaris-X64, and Win7-X86 with the JPRT
> >     bits did not reproduce the hang
> > - JPRT test job
> > - VM/SQE Adhoc test job on Server VM, fastdebug bits on Linux-X86,
> >   Linux-X64, MacOS X-X64, Solaris-SPARCV9, Solaris-X64, Windows-X86,
> >   and Windows-X64:
> >   - vm.quick
> >   - Kitchensink (bigapps)
> >   - Weblogic+medrec (bigapps)
> >   - runThese (bigapps)
> >
> >
> > The Gory Details Start Here
> > ---------------------------
> >
> > This is the old location of block of code that's being moved:
> >
> > src/share/vm/runtime/objectMonitor.cpp:
> >
> > 1440 void ObjectMonitor::wait(jlong millis, bool interruptible, 
> TRAPS) {
> > <snip>
> > 1499    exit (true, Self) ;                    // exit the monitor
> > <snip>
> > 1513    if (node._notified != 0 && _succ == Self) {
> > 1514       node._event->unpark();
> > 1515    }
> >
> >
> > This is the new location of block of code that's being moved:
> >
> > src/share/vm/runtime/objectMonitor.cpp:
> >
> > 1452 void ObjectMonitor::wait(jlong millis, bool interruptible, 
> TRAPS) {
> > <snip>
> > 1601      if (JvmtiExport::should_post_monitor_waited()) {
> > 1602        JvmtiExport::post_monitor_waited(jt, this, ret == 
> OS_TIMEOUT);
> > <snip>
> > 1604        if (node._notified != 0 && _succ == Self) {
> > <snip>
> > 1620          node._event->unpark();
> > 1621        }
> >
> >
> > The Risks
> > ---------
> >
> > - The code now executes only when the JVMTI_EVENT_MONITOR_WAITED event
> >   is enabled:
> >   - previously it was always executed
> >   - while the old code was not effective for the hang that is being
> >     fixed with this bug, it is possible that the old code prevented
> >     a different bug in the successor protocol from manifesting
> >   - thorough analysis of the successor protocol did not reveal a
> >     case where the old code was needed in the old location
> > - Thorough analysis indicates that the other JVM/TI monitor events
> >   do not need a fix like the one for JVMTI_EVENT_MONITOR_WAITED:
> >   - the successor protocol is complicated and the analysis could
> >     be wrong when certain options are used
> >   - comments were added to each location where a JVM/TI monitor
> >     event handler is called documenting why a fix like this one
> >     is not needed there
> >   - if the analysis is wrong, the new comments show where a new
> >     code change would be needed
> >
> >
> > The Scenario
> > ------------
> >
> > I've created a scenario that reproduces this hang:
> >
> > T1 - enters monitor and calls monitor.wait()
> > T2 - enters the monitor, calls monitor.notify() and exits the monitor
> > T3 - enters and exits the monitor
> > T4 - enters the monitor, delays for 5 seconds, exits the monitor
> >
> > A JVM/TI agent that enables JVMTI_EVENT_MONITOR_WAITED and has a
> > handler that: enters a raw monitor, waits for 1ms, exits a raw monitor.
> >
> > Here are the six events necessary to make this hang happen:
> >
> > // KEY-EVENT-1a: After being unparked(), T1 has cleared the _succ 
> field, but
> > // KEY-EVENT-1b: T3 is exiting the monitor and makes T1 the 
> successor again.
> >
> > // KEY-EVENT-2a: The unpark() done by T3 when it made T1 the successor
> > // KEY-EVENT-2b: is consumed by the JVM/TI event handler.
> >
> > // KEY-EVENT-3a: T3 made T1 the successor
> > // KEY-EVENT-3b: but before T1 could reenter the monitor T4 grabbed it.
> >
> > // KEY-EVENT-4a: T1's TrySpin() call sees T4 as NotRunnable so
> > // KEY-EVENT-4b: T1 bails from TrySpin without touching _succ.
> >
> > // KEY-EVENT-5a: T4 sees that T1 is still the successor so
> > // KEY-EVENT-5b: T4 takes the quick exit path (no ExitEpilog)
> >
> > // KEY-EVENT-6a: T1 is about to park and it is the successor, but
> > // KEY-EVENT-6b: T3's unpark has been eaten by the JVM/TI event handler
> > // KEY-EVENT-6c: and T4 took the quick exit path. T1 is about to be 
> stuck.
> >
> >
> > This bug is intertwined with:
> >
> > - The ObjectMonitor successor protocol
> > - the sharing of a ParkEvent between Java Monitors and JVM/TI 
> RawMonitors
> >
> > There is a very long successor.notes attachment to JDK-8028073 that
> > attempts to describe the ObjectMonitor successor protocol. It's good
> > for putting pretty much anyone to sleep.
> >
> > Since this hang reproduces back to JDK6, this bug is taking the easily
> > backported solution of moving the original fix to the right location.
> > The following new bug has been filed for possible future work in this
> > area by the Serviceability Team:
> >
> >     8033399 add a separate ParkEvent for JVM/TI RawMonitor use
> >     https://bugs.openjdk.java.net/browse/JDK-8033399
> >
> >
> > The Symptoms
> > ------------
> >
> > With intermittent hangs like this, it is useful to know what to look
> > for in order to determine if you are running into this issue:
> >
> > - if you aren't using a debugger or a profiler or some other
> >   JVM/TI agent, then this hang is not the same as yours
> > - if your JVM/TI agent isn't using a JVMTI_EVENT_MONITOR_WAITED
> >   event handler, then this hang is not the same as yours
> > - if your JVMTI_EVENT_MONITOR_WAITED event handler is not using
> >   JVM/TI RawMonitors, then this hang is not the same as yours
> > - if your JVMTI_EVENT_MONITOR_WAITED event handler is calling
> >   back into Java code, then you might just be insane and this
> >   hang might be similar to yours. However, using a Java callback
> >   in an event handler is an even bigger problem/risk so fix that
> >   first.
> > - if you one or more threads blocked like this and making no
> >   progress, then this hang might be the same as yours:
> >
> > "T1" #22 prio=5 os_prio=64 tid=0x00000000009ca800 nid=0x2f waiting 
> for monitor e
> > ntry [0xfffffd7fc0231000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >    JavaThread state: _thread_blocked
> > Thread: 0x00000000009ca800  [0x2f] State: _at_safepoint 
> _has_called_back 0 _at_p
> > oll_safepoint 0
> >    JavaThread state: _thread_blocked
> >         at java.lang.Object.wait(Native Method)
> >         - waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >         at java.lang.Object.wait(Object.java:502)
> >         at SMW_WorkerThread.run(StressMonitorWait.java:103)
> >         - locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > "T2" #23 prio=5 os_prio=64 tid=0x00000000009cc000 nid=0x30 waiting 
> for monitor e
> > ntry [0xfffffd7fc0130000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >    JavaThread state: _thread_blocked
> > Thread: 0x00000000009cc000  [0x30] State: _at_safepoint 
> _has_called_back 0 _at_p
> > oll_safepoint 0
> >    JavaThread state: _thread_blocked
> >         at SMW_WorkerThread.run(StressMonitorWait.java:120)
> >         - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > "T3" #24 prio=5 os_prio=64 tid=0x00000000009ce000 nid=0x31 waiting 
> for monitor e
> > ntry [0xfffffd7fc002f000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >    JavaThread state: _thread_blocked
> > Thread: 0x00000000009ce000  [0x31] State: _at_safepoint 
> _has_called_back 0 _at_p
> > oll_safepoint 0
> >    JavaThread state: _thread_blocked
> >         at SMW_WorkerThread.run(StressMonitorWait.java:139)
> >         - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > Key symptoms in thread T1:
> >
> > - had the object locked:
> >
> >   locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > - did an Object.wait():
> >
> >   waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > - is blocked on reentry:
> >
> >   waiting for monitor entry [0xfffffd7fc0231000]
> >
> > Key symptoms in thread T2:
> >
> > - is blocked waiting to lock the object:
> >
> >   waiting for monitor entry [0xfffffd7fc0130000]
> >   waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
> >
> > Key symptoms in thread T3:
> >
> > - is blocked waiting to lock the object:
> >
> >   waiting for monitor entry [0xfffffd7fc002f000]
> >   waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>
>
>



More information about the hotspot-runtime-dev mailing list