code review round 0 for ObjectMonitor-JVM/TI hang fix (8028073)

Mon Feb 10 12:31:20 PST 2014

On 2/10/14 1:20 PM, Karen Kinnear wrote:
> Dan,
>
> Thank you so much. My bad - I was looking at a jdk8 repo, not a jdk9 one.

No problem... I had the advantage of wanting Mr Simms' changes so
that I could (more easily) develop the debug code flow hooks that
I'm planning to add to the "debug tips and tricks" wiki...

> So I agree that the JDK9 fix as is works. Code change reviewed.

Thanks for confirmation!

> For JDK8:
> I don't believe we were planning to backport this to 8 given risks of changes in this area.

Ummm.... Not JDK8-GA, but definitely a JDK8-Update... As usual,
I plan to do the backport engineering and I'll let someone else
worry about the politics... :-)

> I did reach the same conclusion you did, that the WaitSetLock acquirers who already own
> the lock don't have this issue, but those that don't already own the lock do have
> the problem, and the timed wait could trigger this.
> And that a JDK8 fix would take the change out of the jvmti conditional, or need the 8028280
> fix, which I also believe we do not plan to backport.

Yeah, I'll chat with Mr Simms about backporting 8028280... That
os::naked_short_sleep() function is so very useful...

> thank you for the detailed walk-through,

No problem. Thank you for slogging through the details here.

Dan

> Karen
>
> On Feb 10, 2014, at 1:55 PM, Daniel D. Daugherty wrote:
>
>> On 2/9/14 8:37 PM, David Holmes wrote:
>>> trimming content ...
>>>
>>> On 8/02/2014 9:45 AM, Daniel D. Daugherty wrote:
>>>> On 2/7/14 2:56 PM, Karen Kinnear wrote:
>>>>> 3. Did I read the code correctly that the Thread::SpinAcquire can make
>>>>> a timed park
>>>>> call on the same thread's _ParkEvent? And that this is used to get on
>>>>> and off the wait queue,
>>>>> i.e. to acquire the WaitSetLock?
>>>>>     Is there the same risk that a notify might be eaten here also?
>>>> As far as I can see, Thread::SpinAcquire() does not use a ParkEvent
>>> It sure does:
>>>
>>> void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>>>   if (Atomic::cmpxchg (1, adr, 0) == 0) {
>>>      return ;   // normal fast-path return
>>>   }
>>>
>>>   // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>>>   TEVENT (SpinAcquire - ctx) ;
>>>   int ctr = 0 ;
>>>   int Yields = 0 ;
>>>   for (;;) {
>>>      while (*adr != 0) {
>>>         ++ctr ;
>>>         if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>>            if (Yields > 5) {
>>>              // Consider using a simple NakedSleep() instead.
>>>              // Then SpinAcquire could be called by non-JVM threads
>>>              Thread::current()->_ParkEvent->park(1) ;
>> Ummmm... that's not the code I'm seeing...
>>
>> src/share/vm/runtime/thread.cpp:
>>
>>   4417  void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>>   4418    if (Atomic::cmpxchg (1, adr, 0) == 0) {
>>   4419       return ;   // normal fast-path return
>>   4420    }
>>   4421
>>   4422    // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>>   4423    TEVENT (SpinAcquire - ctx) ;
>>   4424    int ctr = 0 ;
>>   4425    int Yields = 0 ;
>>   4426    for (;;) {
>>   4427       while (*adr != 0) {
>>   4428          ++ctr ;
>>   4429          if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>   4430             if (Yields > 5) {
>>   4431               os::naked_short_sleep(1);
>>   4432             } else {
>>   4433               os::NakedYield() ;
>>   4434               ++Yields ;
>>   4435             }
>>   4436          } else {
>>   4437             SpinPause() ;
>>   4438          }
>>   4439       }
>>   4440       if (Atomic::cmpxchg (1, adr, 0) == 0) return ;
>>   4441    }
>>   4442  }
>>
>> Mr Simms recently changed the above code via:
>>
>> changeset:   5832:5944dba4badc
>> user:        dsimms
>> date:        Fri Jan 24 09:28:47 2014 +0100
>> summary:     8028280: ParkEvent leak when running modified runThese which only loads classes
>>
>> os::naked_short_sleep() is new:
>>
>> - BSD/MacOS X, Linux - uses nanosleep()
>> - Solaris - uses usleep()
>> - Windows - uses Sleep()
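
A minimal sketch of that Spin/Yield/Sleep shape in portable C++, assuming
std::atomic and std::this_thread in place of HotSpot's Atomic::cmpxchg(),
os::NakedYield() and os::naked_short_sleep(). The names spin_acquire() and
spin_release() are made up for illustration and the uniprocessor check is
left out; this is not the HotSpot code:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for Thread::SpinAcquire(); illustration only.
void spin_acquire(std::atomic<int>& lock) {
    int expected = 0;
    if (lock.compare_exchange_strong(expected, 1)) {
        return;  // fast path: no contention
    }
    int ctr = 0;
    int yields = 0;
    for (;;) {
        while (lock.load() != 0) {
            ++ctr;
            if ((ctr & 0xFFF) == 0) {
                if (yields > 5) {
                    // A plain sleep touches no per-thread event, so it
                    // cannot swallow a wakeup meant for another protocol.
                    std::this_thread::sleep_for(std::chrono::milliseconds(1));
                } else {
                    std::this_thread::yield();
                    ++yields;
                }
            }
            // otherwise just keep spinning (SpinPause() in the original)
        }
        expected = 0;
        if (lock.compare_exchange_strong(expected, 1)) {
            return;
        }
    }
}

// Hypothetical stand-in for Thread::SpinRelease().
void spin_release(std::atomic<int>& lock) {
    assert(lock.load() == 1);  // caller must hold the lock
    lock.store(0);
}
```

The point of the sleep branch is that the blocking step no longer goes
through the thread's _ParkEvent.
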
>>
>> The fix for 8028280 was pushed to JDK9/hs-rt on 2014.01.24 and to JDK9/hs
>> on 2014.01.29. I don't see any signs that Mr Simms' fix will be backported
>> to JDK8u/HSX-25u (yet) so this part of the review thread might impact the
>> backport of my fix to earlier releases.
>>
>>
>>> So considering Karen's question ... I can't tell for certain. :(
>>>
>>> I do not think the SpinAcquire on grabbing the wait-set lock to add to the wait-set can be an issue because we will only park in response to the actual wait, and hence only get unparked due to a notify/notifyAll, but at this point we still own the monitor so no notify/notifyAll is possible.
>>>
>>> However, for the removal from the wait-set a more complex analysis is needed. To do the SpinAcquire we must still be flagged as TS_WAIT - which means we have not been notified, but must be returning due to a timeout (or spurious wakeup?). In such circumstances could we be _succ? I don't think so but I'll leave it to Dan to confirm that part :)
>> So for HSX-25 and probably older...
>>
>> There are four Thread::SpinAcquire() calls in the objectMonitor code:
>>
>>     Thread::SpinAcquire (&_WaitSetLock, "WaitSet - add") ;
>>     Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>     Thread::SpinAcquire (&_WaitSetLock, "WaitSet - notify") ;
>>     Thread::SpinAcquire (&_WaitSetLock, "WaitSet - notifyall") ;
>>
>> We can easily rule out the "notify" and "notifyAll" uses since the
>> current thread owns the Java-level monitor and there are no events
>> to post in this part of the notify() or notifyAll() protocols.
>>
>> For the "WaitSet - add" use, the current thread owns the Java-level
>> monitor and the thread has not been added as a waiter yet so another
>> thread cannot do the notify-exit-make-successor part of the protocol
>> yet.
>>
>> For the "WaitSet - unlink" use:
>>
>> src/share/vm/runtime/objectMonitor.cpp:
>>
>>   1569       if (node.TState == ObjectWaiter::TS_WAIT) {
>>   1570           Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>   1571           if (node.TState == ObjectWaiter::TS_WAIT) {
>>   1572              DequeueSpecificWaiter (&node) ;       // unlink from WaitSet
>>   1573              assert(node._notified == 0, "invariant");
>>   1574              node.TState = ObjectWaiter::TS_RUN ;
>>   1575           }
>>   1576           Thread::SpinRelease (&_WaitSetLock) ;
>>   1577       }
>>
>> It is the call on line 1570 above that gets us into this code:
>>
>> src/share/vm/runtime/thread.cpp:
>>
>>   4435  void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>>   4436    if (Atomic::cmpxchg (1, adr, 0) == 0) {
>>   4437       return ;   // normal fast-path return
>>   4438    }
>>   4439
>>   4440    // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>>   4441    TEVENT (SpinAcquire - ctx) ;
>>   4442    int ctr = 0 ;
>>   4443    int Yields = 0 ;
>>   4444    for (;;) {
>>   4445       while (*adr != 0) {
>>   4446          ++ctr ;
>>   4447          if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>   4448             if (Yields > 5) {
>>   4449               // Consider using a simple NakedSleep() instead.
>>   4450               // Then SpinAcquire could be called by non-JVM threads
>>   4451               Thread::current()->_ParkEvent->park(1) ;
>>   4452             } else {
>>   4453               os::NakedYield() ;
>>   4454               ++Yields ;
>>   4455             }
>>   4456          } else {
>>   4457             SpinPause() ;
>>   4458          }
>>   4459       }
>>   4460       if (Atomic::cmpxchg (1, adr, 0) == 0) return ;
>>   4461    }
>>   4462  }
>>
>> And the above code can consume the unpark() on line 4451.
>>
>> So how the heck do we get to line 1570???
>>
>> Well, the target thread would have to be both notified and unparked
>> to be executing this code path. When the notify() code runs, the
>> target of the notify() is changed from ObjectWaiter::TS_WAIT to
>> ObjectWaiter::TS_ENTER unless Knob_MoveNotifyee == 4. The default
>> is Knob_MoveNotifyee == 2 so we're in non-default mode here...
>>
>> Here are the Knob_MoveNotifyee policy values:
>>
>>    1717      if (Policy == 0) {       // prepend to EntryList
>>    1728      if (Policy == 1) {      // append to EntryList
>>    1744      if (Policy == 2) {      // prepend to cxq
>>    1760      if (Policy == 3) {      // append to cxq
>>
>> For Knob_MoveNotifyee == 4 (or higher), we use the old mechanism
>> where we just unpark the target thread and let it run. Part of
>> that code changes from ObjectWaiter::TS_WAIT to ObjectWaiter::TS_RUN.
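
A rough sketch of those policy values as a dispatch, following the
description above. MonitorSketch, move_notifyee() and the string thread
names are hypothetical, and the real code may track finer-grained states
than the three shown here:

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class WaiterState { TS_WAIT, TS_ENTER, TS_RUN };

// Hypothetical model of the queues a notified waiter can land on.
struct MonitorSketch {
    std::deque<std::string> entry_list;
    std::deque<std::string> cxq;
    std::deque<std::string> unparked;  // woken directly (policy 4 or higher)
};

// Moves one notified waiter according to the policy and returns the
// state the waiter is left in.
WaiterState move_notifyee(MonitorSketch& m, const std::string& waiter,
                          int policy) {
    assert(policy >= 0);
    switch (policy) {
        case 0:  // prepend to EntryList
            m.entry_list.push_front(waiter);
            return WaiterState::TS_ENTER;
        case 1:  // append to EntryList
            m.entry_list.push_back(waiter);
            return WaiterState::TS_ENTER;
        case 2:  // prepend to cxq
            m.cxq.push_front(waiter);
            return WaiterState::TS_ENTER;
        case 3:  // append to cxq
            m.cxq.push_back(waiter);
            return WaiterState::TS_ENTER;
        default:  // old mechanism: just unpark the waiter and let it run
            m.unparked.push_back(waiter);
            return WaiterState::TS_RUN;
    }
}
```

In this model only policies 0 through 3 take the waiter out of TS_WAIT by
queueing it; policy 4 or higher hands it straight back as runnable.
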
>>
>> The code works the same for notifyAll() for the thread picked
>> to be notified. For the Knob_MoveNotifyee == 4 (or higher) case,
>> we just unpark all the waiters and we have a free-for-all.
>>
>> So it looks like the code block from lines 1569-1577 is never
>> used... or is it? Well... you have to remember two things:
>>
>> 1) spurious unpark()
>> 2) timed wait()
>>
>> The caller might have called wait(0), but that doesn't mean that
>> the underlying park() mechanism won't have a spurious unpark().
>> Or better, the caller might have called wait(1) and be running
>> again after a millisecond.
>>
>> So in the HSX25 and older system (i.e., without Mr Simms' fix for
>> 8028280), it is possible for this call:
>>
>>   1570           Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>
>> to consume the unpark(). The gauntlet that has to be traversed
>> to get to this call:
>>
>>   4451               Thread::current()->_ParkEvent->park(1) ;
>>
>> is impressive:
>>
>> - fast-path acquisition of the _WaitSetLock has to fail:
>>
>>   4436    if (Atomic::cmpxchg (1, adr, 0) == 0) {
>>   4437       return ;   // normal fast-path return
>>   4438    }
>>
>> - if the machine is a uniprocessor, then 6 os::NakedYield()
>>   call-loop-recheck attempts have to fail:
>>
>>   4447          if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>   4448             if (Yields > 5) {
>>   4449               // Consider using a simple NakedSleep() instead.
>>   4450               // Then SpinAcquire could be called by non-JVM threads
>>   4451               Thread::current()->_ParkEvent->park(1) ;
>>   4452             } else {
>>   4453               os::NakedYield() ;
>>   4454               ++Yields ;
>>   4455             }
>>
>> - if the machine is a multi-processor, then 6 rounds of { 4095 SpinPause()
>>   attempts, 1 os::NakedYield() attempt}  have to fail:
>>
>>   4446          ++ctr ;
>>   4447          if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>   4448             if (Yields > 5) {
>>   4449               // Consider using a simple NakedSleep() instead.
>>   4450               // Then SpinAcquire could be called by non-JVM threads
>>   4451               Thread::current()->_ParkEvent->park(1) ;
>>   4452             } else {
>>   4453               os::NakedYield() ;
>>   4454               ++Yields ;
>>   4455             }
>>   4456          } else {
>>   4457             SpinPause() ;
>>   4458          }
>>
>> But it is possible. It is one of those once-in-a-blue-moon type
>> windows where everything has to line up just so.
>>
>> So how do we address this issue in HSX-25 and possibly older?
>>
>> If Mr Simms' fix for 8028280 is also backported, then there is no
>> issue. If it is not backported, then applying the fix for this
>> bug like so:
>>
>> src/share/vm/runtime/objectMonitor.cpp:
>>
>>   1596       if (JvmtiExport::should_post_monitor_waited()) {
>>   1597         JvmtiExport::post_monitor_waited(jt, this, ret == OS_TIMEOUT);
>>   1598       }
>>
>>   1604       if (node._notified != 0 && _succ == Self) {
>>   1605         // In this part of the monitor wait-notify-reenter protocol it
>>   1606         // is possible (and normal) for another thread to do a fastpath
>>   1607         // monitor enter-exit while this thread is still trying to get
>>   1608         // to the reenter portion of the protocol.
>>   1609         //
>>   1610         // The ObjectMonitor was notified and the current thread is
>>   1611         // the successor which also means that an unpark() has already
>>   1612         // been done. The JVMTI_EVENT_MONITOR_WAITED event handler can
>>   1613         // consume the unpark() that was done when the successor was
>>   1614         // set because the same ParkEvent is shared between Java
>>   1615         // monitors and JVM/TI RawMonitors (for now).
>>   1616         //
>>   1617         // We redo the unpark() to ensure forward progress, i.e., we
>>   1618         // don't want all pending threads hanging (parked) with none
>>   1619         // entering the unlocked monitor.
>>   1620         node._event->unpark();
>>   1621       }
>>
>> Of course the line numbers for the "fix" would be different and the comment
>> would need to be updated to reflect that the:
>>
>>   1570           Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>
>> call above could also consume an unpark(), but it should work.
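
A minimal sketch of why redoing the unpark() helps, assuming a one-permit
event that is shared between two users. ParkEventSketch and
reentry_is_woken() are hypothetical names, and the event polls instead of
blocking to keep the example small; it models the permit semantics only,
not the real ParkEvent:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical one-permit event, loosely modeled on park()/unpark().
class ParkEventSketch {
public:
    void unpark() { permit_.store(true); }

    // Waits up to `millis` for the permit. Returns true if the permit
    // was consumed, false on timeout.
    bool park(int millis) {
        auto deadline = std::chrono::steady_clock::now() +
                        std::chrono::milliseconds(millis);
        for (;;) {
            if (permit_.exchange(false)) return true;
            if (std::chrono::steady_clock::now() >= deadline) return false;
            std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
    }

private:
    std::atomic<bool> permit_{false};
};

// Returns true if the final (reentry) park still sees a wakeup.
// other_user_parks: something else does a timed park on the same event,
//                   e.g. a park(1) in a lock slow path or a 1ms raw
//                   monitor wait in an event handler.
// redo_unpark:      apply the "unpark again if notified and successor" fix.
bool reentry_is_woken(bool other_user_parks, bool redo_unpark) {
    ParkEventSketch ev;
    bool notified = true;  // stands in for node._notified != 0
    bool is_succ = true;   // stands in for _succ == Self

    ev.unpark();  // the exiting thread made us successor and unparked us

    if (other_user_parks) {
        bool consumed = ev.park(1);  // eats the permit meant for reentry
        assert(consumed);
    }
    if (redo_unpark && notified && is_succ) {
        ev.unpark();  // restore the permit to ensure forward progress
    }
    return ev.park(10);  // false here models the thread getting stuck
}
```

With the other user parking and no redo, the final park times out, which
is the hang in miniature; with the redo it returns right away. The extra
unpark() is harmless when nothing consumed the permit because the event
holds at most one.
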
>>
>> If you've read this far, then I'm impressed. If you've read this far
>> and only fallen asleep a couple of times, then I'm still impressed.
>>
>> Summary: I don't think we have an issue in JDK9, but we'll have to do
>>          the fix in JDK8/HSX25 and older a little differently.
>>
>> Dan
>>
>>
>>> David
>>> -----
>>>
>>>> at all. However, Thread::muxAcquire() does use a ParkEvent, but it
>>>> is a different ParkEvent. From src/share/vm/runtime/thread.hpp:
>>>>
>>>>    ParkEvent * _ParkEvent ;               // for synchronized()
>>>>    ParkEvent * _SleepEvent ;              // for Thread.sleep
>>>>    ParkEvent * _MutexEvent ;              // for native internal Mutex/Monitor
>>>>    ParkEvent * _MuxEvent ;                // for low-level muxAcquire-muxRelease
>>>>
>>>> So ObjectMonitor uses the _ParkEvent field and Thread::muxAcquire()
>>>> uses the _MuxEvent. There are some comments in thread.cpp about
>>>> how _MuxEvent could be eliminated and _ParkEvent shared, but I don't
>>>> think we ever want to go there.
>>>>
>>>> I also filed this RFE:
>>>>
>>>>      8033399 add a separate ParkEvent for JVM/TI RawMonitor use
>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>
>>>> just in case the Serviceability team wants to migrate JVM/TI RawMonitors
>>>> to a separate ParkEvent.
>>>>
>>>> Please let me know if you concur that I've resolved issue #3.
>>>>
>>>>
>>>>> If so, I wonder if we want this added unpark to not just be called if
>>>>> JVMTI_EVENT_MONITOR_WAITED
>>>>> is enabled?
>>>> I don't think we need it, but I've noted its removal as a risk.
>>>>
>>>> Again, thanks for the review!
>>>>
>>>> Dan
>>>>
>>>>
>>>>> thanks,
>>>>> Karen
>>>>>
>>>>> On Feb 1, 2014, at 1:38 PM, Daniel D. Daugherty wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix ready for the following bug:
>>>>>>
>>>>>>     8028073 race condition in ObjectMonitor implementation causing deadlocks
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8028073
>>>>>>
>>>>>> On the surface, this is a very simple fix that relocates a few lines of
>>>>>> code, relocates and rewrites the comments associated with that code and
>>>>>> adds several new comments.
>>>>>>
>>>>>> Of course, in reality, the issue is much more complicated, but I'm
>>>>>> hoping to make it easy for anyone not acquainted with this issue to
>>>>>> understand what's going on.
>>>>>>
>>>>>> Here are the JDK9 webrev URLs:
>>>>>>
>>>>>> OpenJDK:
>>>>>> http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk9-hs-runtime/
>>>>>>
>>>>>> Oracle internal:
>>>>>> http://javaweb.us.oracle.com/~ddaugher/8028073-webrev/0-jdk9-hs-runtime/
>>>>>>
>>>>>> The simple summary:
>>>>>>
>>>>>> - since Java Monitors and JVM/TI RawMonitors share a ParkEvent,
>>>>>>   it is possible for a JVM/TI monitor event handler to accidentally
>>>>>>   consume a ParkEvent.unpark() call meant for the Java Monitor layer
>>>>>> - the original code fix was made on 2005.07.04 using this bug ID:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-5030359
>>>>>> - it's the right fix, but it's in the wrong place
>>>>>> - the fix needs to be after the JVMTI_EVENT_MONITOR_WAITED
>>>>>>   event handler is called because it is that event handler
>>>>>>   that can cause the hang
>>>>>>
>>>>>>
>>>>>> Testing
>>>>>> -------
>>>>>>
>>>>>> - a new StressMonitorWait test has been created that reliably
>>>>>>   reproduces the hang in JDK[6789]; see the bug's gory details
>>>>>>   for the specific versions where the hang has been reproduced
>>>>>>   - the test reliably reproduces the hang in 5 seconds on my
>>>>>>     T7600 running Solaris 10u11 X86; 1 minute runs reproduce
>>>>>>     the hang reliably on other machines
>>>>>>   - 12 hour stress run of the new test on Linux-X64, MacOS X-X64,
>>>>>>     Solaris-SPARCV9, Solaris-X64, and Win7-X86 with the JPRT
>>>>>>     bits did not reproduce the hang
>>>>>> - JPRT test job
>>>>>> - VM/SQE Adhoc test job on Server VM, fastdebug bits on Linux-X86,
>>>>>>   Linux-X64, MacOS X-X64, Solaris-SPARCV9, Solaris-X64, Windows-X86,
>>>>>>   and Windows-X64:
>>>>>>   - vm.quick
>>>>>>   - Kitchensink (bigapps)
>>>>>>   - Weblogic+medrec (bigapps)
>>>>>>   - runThese (bigapps)
>>>>>>
>>>>>>
>>>>>> The Gory Details Start Here
>>>>>> ---------------------------
>>>>>>
>>>>>> This is the old location of the block of code that's being moved:
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp:
>>>>>>
>>>>>> 1440 void ObjectMonitor::wait(jlong millis, bool interruptible, TRAPS) {
>>>>>> <snip>
>>>>>> 1499    exit (true, Self) ;                    // exit the monitor
>>>>>> <snip>
>>>>>> 1513    if (node._notified != 0 && _succ == Self) {
>>>>>> 1514       node._event->unpark();
>>>>>> 1515    }
>>>>>>
>>>>>>
>>>>>> This is the new location of the block of code that's being moved:
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp:
>>>>>>
>>>>>> 1452 void ObjectMonitor::wait(jlong millis, bool interruptible, TRAPS) {
>>>>>> <snip>
>>>>>> 1601      if (JvmtiExport::should_post_monitor_waited()) {
>>>>>> 1602        JvmtiExport::post_monitor_waited(jt, this, ret == OS_TIMEOUT);
>>>>>> <snip>
>>>>>> 1604        if (node._notified != 0 && _succ == Self) {
>>>>>> <snip>
>>>>>> 1620          node._event->unpark();
>>>>>> 1621        }
>>>>>>
>>>>>>
>>>>>> The Risks
>>>>>> ---------
>>>>>>
>>>>>> - The code now executes only when the JVMTI_EVENT_MONITOR_WAITED event
>>>>>>   is enabled:
>>>>>>   - previously it was always executed
>>>>>>   - while the old code was not effective for the hang that is being
>>>>>>     fixed with this bug, it is possible that the old code prevented
>>>>>>     a different bug in the successor protocol from manifesting
>>>>>>   - thorough analysis of the successor protocol did not reveal a
>>>>>>     case where the old code was needed in the old location
>>>>>> - Thorough analysis indicates that the other JVM/TI monitor events
>>>>>>   do not need a fix like the one for JVMTI_EVENT_MONITOR_WAITED:
>>>>>>   - the successor protocol is complicated and the analysis could
>>>>>>     be wrong when certain options are used
>>>>>>   - comments were added to each location where a JVM/TI monitor
>>>>>>     event handler is called documenting why a fix like this one
>>>>>>     is not needed there
>>>>>>   - if the analysis is wrong, the new comments show where a new
>>>>>>     code change would be needed
>>>>>>
>>>>>>
>>>>>> The Scenario
>>>>>> ------------
>>>>>>
>>>>>> I've created a scenario that reproduces this hang:
>>>>>>
>>>>>> T1 - enters monitor and calls monitor.wait()
>>>>>> T2 - enters the monitor, calls monitor.notify() and exits the monitor
>>>>>> T3 - enters and exits the monitor
>>>>>> T4 - enters the monitor, delays for 5 seconds, exits the monitor
>>>>>>
>>>>>> A JVM/TI agent that enables JVMTI_EVENT_MONITOR_WAITED and has a
>>>>>> handler that: enters a raw monitor, waits for 1ms, exits a raw monitor.
>>>>>>
>>>>>> Here are the six events necessary to make this hang happen:
>>>>>>
>>>>>> // KEY-EVENT-1a: After being unparked(), T1 has cleared the _succ field, but
>>>>>> // KEY-EVENT-1b: T3 is exiting the monitor and makes T1 the successor again.
>>>>>>
>>>>>> // KEY-EVENT-2a: The unpark() done by T3 when it made T1 the successor
>>>>>> // KEY-EVENT-2b: is consumed by the JVM/TI event handler.
>>>>>>
>>>>>> // KEY-EVENT-3a: T3 made T1 the successor
>>>>>> // KEY-EVENT-3b: but before T1 could reenter the monitor T4 grabbed it.
>>>>>>
>>>>>> // KEY-EVENT-4a: T1's TrySpin() call sees T4 as NotRunnable so
>>>>>> // KEY-EVENT-4b: T1 bails from TrySpin without touching _succ.
>>>>>>
>>>>>> // KEY-EVENT-5a: T4 sees that T1 is still the successor so
>>>>>> // KEY-EVENT-5b: T4 takes the quick exit path (no ExitEpilog)
>>>>>>
>>>>>> // KEY-EVENT-6a: T1 is about to park and it is the successor, but
>>>>>> // KEY-EVENT-6b: T3's unpark has been eaten by the JVM/TI event handler
>>>>>> // KEY-EVENT-6c: and T4 took the quick exit path. T1 is about to be stuck.
>>>>>>
>>>>>>
>>>>>> This bug is intertwined with:
>>>>>>
>>>>>> - The ObjectMonitor successor protocol
>>>>>> - The sharing of a ParkEvent between Java Monitors and JVM/TI RawMonitors
>>>>>>
>>>>>> There is a very long successor.notes attachment to JDK-8028073 that
>>>>>> attempts to describe the ObjectMonitor successor protocol. It's good
>>>>>> for putting pretty much anyone to sleep.
>>>>>>
>>>>>> Since this hang reproduces back to JDK6, this bug is taking the easily
>>>>>> backported solution of moving the original fix to the right location.
>>>>>> The following new bug has been filed for possible future work in this
>>>>>> area by the Serviceability Team:
>>>>>>
>>>>>>     8033399 add a separate ParkEvent for JVM/TI RawMonitor use
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>>>
>>>>>>
>>>>>> The Symptoms
>>>>>> ------------
>>>>>>
>>>>>> With intermittent hangs like this, it is useful to know what to look
>>>>>> for in order to determine if you are running into this issue:
>>>>>>
>>>>>> - if you aren't using a debugger or a profiler or some other
>>>>>>   JVM/TI agent, then this hang is not the same as yours
>>>>>> - if your JVM/TI agent isn't using a JVMTI_EVENT_MONITOR_WAITED
>>>>>>   event handler, then this hang is not the same as yours
>>>>>> - if your JVMTI_EVENT_MONITOR_WAITED event handler is not using
>>>>>>   JVM/TI RawMonitors, then this hang is not the same as yours
>>>>>> - if your JVMTI_EVENT_MONITOR_WAITED event handler is calling
>>>>>>   back into Java code, then you might just be insane and this
>>>>>>   hang might be similar to yours. However, using a Java callback
>>>>>>   in an event handler is an even bigger problem/risk so fix that
>>>>>>   first.
>>>>>> - if you have one or more threads blocked like this and making no
>>>>>>   progress, then this hang might be the same as yours:
>>>>>>
>>>>>> "T1" #22 prio=5 os_prio=64 tid=0x00000000009ca800 nid=0x2f waiting for monitor entry [0xfffffd7fc0231000]
>>>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>>    JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009ca800  [0x2f] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>>>>>>    JavaThread state: _thread_blocked
>>>>>>         at java.lang.Object.wait(Native Method)
>>>>>>         - waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>         at java.lang.Object.wait(Object.java:502)
>>>>>>         at SMW_WorkerThread.run(StressMonitorWait.java:103)
>>>>>>         - locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> "T2" #23 prio=5 os_prio=64 tid=0x00000000009cc000 nid=0x30 waiting for monitor entry [0xfffffd7fc0130000]
>>>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>>    JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009cc000  [0x30] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>>>>>>    JavaThread state: _thread_blocked
>>>>>>         at SMW_WorkerThread.run(StressMonitorWait.java:120)
>>>>>>         - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> "T3" #24 prio=5 os_prio=64 tid=0x00000000009ce000 nid=0x31 waiting for monitor entry [0xfffffd7fc002f000]
>>>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>>    JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009ce000  [0x31] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0
>>>>>>    JavaThread state: _thread_blocked
>>>>>>         at SMW_WorkerThread.run(StressMonitorWait.java:139)
>>>>>>         - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> Key symptoms in thread T1:
>>>>>>
>>>>>> - had the object locked:
>>>>>>
>>>>>>   locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> - did an Object.wait():
>>>>>>
>>>>>>   waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> - is blocked on reentry:
>>>>>>
>>>>>>   waiting for monitor entry [0xfffffd7fc0231000]
>>>>>>
>>>>>> Key symptoms in thread T2:
>>>>>>
>>>>>> - is blocked waiting to lock the object:
>>>>>>
>>>>>>   waiting for monitor entry [0xfffffd7fc0130000]
>>>>>>   waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> Key symptoms in thread T3:
>>>>>>
>>>>>> - is blocked waiting to lock the object:
>>>>>>
>>>>>>   waiting for monitor entry [0xfffffd7fc002f000]
>>>>>>   waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>