code review round 0 for ObjectMonitor-JVM/TI hang fix (8028073)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Feb 10 12:31:20 PST 2014
On 2/10/14 1:20 PM, Karen Kinnear wrote:
> Dan,
>
> Thank you so much. My bad - I was looking at a jdk8 repo, not a jdk9 one.
No problem... I had the advantage of wanting Mr Simms changes so
that I could (more easily) develop the debug code flow hooks that
I'm planning to add to the "debug tips and tricks" wiki...
> So I agree that the JDK9 fix as is works. Code change reviewed.
Thanks for confirmation!
> For JDK8:
> I don't believe we were planning to backport this to 8 given risks of changes in this area.
Ummm.... Not JDK8-GA, but definitely a JDK8-Update... As usual,
I plan to do the backport engineering and I'll let someone else
worry about the politics... :-)
> I did reach the same conclusion you did, that the WaitSetLock acquirers who already own
> the lock don't have this issue, but those that don't already own the lock do have
> the problem, and the timed wait could trigger this.
> And that a JDK8 fix would take the change out of the jvmti conditional, or need the 8028280
> fix, which I also believe we do not plan to backport.
Yeah, I'll chat with Mr Simms about backporting 8028280... That
os::naked_short_sleep() function is so very useful...
> thank you for the detailed walk-through,
No problem. Thank you for slogging through the details here.
Dan
> Karen
>
> On Feb 10, 2014, at 1:55 PM, Daniel D. Daugherty wrote:
>
>> On 2/9/14 8:37 PM, David Holmes wrote:
>>> trimming content ...
>>>
>>> On 8/02/2014 9:45 AM, Daniel D. Daugherty wrote:
>>>> On 2/7/14 2:56 PM, Karen Kinnear wrote:
>>>>> 3. Did I read the code correctly that the Thread::SpinAcquire can make
>>>>> a timed park
>>>>> call on the same thread's _ParkEvent? And that this is used to get on
>>>>> and off the wait queue,
>>>>> i.e. to acquire the WaitSetLock?
>>>>> Is there the same risk that a notify might be eaten here also?
>>>> As far as I can see, Thread::SpinAcquire() does not use a ParkEvent
>>> It sure does:
>>>
>>> void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>>> if (Atomic::cmpxchg (1, adr, 0) == 0) {
>>> return ; // normal fast-path return
>>> }
>>>
>>> // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>>> TEVENT (SpinAcquire - ctx) ;
>>> int ctr = 0 ;
>>> int Yields = 0 ;
>>> for (;;) {
>>> while (*adr != 0) {
>>> ++ctr ;
>>> if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>>> if (Yields > 5) {
>>> // Consider using a simple NakedSleep() instead.
>>> // Then SpinAcquire could be called by non-JVM threads
>>> Thread::current()->_ParkEvent->park(1) ;
>> Ummmm... that's not the code I'm seeing...
>>
>> src/share/vm/runtime/thread.cpp:
>>
>> 4417 void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>> 4418 if (Atomic::cmpxchg (1, adr, 0) == 0) {
>> 4419 return ; // normal fast-path return
>> 4420 }
>> 4421
>> 4422 // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>> 4423 TEVENT (SpinAcquire - ctx) ;
>> 4424 int ctr = 0 ;
>> 4425 int Yields = 0 ;
>> 4426 for (;;) {
>> 4427 while (*adr != 0) {
>> 4428 ++ctr ;
>> 4429 if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>> 4430 if (Yields > 5) {
>> 4431 os::naked_short_sleep(1);
>> 4432 } else {
>> 4433 os::NakedYield() ;
>> 4434 ++Yields ;
>> 4435 }
>> 4436 } else {
>> 4437 SpinPause() ;
>> 4438 }
>> 4439 }
>> 4440 if (Atomic::cmpxchg (1, adr, 0) == 0) return ;
>> 4441 }
>> 4442 }
>>
>> Mr Simms recently changed the above code via:
>>
>> changeset: 5832:5944dba4badc
>> user: dsimms
>> date: Fri Jan 24 09:28:47 2014 +0100
>> summary: 8028280: ParkEvent leak when running modified runThese which only loads classes
>>
>> os::naked_short_sleep() is new:
>>
>> - BSD/MacOS X, Linux - uses nanosleep()
>> - Solaris - uses usleep()
>> - Windows - uses Sleep()
>>
>> The fix for 8028280 was pushed to JDK9/hs-rt on 2014.01.24 and to JDK9/hs
>> on 2014.01.29. I don't see any signs that Mr Simm's fix will be backported
>> to JDK8u/HSX-25u (yet) so this part of the review thread might impact the
>> backport of my fix to earlier releases.
>>
>>
>>> So considering Karen's question ... I can't tell for certain. :(
>>>
>>> I do not think the SpinAcquire on grabbing the wait-set lock to add to the wait-set can be an issue because we will only park in response to the actual wait, and hence only get unparked due to a notify/notifyAll, but at this point we still own the monitor so no notify/notifyAll is possible.
>>>
>>> However, for the removal from the wait-set a more complex analysis is needed. To do the SpinAcquire we must still be flagged as TS_WAIT - which means we have not been notified, but must be returning due to a timeout (or spurious wakeup?). In such circumstances could we be _succ? I don't think so but I'll leave it to Dan to confirm that part :)
>> So for HSX-25 and probably older...
>>
>> There are four Thread::SpinAcquire() calls in the objectMonitor code:
>>
>> Thread::SpinAcquire (&_WaitSetLock, "WaitSet - add") ;
>> Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>> Thread::SpinAcquire (&_WaitSetLock, "WaitSet - notify") ;
>> Thread::SpinAcquire (&_WaitSetLock, "WaitSet - notifyall") ;
>>
>> We can easily rule out the "notify" and "notifyAll" uses since the
>> current thread owns the Java-level monitor and there are no events
>> to post in this part of the notify() or notifyAll() protocols.
>>
>> For the "WaitSet - add" use, the current thread owns the Java-level
>> monitor and the thread has not been added as a waiter yet so another
>> thread cannot do the notify-exit-make-successor part of the protocol
>> yet.
>>
>> For the "WaitSet - unlink" use:
>>
>> src/share/vm/runtime/objectMonitor.cpp:
>>
>> 1569 if (node.TState == ObjectWaiter::TS_WAIT) {
>> 1570 Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>> 1571 if (node.TState == ObjectWaiter::TS_WAIT) {
>> 1572 DequeueSpecificWaiter (&node) ; // unlink from WaitSet
>> 1573 assert(node._notified == 0, "invariant");
>> 1574 node.TState = ObjectWaiter::TS_RUN ;
>> 1575 }
>> 1576 Thread::SpinRelease (&_WaitSetLock) ;
>> 1577 }
>>
>> It is the call on line 1570 above that gets us into this code:
>>
>> src/share/vm/runtime/thread.cpp:
>>
>> 4435 void Thread::SpinAcquire (volatile int * adr, const char * LockName) {
>> 4436 if (Atomic::cmpxchg (1, adr, 0) == 0) {
>> 4437 return ; // normal fast-path return
>> 4438 }
>> 4439
>> 4440 // Slow-path : We've encountered contention -- Spin/Yield/Block strategy.
>> 4441 TEVENT (SpinAcquire - ctx) ;
>> 4442 int ctr = 0 ;
>> 4443 int Yields = 0 ;
>> 4444 for (;;) {
>> 4445 while (*adr != 0) {
>> 4446 ++ctr ;
>> 4447 if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>> 4448 if (Yields > 5) {
>> 4449 // Consider using a simple NakedSleep() instead.
>> 4450 // Then SpinAcquire could be called by non-JVM threads
>> 4451 Thread::current()->_ParkEvent->park(1) ;
>> 4452 } else {
>> 4453 os::NakedYield() ;
>> 4454 ++Yields ;
>> 4455 }
>> 4456 } else {
>> 4457 SpinPause() ;
>> 4458 }
>> 4459 }
>> 4460 if (Atomic::cmpxchg (1, adr, 0) == 0) return ;
>> 4461 }
>> 4462 }
>>
>> And the above code can consume the unpark() on line 4451.
>>
>> So how the heck do we get to line 1570???
>>
>> Well, the target thread would have to be both notified and unparked
>> to be executing this code path. When the notify() code runs, the
>> target of the notify() is changed from ObjectWaiter::TS_WAIT to
>> ObjectWaiter::TS_ENTER unless Knob_MoveNotifyee == 4. The default
>> for Knob_MoveNotifyee == 2 so we're in non default mode here...
>>
>> Here are the Knob_MoveNotifyee policy values:
>>
>> 1717 if (Policy == 0) { // prepend to EntryList
>> 1728 if (Policy == 1) { // append to EntryList
>> 1744 if (Policy == 2) { // prepend to cxq
>> 1760 if (Policy == 3) { // append to cxq
>>
>> For Knob_MoveNotifyee == 4 (or higher), we use the old mechanism
>> where we just unpark the target thread and let it run. Part of
>> that code changes from ObjectWaiter::TS_WAIT to ObjectWaiter::TS_RUN.
>>
>> The code works the same for notifyAll() for the thread picked
>> to be notified. For the Knob_MoveNotifyee == 4 (or higher) case,
>> we just unpark all the waiters and we a free-for-all.
>>
>> So it looks like the code block from lines 1569-1577 is never
>> used... or is it? Well... you have to remember two things:
>>
>> 1) spurious unpark()
>> 2) timed wait()
>>
>> The caller might have called wait(0), but that doesn't mean that
>> the underlying park() mechanism won't have a spurious unpark().
>> Or better, the caller might have called wait(1) and be running
>> again after a millisecond.
>>
>> So in the HSX25 and older system (i.e., without Mr Simms fix for
>> 8028280), it is possible for this call:
>>
>> 1570 Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>
>> to consume the unpark(). The gauntlet that has to be traversed
>> to get to this call:
>>
>> 4451 Thread::current()->_ParkEvent->park(1) ;
>>
>> is impressive:
>>
>> - fast-path acquisition of the _WaitSetLock has to fail:
>>
>> 4436 if (Atomic::cmpxchg (1, adr, 0) == 0) {
>> 4437 return ; // normal fast-path return
>> 4438 }
>>
>> - if the machine is a uniprocessor, then 6 os::NakedYield()
>> call-loop-recheck attempts have to fail:
>>
>> 4447 if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>> 4448 if (Yields > 5) {
>> 4449 // Consider using a simple NakedSleep() instead.
>> 4450 // Then SpinAcquire could be called by non-JVM threads
>> 4451 Thread::current()->_ParkEvent->park(1) ;
>> 4452 } else {
>> 4453 os::NakedYield() ;
>> 4454 ++Yields ;
>> 4455 }
>>
>> - if the machine is a multi-processor, then 6 rounds of { 4095 SpinPause()
>> attempts, 1 os::NakedYield() attempt} have to fail:
>>
>> 4446 ++ctr ;
>> 4447 if ((ctr & 0xFFF) == 0 || !os::is_MP()) {
>> 4448 if (Yields > 5) {
>> 4449 // Consider using a simple NakedSleep() instead.
>> 4450 // Then SpinAcquire could be called by non-JVM threads
>> 4451 Thread::current()->_ParkEvent->park(1) ;
>> 4452 } else {
>> 4453 os::NakedYield() ;
>> 4454 ++Yields ;
>> 4455 }
>> 4456 } else {
>> 4457 SpinPause() ;
>> 4458 }
>>
>> But it is possible. It is one of those once-in-a-blue moon type
>> windows where everything has to line up just so.
>>
>> So how do we address this issue in HSX-25 and possibly older?
>>
>> If Mr Simms fix for 8028280 is also backported, then there is no
>> issue. If it is not backported, then applying the fix for this
>> bug like so:
>>
>> src/share/vm/runtime/objectMonitor.cpp:
>>
>> 1596 if (JvmtiExport::should_post_monitor_waited()) {
>> 1597 JvmtiExport::post_monitor_waited(jt, this, ret == OS_TIMEOUT);
>> 1598 }
>>
>> 1604 if (node._notified != 0 && _succ == Self) {
>> 1605 // In this part of the monitor wait-notify-reenter protocol it
>> 1606 // is possible (and normal) for another thread to do a fastpath
>> 1607 // monitor enter-exit while this thread is still trying to get
>> 1608 // to the reenter portion of the protocol.
>> 1609 //
>> 1610 // The ObjectMonitor was notified and the current thread is
>> 1611 // the successor which also means that an unpark() has already
>> 1612 // been done. The JVMTI_EVENT_MONITOR_WAITED event handler can
>> 1613 // consume the unpark() that was done when the successor was
>> 1614 // set because the same ParkEvent is shared between Java
>> 1615 // monitors and JVM/TI RawMonitors (for now).
>> 1616 //
>> 1617 // We redo the unpark() to ensure forward progress, i.e., we
>> 1618 // don't want all pending threads hanging (parked) with none
>> 1619 // entering the unlocked monitor.
>> 1620 node._event->unpark();
>> 1621 }
>>
>> Of course the line numbers for the "fix" would be different and the comment
>> would need to be updated to reflect that the:
>>
>> 1570 Thread::SpinAcquire (&_WaitSetLock, "WaitSet - unlink") ;
>>
>> call above could also consume an unpark(), but it should work.
>>
>> If you've read this far, then I'm impressed. If you've read this far
>> and only fallen asleep a couple of times, then I'm still impressed.
>>
>> Summary: I don't think we have an issue in JDK9, but we'll have to do
>> the fix in JDK8/HSX25 and older a little differently.
>>
>> Dan
>>
>>
>>> David
>>> -----
>>>
>>>> at all. However, Thread::muxAcquire() does use a ParkEvent, but it
>>>> is a different ParkEvent. From src/share/vm/runtime/thread.hpp:
>>>>
>>>> ParkEvent * _ParkEvent ; // for synchronized()
>>>> ParkEvent * _SleepEvent ; // for Thread.sleep
>>>> ParkEvent * _MutexEvent ; // for native internal
>>>> Mutex/Monitor
>>>> ParkEvent * _MuxEvent ; // for low-level
>>>> muxAcquire-muxRelease
>>>>
>>>> So ObjectMonitor uses the _ParkEvent field and Thread::muxAcquire()
>>>> uses the _MuxEvent. There are some comments in thread.cpp about
>>>> how _MuxEvent could be eliminated and _ParkEvent shared, but I don't
>>>> think we ever want to go there.
>>>>
>>>> I also filed this RFE:
>>>>
>>>> 8033399 add a separate ParkEvent for JVM/TI RawMonitor use
>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>
>>>> just in case the Serviceability team wants to migrate JVM/TI RawMonitors
>>>> to a separate ParkEvent.
>>>>
>>>> Please let me know if you concur that I've resolved issue #3.
>>>>
>>>>
>>>>> If so, I wonder if we want this added unpark to not just be called if
>>>>> JVMTI_EVENT_MONITOR_WAITED
>>>>> is enabled?
>>>> I don't think we need it, but I've noted its removal as a risk.
>>>>
>>>> Again, thanks for the review!
>>>>
>>>> Dan
>>>>
>>>>
>>>>> thanks,
>>>>> Karen
>>>>>
>>>>> On Feb 1, 2014, at 1:38 PM, Daniel D. Daugherty wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I have a fix ready for the following bug:
>>>>>>
>>>>>> 8028073 race condition in ObjectMonitor implementation causing
>>>>>> deadlocks
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8028073
>>>>>>
>>>>>> On the surface, this is a very simple fix that relocates a few lines of
>>>>>> code, relocates and rewrites the comments associated with that code and
>>>>>> adds several new comments.
>>>>>>
>>>>>> Of course, in reality, the issue is much more complicated, but I'm
>>>>>> hoping to make it easy for anyone not acquainted with this issue to
>>>>>> understand what's going on.
>>>>>>
>>>>>> Here are the JDK9 webrev URLs:
>>>>>>
>>>>>> OpenJDK:
>>>>>> http://cr.openjdk.java.net/~dcubed/8028073-webrev/0-jdk9-hs-runtime/
>>>>>>
>>>>>> Oracle internal:
>>>>>> http://javaweb.us.oracle.com/~ddaugher/8028073-webrev/0-jdk9-hs-runtime/
>>>>>>
>>>>>> The simple summary:
>>>>>>
>>>>>> - since Java Monitors and JVM/TI RawMonitors share a ParkEvent,
>>>>>> it is possible for a JVM/TI monitor event handler to accidentally
>>>>>> consume a ParkEvent.unpark() call meant for Java Monitor layer
>>>>>> - the original code fix was made on 2005.07.04 using this bug ID:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-5030359
>>>>>> - it's the right fix, but it's in the wrong place
>>>>>> - the fix needs to be after the JVMTI_EVENT_MONITOR_WAITED
>>>>>> event handler is called because it is that event handler
>>>>>> that can cause the hang
>>>>>>
>>>>>>
>>>>>> Testing
>>>>>> -------
>>>>>>
>>>>>> - a new StessMonitorWait test has been created that reliably
>>>>>> reproduces the hang in JDK[6789]; see the bug's gory details
>>>>>> for the specific versions where the hang has been reproduced
>>>>>> - the test reliably reproduces the hang in 5 seconds on my
>>>>>> T7600 running Solaris 10u11 X86; 1 minute runs reproduce
>>>>>> the hang reliably on other machines
>>>>>> - 12 hour stress run of the new test on Linux-X64, MacOS X-X64,
>>>>>> Solaris-SPARCV9, Solaris-X64, and Win7-X86 with the JPRT
>>>>>> bits did not reproduce the hang
>>>>>> - JPRT test job
>>>>>> - VM/SQE Adhoc test job on Server VM, fastdebug bits on Linux-X86,
>>>>>> Linux-X64, MacOS X-X64, Solaris-SPARCV9, Solaris-X64, Windows-X86,
>>>>>> and Windows-X64:
>>>>>> - vm.quick
>>>>>> - Kitchensink (bigapps)
>>>>>> - Weblogic+medrec (bigapps)
>>>>>> - runThese (bigapps)
>>>>>>
>>>>>>
>>>>>> The Gory Details Start Here
>>>>>> ---------------------------
>>>>>>
>>>>>> This is the old location of block of code that's being moved:
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp:
>>>>>>
>>>>>> 1440 void ObjectMonitor::wait(jlong millis, bool interruptible, TRAPS) {
>>>>>> <snip>
>>>>>> 1499 exit (true, Self) ; // exit the monitor
>>>>>> <snip>
>>>>>> 1513 if (node._notified != 0 && _succ == Self) {
>>>>>> 1514 node._event->unpark();
>>>>>> 1515 }
>>>>>>
>>>>>>
>>>>>> This is the new location of block of code that's being moved:
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp:
>>>>>>
>>>>>> 1452 void ObjectMonitor::wait(jlong millis, bool interruptible, TRAPS) {
>>>>>> <snip>
>>>>>> 1601 if (JvmtiExport::should_post_monitor_waited()) {
>>>>>> 1602 JvmtiExport::post_monitor_waited(jt, this, ret ==
>>>>>> OS_TIMEOUT);
>>>>>> <snip>
>>>>>> 1604 if (node._notified != 0 && _succ == Self) {
>>>>>> <snip>
>>>>>> 1620 node._event->unpark();
>>>>>> 1621 }
>>>>>>
>>>>>>
>>>>>> The Risks
>>>>>> ---------
>>>>>>
>>>>>> - The code now executes only when the JVMTI_EVENT_MONITOR_WAITED event
>>>>>> is enabled:
>>>>>> - previously it was always executed
>>>>>> - while the old code was not effective for the hang that is being
>>>>>> fixed with this bug, it is possible that the old code prevented
>>>>>> a different bug in the successor protocol from manifesting
>>>>>> - thorough analysis of the successor protocol did not reveal a
>>>>>> case where the old code was needed in the old location
>>>>>> - Thorough analysis indicates that the other JVM/TI monitor events
>>>>>> do not need a fix like the one for JVMTI_EVENT_MONITOR_WAITED:
>>>>>> - the successor protocol is complicated and the analysis could
>>>>>> be wrong when certain options are used
>>>>>> - comments were added to each location where a JVM/TI monitor
>>>>>> event handler is called documenting why a fix like this one
>>>>>> is not needed there
>>>>>> - if the analysis is wrong, the new comments show where a new
>>>>>> code change would be needed
>>>>>>
>>>>>>
>>>>>> The Scenario
>>>>>> ------------
>>>>>>
>>>>>> I've created a scenario that reproduces this hang:
>>>>>>
>>>>>> T1 - enters monitor and calls monitor.wait()
>>>>>> T2 - enters the monitor, calls monitor.notify() and exits the monitor
>>>>>> T3 - enters and exits the monitor
>>>>>> T4 - enters the monitor, delays for 5 seconds, exits the monitor
>>>>>>
>>>>>> A JVM/TI agent that enables JVMTI_EVENT_MONITOR_WAITED and has a
>>>>>> handler that: enters a raw monitor, waits for 1ms, exits a raw monitor.
>>>>>>
>>>>>> Here are the six events necessary to make this hang happen:
>>>>>>
>>>>>> // KEY-EVENT-1a: After being unparked(), T1 has cleared the _succ
>>>>>> field, but
>>>>>> // KEY-EVENT-1b: T3 is exiting the monitor and makes T1 the successor
>>>>>> again.
>>>>>>
>>>>>> // KEY-EVENT-2a: The unpark() done by T3 when it made T1 the successor
>>>>>> // KEY-EVENT-2b: is consumed by the JVM/TI event handler.
>>>>>>
>>>>>> // KEY-EVENT-3a: T3 made T1 the successor
>>>>>> // KEY-EVENT-3b: but before T1 could reenter the monitor T4 grabbed it.
>>>>>>
>>>>>> // KEY-EVENT-4a: T1's TrySpin() call sees T4 as NotRunnable so
>>>>>> // KEY-EVENT-4b: T1 bails from TrySpin without touching _succ.
>>>>>>
>>>>>> // KEY-EVENT-5a: T4 sees that T1 is still the successor so
>>>>>> // KEY-EVENT-5b: T4 takes the quick exit path (no ExitEpilog)
>>>>>>
>>>>>> // KEY-EVENT-6a: T1 is about to park and it is the successor, but
>>>>>> // KEY-EVENT-6b: T3's unpark has been eaten by the JVM/TI event handler
>>>>>> // KEY-EVENT-6c: and T4 took the quick exit path. T1 is about to be
>>>>>> stuck.
>>>>>>
>>>>>>
>>>>>> This bug is intertwined with:
>>>>>>
>>>>>> - The ObjectMonitor successor protocol
>>>>>> - the sharing of a ParkEvent between Java Monitors and JVM/TI
>>>>>> RawMonitors
>>>>>>
>>>>>> There is a very long successor.notes attachment to JDK-8028073 that
>>>>>> attempts to describe the ObjectMonitor successor protocol. It's good
>>>>>> for putting pretty much anyone to sleep.
>>>>>>
>>>>>> Since this hang reproduces back to JDK6, this bug is taking the easily
>>>>>> backported solution of moving the original fix to the right location.
>>>>>> The following new bug has been filed for possible future work in this
>>>>>> area by the Serviceability Team:
>>>>>>
>>>>>> 8033399 add a separate ParkEvent for JVM/TI RawMonitor use
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>>>
>>>>>>
>>>>>> The Symptoms
>>>>>> ------------
>>>>>>
>>>>>> With intermittent hangs like this, it is useful to know what to look
>>>>>> for in order to determine if you are running into this issue:
>>>>>>
>>>>>> - if you aren't using a debugger or a profiler or some other
>>>>>> JVM/TI agent, then this hang is not the same as yours
>>>>>> - if your JVM/TI agent isn't using a JVMTI_EVENT_MONITOR_WAITED
>>>>>> event handler, then this hang is not the same as yours
>>>>>> - if your JVMTI_EVENT_MONITOR_WAITED event handler is not using
>>>>>> JVM/TI RawMonitors, then this hang is not the same as yours
>>>>>> - if your JVMTI_EVENT_MONITOR_WAITED event handler is calling
>>>>>> back into Java code, then you might just be insane and this
>>>>>> hang might be similar to yours. However, using a Java callback
>>>>>> in an event handler is an even bigger problem/risk so fix that
>>>>>> first.
>>>>>> - if you one or more threads blocked like this and making no
>>>>>> progress, then this hang might be the same as yours:
>>>>>>
>>>>>> "T1" #22 prio=5 os_prio=64 tid=0x00000000009ca800 nid=0x2f waiting
>>>>>> for monitor e
>>>>>> ntry [0xfffffd7fc0231000]
>>>>>> java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>> JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009ca800 [0x2f] State: _at_safepoint
>>>>>> _has_called_back 0 _at_p
>>>>>> oll_safepoint 0
>>>>>> JavaThread state: _thread_blocked
>>>>>> at java.lang.Object.wait(Native Method)
>>>>>> - waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>> at java.lang.Object.wait(Object.java:502)
>>>>>> at SMW_WorkerThread.run(StressMonitorWait.java:103)
>>>>>> - locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> "T2" #23 prio=5 os_prio=64 tid=0x00000000009cc000 nid=0x30 waiting
>>>>>> for monitor e
>>>>>> ntry [0xfffffd7fc0130000]
>>>>>> java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>> JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009cc000 [0x30] State: _at_safepoint
>>>>>> _has_called_back 0 _at_p
>>>>>> oll_safepoint 0
>>>>>> JavaThread state: _thread_blocked
>>>>>> at SMW_WorkerThread.run(StressMonitorWait.java:120)
>>>>>> - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> "T3" #24 prio=5 os_prio=64 tid=0x00000000009ce000 nid=0x31 waiting
>>>>>> for monitor e
>>>>>> ntry [0xfffffd7fc002f000]
>>>>>> java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>> JavaThread state: _thread_blocked
>>>>>> Thread: 0x00000000009ce000 [0x31] State: _at_safepoint
>>>>>> _has_called_back 0 _at_p
>>>>>> oll_safepoint 0
>>>>>> JavaThread state: _thread_blocked
>>>>>> at SMW_WorkerThread.run(StressMonitorWait.java:139)
>>>>>> - waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> Key symptoms in thread T1:
>>>>>>
>>>>>> - had the object locked:
>>>>>>
>>>>>> locked <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> - did an Object.wait():
>>>>>>
>>>>>> waiting on <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> - is blocked on reentry:
>>>>>>
>>>>>> waiting for monitor entry [0xfffffd7fc0231000]
>>>>>>
>>>>>> Key symtoms in thread T2:
>>>>>>
>>>>>> - is blocked waiting to lock the object:
>>>>>>
>>>>>> waiting for monitor entry [0xfffffd7fc0130000]
>>>>>> waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
>>>>>> Key symtoms in thread T3:
>>>>>>
>>>>>> - is blocked waiting to lock the object:
>>>>>>
>>>>>> waiting for monitor entry [0xfffffd7fc002f000]
>>>>>> waiting to lock <0xfffffd7e6a2b6ff0> (a java.lang.String)
>>>>>>
More information about the serviceability-dev
mailing list