RFR: 8231289: Disentangle JvmtiRawMonitor from ObjectMonitor and clean it up
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Mon Oct 7 07:58:44 UTC 2019
Hi Dan,
On 10/3/19 17:01, Daniel D. Daugherty wrote:
> On 10/3/19 7:35 PM, serguei.spitsyn at oracle.com wrote:
>>
>>
>> On 10/3/19 3:33 PM, David Holmes wrote:
>>> On 4/10/2019 3:15 am, serguei.spitsyn at oracle.com wrote:
>>>> On 10/3/19 03:13, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> On 3/10/2019 3:20 am, Daniel D. Daugherty wrote:
>>>>>> Sorry for the delay in reviewing this one... I've been playing
>>>>>> whack-a-mole
>>>>>> with Robbin's MoCrazy test and my AsyncMonitorDeflation bits...
>>>>>
>>>>> No problem - your contribution made the wait worthwhile :)
>>>>>
>>>>>> On 9/24/19 1:09 AM, David Holmes wrote:
>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231289
>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8231289/webrev/
>>>>>>
>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>> Thanks for removing the PROPER_TRANSITIONS stuff. That was old
>>>>>> and crufty stuff.
>>>>>>
>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>> No comments.
>>>>>>
>>>>>> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>>>>>> L39: new (ResourceObj::C_HEAP, mtInternal)
>>>>>> GrowableArray<JvmtiRawMonitor*>(1,true);
>>>>>> nit - need a space between ',' and 'true'.
>>>>>>
>>>>>> Update: leave for your follow-up bug.
>>>>>
>>>>> Fixed now so I don't forget later. :)
>>>>>
>>>>>> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>>>>>> No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>> Glad I added those 'protected for JvmtiRawMonitor' in one
>>>>>> of my recent cleanup bugs. Obviously I'll have to merge
>>>>>> with Async Monitor Deflation. :-)
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>> No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>> No comments.
>>>>>>
>>>>>> src/hotspot/share/services/threadService.cpp
>>>>>> L397: waitingToLockMonitor = jt->current_pending_monitor();
>>>>>> L398: if (waitingToLockMonitor == NULL) {
>>>>>> L399: // we can only be blocked on a raw monitor if
>>>>>> not blocked on an ObjectMonitor
>>>>>> L400: waitingToLockRawMonitor =
>>>>>> jt->current_pending_raw_monitor();
>>>>>> L401: }
>>>>>>
>>>>>> JVM/TI has this event handler:
>>>>>>
>>>>>> typedef void (JNICALL *jvmtiEventMonitorContendedEnter)
>>>>>> (jvmtiEnv *jvmti_env,
>>>>>> JNIEnv* jni_env,
>>>>>> jthread thread,
>>>>>> jobject object);
>>>>>>
>>>>>> This event handler is called after
>>>>>> set_current_pending_monitor()
>>>>>> and if the event handler uses a RawMonitor, then it
>>>>>> possible for
>>>>>> for the thread to show up as blocked on both a Java
>>>>>> monitor and
>>>>>> a JVM/TI RawMonitor.
>>>>>
>>>>> Oh that is interesting - good catch! So that means the current
>>>>> code is broken because the raw monitor will replace the
>>>>> ObjectMonitor as the pending monitor and then set it back to NULL,
>>>>> thus losing the fact the thread is actually pending on the
>>>>> ObjectMonitor. And of course while the pending monitor is the raw
>>>>> monitor that totally messes up the deadlock detection as the
>>>>> ObjectMonitor is missing from consideration. :(
>>>>
>>>> If I remember correctly it is a scenario where this issue also
>>>> comes to the picture:
>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>
>>>> I do not really understand how shared ParkEvent can be
>>>> used/consumed by both ObjectMonitor and RawMonitor on the same thread.
>>>> But we observed and investigated this problem several years ago and
>>>> Dan finally filed this enhancement.
>>>
>>> I still don't see how this is possible as you are not actually
>>> enqueued on the ObjectMonitor when the call out to the event
>>> callback is made. but that is a topic for another email thread. :)
>
> Correct that you cannot be enqueued on the ObjectMonitor when you
> make the callback. However, I don't think that was the point of
> 8033399 when we filed so very long ago...
>
> Quoting a comment from David:
>
>> David Holmes added a comment - 2014-01-27 18:34
>> Could there be multiple places in event handling code that could in
>> theory consume unparks and so require the re-issue of an unpark()
>> from different locations in the code?
>>
>> Seems to me that perhaps raw monitors - given they can be entered
>> from within the normal monitor code - should have their own _event
>> object per thread, so that this accidental consumption of unparks can
>> not occur.
>
>
> The scenario that comes to mind:
>
> - T1 is contending on an ObjectMonitor and has set waitingToLockMonitor.
> - T1 calls the jvmtiEventMonitorContendedEnter event handler that
> contends on a JvmtiRawMonitor and has set waitingToLockRawMonitor.
> - T1 blocks on the JvmtiRawMonitor and parks.
> - T2 is exiting the ObjectMonitor and has picked T1 as the successor
> so it unparks T1. Only T1 is parked for the JvmtiRawMonitor and
> not for the ObjectMonitor. T2 hasn't quite finished exiting the
> ObjectMonitor yet... (not sure if this lag is possible)
> - T1 has unparked early for the JvmtiRawMonitor and at the same
> time T3 is exiting the JvmtiRawMonitor.
> - T1 quickly enters the JvmtiRawMonitor and T3 doesn't have to
> pick a successor so it doesn't do an unpark.
> - T1 finishes the work of the jvmtiEventMonitorContendedEnter and
> returns to the caller which is the code that's about to block on
> the ObjectMonitor...
> - T1 blocks on the ObjectMonitor and parks.
> - T2 finishes exiting the ObjectMonitor... Does T1 get unparked?
>
> I can't remember when T2 does the unpark() relative to dropping
> ownership of the ObjectMonitor. If the unpark() is first or if
> the _owner field setting to NULL lingers... it's possible for T1
> to block and park... with nothing to unpark T1...
>
> Pure, crazy theory here...
The scenario above is very interesting, thank you a lot!
I've added a comment with this to the 8033399.
> However, with David's work on this bug (8231289), this theoretical
> problem goes away... That's the only reason for trying to close
> this 8033399 sub-thread here...
It is not clear why this theoretical problem goes away with the David's
work on the bug 8231289.
I don't want to continue this discussion here and would suggest to move
to the 8231289 comments.
Thanks,
Serguei
>
> Dan
More information about the serviceability-dev
mailing list