RFR: 8231289: Disentangle JvmtiRawMonitor from ObjectMonitor and clean it up
serguei.spitsyn at oracle.com
serguei.spitsyn at oracle.com
Mon Oct 7 08:03:24 UTC 2019
On 10/7/19 00:58, serguei.spitsyn at oracle.com wrote:
> Hi Dan,
>
>
> On 10/3/19 17:01, Daniel D. Daugherty wrote:
>> On 10/3/19 7:35 PM, serguei.spitsyn at oracle.com wrote:
>>>
>>>
>>> On 10/3/19 3:33 PM, David Holmes wrote:
>>>> On 4/10/2019 3:15 am, serguei.spitsyn at oracle.com wrote:
>>>>> On 10/3/19 03:13, David Holmes wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> On 3/10/2019 3:20 am, Daniel D. Daugherty wrote:
>>>>>>> Sorry for the delay in reviewing this one... I've been playing
>>>>>>> whack-a-mole
>>>>>>> with Robbin's MoCrazy test and my AsyncMonitorDeflation bits...
>>>>>>
>>>>>> No problem - your contribution made the wait worthwhile :)
>>>>>>
>>>>>>> On 9/24/19 1:09 AM, David Holmes wrote:
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231289
>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8231289/webrev/
>>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>>>> Thanks for removing the PROPER_TRANSITIONS stuff. That was old
>>>>>>> and crufty stuff.
>>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>>>>>>> L39: new (ResourceObj::C_HEAP, mtInternal)
>>>>>>> GrowableArray<JvmtiRawMonitor*>(1,true);
>>>>>>> nit - need a space between ',' and 'true'.
>>>>>>>
>>>>>>> Update: leave for your follow-up bug.
>>>>>>
>>>>>> Fixed now so I don't forget later. :)
>>>>>>
>>>>>>> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>>> Glad I added those 'protected for JvmtiRawMonitor' in one
>>>>>>> of my recent cleanup bugs. Obviously I'll have to merge
>>>>>>> with Async Monitor Deflation. :-)
>>>>>>>
>>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>> No comments.
>>>>>>>
>>>>>>> src/hotspot/share/services/threadService.cpp
>>>>>>> L397: waitingToLockMonitor =
>>>>>>> jt->current_pending_monitor();
>>>>>>> L398: if (waitingToLockMonitor == NULL) {
>>>>>>> L399: // we can only be blocked on a raw monitor if
>>>>>>> not blocked on an ObjectMonitor
>>>>>>> L400: waitingToLockRawMonitor =
>>>>>>> jt->current_pending_raw_monitor();
>>>>>>> L401: }
>>>>>>>
>>>>>>> JVM/TI has this event handler:
>>>>>>>
>>>>>>> typedef void (JNICALL *jvmtiEventMonitorContendedEnter)
>>>>>>> (jvmtiEnv *jvmti_env,
>>>>>>> JNIEnv* jni_env,
>>>>>>> jthread thread,
>>>>>>> jobject object);
>>>>>>>
>>>>>>> This event handler is called after
>>>>>>> set_current_pending_monitor()
>>>>>>> and if the event handler uses a RawMonitor, then it
>>>>>>> possible for
>>>>>>> for the thread to show up as blocked on both a Java
>>>>>>> monitor and
>>>>>>> a JVM/TI RawMonitor.
>>>>>>
>>>>>> Oh that is interesting - good catch! So that means the current
>>>>>> code is broken because the raw monitor will replace the
>>>>>> ObjectMonitor as the pending monitor and then set it back to
>>>>>> NULL, thus losing the fact the thread is actually pending on the
>>>>>> ObjectMonitor. And of course while the pending monitor is the raw
>>>>>> monitor that totally messes up the deadlock detection as the
>>>>>> ObjectMonitor is missing from consideration. :(
>>>>>
>>>>> If I remember correctly it is a scenario where this issue also
>>>>> comes to the picture:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>>
>>>>> I do not really understand how shared ParkEvent can be
>>>>> used/consumed by both ObjectMonitor and RawMonitor on the same
>>>>> thread.
>>>>> But we observed and investigated this problem several years ago
>>>>> and Dan finally filed this enhancement.
>>>>
>>>> I still don't see how this is possible as you are not actually
>>>> enqueued on the ObjectMonitor when the call out to the event
>>>> callback is made. but that is a topic for another email thread. :)
>>
>> Correct that you cannot be enqueued on the ObjectMonitor when you
>> make the callback. However, I don't think that was the point of
>> 8033399 when we filed so very long ago...
>>
>> Quoting a comment from David:
>>
>>> David Holmes added a comment - 2014-01-27 18:34
>>> Could there be multiple places in event handling code that could in
>>> theory consume unparks and so require the re-issue of an unpark()
>>> from different locations in the code?
>>>
>>> Seems to me that perhaps raw monitors - given they can be entered
>>> from within the normal monitor code - should have their own _event
>>> object per thread, so that this accidental consumption of unparks
>>> can not occur.
>>
>>
>> The scenario that comes to mind:
>>
>> - T1 is contending on an ObjectMonitor and has set waitingToLockMonitor.
>> - T1 calls the jvmtiEventMonitorContendedEnter event handler that
>> contends on a JvmtiRawMonitor and has set waitingToLockRawMonitor.
>> - T1 blocks on the JvmtiRawMonitor and parks.
>> - T2 is exiting the ObjectMonitor and has picked T1 as the successor
>> so it unparks T1. Only T1 is parked for the JvmtiRawMonitor and
>> not for the ObjectMonitor. T2 hasn't quite finished exiting the
>> ObjectMonitor yet... (not sure if this lag is possible)
>> - T1 has unparked early for the JvmtiRawMonitor and at the same
>> time T3 is exiting the JvmtiRawMonitor.
>> - T1 quickly enters the JvmtiRawMonitor and T3 doesn't have to
>> pick a successor so it doesn't do an unpark.
>> - T1 finishes the work of the jvmtiEventMonitorContendedEnter and
>> returns to the caller which is the code that's about to block on
>> the ObjectMonitor...
>> - T1 blocks on the ObjectMonitor and parks.
>> - T2 finishes exiting the ObjectMonitor... Does T1 get unparked?
>>
>> I can't remember when T2 does the unpark() relative to dropping
>> ownership of the ObjectMonitor. If the unpark() is first or if
>> the _owner field setting to NULL lingers... it's possible for T1
>> to block and park... with nothing to unpark T1...
>>
>> Pure, crazy theory here...
>
> The scenario above is very interesting, thank you a lot!
> I've added a comment with this to the 8033399.
>
>
>> However, with David's work on this bug (8231289), this theoretical
>> problem goes away... That's the only reason for trying to close
>> this 8033399 sub-thread here...
>
> It is not clear why this theoretical problem goes away with the
> David's work on the bug 8231289.
> I don't want to continue this discussion here and would suggest to
> move to the 8231289 comments.
Please, skip it.
I see your comment on this in one of the following messages from you.
Thanks,
Serguei
> Thanks,
> Serguei
>
>>
>> Dan
More information about the serviceability-dev
mailing list