RFR: 8231289: Disentangle JvmtiRawMonitor from ObjectMonitor and clean it up

Fri Oct 4 13:43:41 UTC 2019

On 10/3/19 9:55 PM, David Holmes wrote:
> <trimming>
>
> On 4/10/2019 10:01 am, Daniel D. Daugherty wrote:
>> On 10/3/19 7:35 PM, serguei.spitsyn at oracle.com wrote:
>>>>> If I remember correctly it is a scenario where this issue also 
>>>>> comes to the picture:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8033399
>>>>>
>>>>> I do not really understand how shared ParkEvent can be 
>>>>> used/consumed by both ObjectMonitor and RawMonitor on the same 
>>>>> thread.
>>>>> But we observed and investigated this problem several years ago 
>>>>> and Dan finally filed this enhancement.
>>>>
>>>> I still don't see how this is possible as you are not actually 
>>>> enqueued on the ObjectMonitor when the call out to the event 
>>>> callback is made. but that is a topic for another email thread. :)
>>
>> Correct that you cannot be enqueued on the ObjectMonitor when you
>> make the callback. However, I don't think that was the point of
>> 8033399 when we filed so very long ago...
>>
>> Quoting a comment from David:
>>
>>> David Holmes added a comment - 2014-01-27 18:34
>>> Could there be multiple places in event handling code that could in 
>>> theory consume unparks and so require the re-issue of an unpark() 
>>> from different locations in the code?
>>>
>>> Seems to me that perhaps raw monitors - given they can be entered 
>>> from within the normal monitor code - should have their own _event 
>>> object per thread, so that this accidental consumption of unparks 
>>> can not occur. 
>
> And since then I've decided this isn't actually a problem.

Sounds good to me.

>
>>
>> The scenario that comes to mind:
>>
>> - T1 is contending on an ObjectMonitor and has set waitingToLockMonitor.
>> - T1 calls the jvmtiEventMonitorContendedEnter event handler that
>>    contends on a JvmtiRawMonitor and has set waitingToLockRawMonitor.
>> - T1 blocks on the JvmtiRawMonitor and parks.
>> - T2 is exiting the ObjectMonitor and has picked T1 as the successor
>>    so it unparks T1. 
>
> Nope - T1 is not yet enqueued on the ObjectMonitor so it can't be 
> picked as successor.

Also sounds good.

>
>> Only T1 is parked for the JvmtiRawMonitor and
>>    not for the ObjectMonitor. T2 hasn't quite finished exiting the
>>    ObjectMonitor yet... (not sure if this lag is possible)
>> - T1 has unparked early for the JvmtiRawMonitor and at the same
>>    time T3 is exiting the JvmtiRawMonitor.
>> - T1 quickly enters the JvmtiRawMonitor and T3 doesn't have to
>>    pick a successor so it doesn't do an unpark.
>> - T1 finishes the work of the jvmtiEventMonitorContendedEnter and
>>    returns to the caller which is the code that's about to block on
>>    the ObjectMonitor...
>> - T1 blocks on the ObjectMonitor and parks.
>
> For this to actually be a problem requires that the call out to the 
> raw monitor code happens inbetween the check for whether T1 needs to 
> park and the park call. That also is not the case.

Also sounds good.

>
>> - T2 finishes exiting the ObjectMonitor... Does T1 get unparked?
>>
>> I can't remember when T2 does the unpark() relative to dropping
>> ownership of the ObjectMonitor. If the unpark() is first or if
>> the _owner field setting to NULL lingers... it's possible for T1
>> to block and park... with nothing to unpark T1...
>>
>> Pure, crazy theory here...
>>
>> However, with David's work on this bug (8231289), this theoretical
>> problem goes away... That's the only reason for trying to close
>> this 8033399 sub-thread here...
>
> My work here makes no difference to 8033399 perceived problem. The 
> same ParkEvent continues to be used by both JvmtiRawMonitor and 
> ObjectMonitor.

I confused myself into thinking that you either created a
new ParkEvent or didn't use the existing ParkEvent anymore.
My mistake.

Dan

>
> Cheers,
> David
>
>> Dan
>>
>>
>>>
>>> Agreed.
>>> Just wanted to point out it can be related.
>>> Dan filed this RFE and can have more knowledge.
>>>
>>>> Meanwhile what to do about broken deadlock detection ... :(
>>>
>>> It is a good catch from Dan.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> This also probably means that you can have a pending raw monitor 
>>>>>> at the same time as you have a "Blocker" as I'm pretty sure there 
>>>>>> are various JVM TI event handlers that may execute between the 
>>>>>> Blocker being set and the actual park. So that would be an 
>>>>>> additional breakage in the existing code.
>>>>>>
>>>>>> Back to my code and I have two problems. The second, which is 
>>>>>> easy to address, is the deadlock printing code. I'll hoist the 
>>>>>> waitingToLockRawMonitor chunk to the top so it is executed 
>>>>>> independent of the waitingToLockMonitor value (which remains in 
>>>>>> an if/else relationship with the waitingToLockBlocker). But now 
>>>>>> that we might print two "records" at a time I have to make 
>>>>>> additional changes to get meaningful output for the current 
>>>>>> thread (which is handled as a common code after the if/else block 
>>>>>> to finish whichever record was being printed). Also I can no 
>>>>>> longer use "continue" as the 3 outcomes are not mutually 
>>>>>> exclusive - so this could get a bit messy. :(
>>>>>>
>>>>>> So definitely a v2 webrev on the way.
>>>>>>
>>>>>> But before that I need to solve my first problem - and I don't 
>>>>>> know how. Now that it is apparent that a thread can be blocked on 
>>>>>> both a raw monitor and an ObjectMonitor at the same time, I have 
>>>>>> no idea how to actually account for this in the deadlock 
>>>>>> detection code. That code has a while loop that expects to at 
>>>>>> most find either a locked ObjectMonitor or j.u.c Blocker, and it 
>>>>>> adds the owner thread to the cycle detection, then moves on. But 
>>>>>> now I can have two different owner threads in the same loop 
>>>>>> iteration. I don't know how to account for that.
>>>>>>
>>>>>> Given that it seems to me that the current code is already broken 
>>>>>> if we encounter these conditions, then perhaps all I can do is 
>>>>>> handle the other cases, where the blocking reasons are mutually 
>>>>>> exclusive, and not try to fix things? i.e. leave lines #434 to 
>>>>>> #440 as they are in webrev v1 - which implies no change to line 
>>>>>> #398 except the comment ... ??
>>>>>>
>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/RawMonitorWait/rawmnwait005/rawmnwait005.cpp 
>>>>>>>
>>>>>>>      No comments.
>>>>>>>
>>>>>>>
>>>>>>> Thumbs up! The only non-nit I have is the setting of 
>>>>>>> waitingToLockRawMonitor
>>>>>>> on L400 and the corresponding comment on L399. Everything else 
>>>>>>> is a nit.
>>>>>>>
>>>>>>> I don't need to see a new webrev.
>>>>>>
>>>>>> If only that were true :(
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks for tackling this disentangle issue!
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The earlier attempt to rewrite JvmtiRawMonitor as a simple 
>>>>>>>> wrapper around PlatformMonitor proved not so simple and 
>>>>>>>> ultimately had too many issues due to the need to support 
>>>>>>>> Thread.interrupt.
>>>>>>>>
>>>>>>>> I'd previously stated in the bug report:
>>>>>>>>
>>>>>>>> "In the worst-case I suppose we could just copy ObjectMonitor 
>>>>>>>> to a new class and have JvmtiRawMonitor continue to extend that 
>>>>>>>> (with some additional minor adjustments) - or even just inline 
>>>>>>>> it all as needed."
>>>>>>>>
>>>>>>>> but hadn't looked at it in detail. Richard Reingruber did look 
>>>>>>>> at it and pointed out that it is actually quite simple - we 
>>>>>>>> barely use any actual code from ObjectMonitor, mainly just the 
>>>>>>>> state. So thanks Richard! :)
>>>>>>>>
>>>>>>>> So this change basically copies or moves anything needed by 
>>>>>>>> JvmtiRawMonitor from ObjectMonitor, breaking the connection 
>>>>>>>> between the two. We also copy and simplify ObjectWaiter, 
>>>>>>>> turning it into a QNode internal class. There is then a lot of 
>>>>>>>> cleanup that was applied (and a lot more that could still be 
>>>>>>>> done):
>>>>>>>>
>>>>>>>> - Removed the never implemented/used PROPER_TRANSITIONS ifdefs
>>>>>>>> - Fixed the disconnect between the types of non-JavaThreads 
>>>>>>>> expected by the upper layer code and lower layer code
>>>>>>>> - cleaned up and simplified return codes
>>>>>>>> - consolidated code that is identical for JavaThreads and 
>>>>>>>> non-JavaThreads (e.g. notify/notifyAll).
>>>>>>>> - removed used of TRAPS/THREAD where not appropriate and 
>>>>>>>> replaced with "Thread * Self" in the style of the rest of the code
>>>>>>>> - changed recursions to be int rather than intptr_t (a "fixme" 
>>>>>>>> in the ObjectMonitor code)
>>>>>>>>
>>>>>>>>
>>>>>>>> I have not changed the many style flaws with this code:
>>>>>>>> - Capitalized names
>>>>>>>> - extra spaces before ;
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> but could do so if needed. I wanted to try and keep it more 
>>>>>>>> obvious that the fundamental functional code is actually 
>>>>>>>> unmodified.
>>>>>>>>
>>>>>>>> There is one aspect that requires further explanation: the 
>>>>>>>> notion of current pending monitor. The "current pending 
>>>>>>>> monitor" is stored in the Thread and used by a number of 
>>>>>>>> introspection APIs for things like finding monitors, doing 
>>>>>>>> deadlock detection, etc. The JvmtiRawMonitor code would also 
>>>>>>>> set/clear itself as "current pending monitor". Most uses of the 
>>>>>>>> current pending monitor actually, explicitly or implicitly, 
>>>>>>>> ignore the case when the monitor is a JvmtiRawMonitor (observed 
>>>>>>>> by the fact the mon->object() query returns NULL). The 
>>>>>>>> exception to that is deadlock detection where raw monitors are 
>>>>>>>> at least partially accounted for. To preserve that I added the 
>>>>>>>> notion of "current pending raw monitor" and updated the 
>>>>>>>> deadlock detection code to use that.
>>>>>>>>
>>>>>>>> The test:
>>>>>>>>
>>>>>>>>
>>>>>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/RawMonitorWait/rawmnwait005/rawmnwait005.cpp 
>>>>>>>>
>>>>>>>>
>>>>>>>> was updated because I'd noticed previously that it was the only 
>>>>>>>> test that used interrupt with raw monitors, but was in fact 
>>>>>>>> broken: the test thread is a daemon thread so the main thread 
>>>>>>>> could terminate the VM immediately after the interrupt() call, 
>>>>>>>> thus you would never know if the interruption actually worked 
>>>>>>>> as expected.
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>>  - tiers 1 - 3
>>>>>>>>  - vmTestbase/nsk/monitoring/  (for deadlock detection**)
>>>>>>>>  - vmTestbase/nsk/jdwp
>>>>>>>>  - vmTestbase/nsk/jdb/
>>>>>>>>  - vmTestbase/nsk/jdi/
>>>>>>>>  - vmTestbase/nsk/jvmti/
>>>>>>>>  - serviceability/jvmti/
>>>>>>>>  - serviceability/jdwp
>>>>>>>>  - JDK: java/lang/management
>>>>>>>>
>>>>>>>> ** There are no existing deadlock related tests involving 
>>>>>>>> JvmtiRawMonitor. It would be interesting/useful to add them to 
>>>>>>>> the existing nsk/monitoring tests that cover synchronized and 
>>>>>>>> JNI locking. But it's a non-trivial enhancement that I don't 
>>>>>>>> really have time to do.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>
>>>>>
>>>
>>