RFR: 8231289: Disentangle JvmtiRawMonitor from ObjectMonitor and clean it up

Thu Oct 3 22:33:18 UTC 2019

On 4/10/2019 3:15 am, serguei.spitsyn at oracle.com wrote:
> On 10/3/19 03:13, David Holmes wrote:
>> Hi Dan,
>>
>> On 3/10/2019 3:20 am, Daniel D. Daugherty wrote:
>>> Sorry for the delay in reviewing this one... I've been playing 
>>> whack-a-mole
>>> with Robbin's MoCrazy test and my AsyncMonitorDeflation bits...
>>
>> No problem - your contribution made the wait worthwhile :)
>>
>>> On 9/24/19 1:09 AM, David Holmes wrote:
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231289
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8231289/webrev/
>>>
>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>      Thanks for removing the PROPER_TRANSITIONS stuff. That was old
>>>      and crufty stuff.
>>>
>>> src/hotspot/share/prims/jvmtiEnvBase.cpp
>>>      No comments.
>>>
>>> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>>>      L39:   new (ResourceObj::C_HEAP, mtInternal) 
>>> GrowableArray<JvmtiRawMonitor*>(1,true);
>>>          nit - need a space between ',' and 'true'.
>>>
>>>          Update: leave for your follow-up bug.
>>
>> Fixed now so I don't forget later. :)
>>
>>> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>      Glad I added those 'protected for JvmtiRawMonitor' in one
>>>      of my recent cleanup bugs. Obviously I'll have to merge
>>>      with Async Monitor Deflation. :-)
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/thread.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/services/threadService.cpp
>>>      L397:     waitingToLockMonitor = jt->current_pending_monitor();
>>>      L398:     if (waitingToLockMonitor == NULL) {
>>>      L399:       // we can only be blocked on a raw monitor if not 
>>> blocked on an ObjectMonitor
>>>      L400:       waitingToLockRawMonitor = 
>>> jt->current_pending_raw_monitor();
>>>      L401:     }
>>>
>>>          JVM/TI has this event handler:
>>>
>>>            typedef void (JNICALL *jvmtiEventMonitorContendedEnter)
>>>                (jvmtiEnv *jvmti_env,
>>>                 JNIEnv* jni_env,
>>>                 jthread thread,
>>>                 jobject object);
>>>
>>>          This event handler is called after 
>>> set_current_pending_monitor()
>>>          and if the event handler uses a RawMonitor, then it possible 
>>> for
>>>          for the thread to show up as blocked on both a Java monitor and
>>>          a JVM/TI RawMonitor.
>>
>> Oh that is interesting - good catch! So that means the current code is 
>> broken because the raw monitor will replace the ObjectMonitor as the 
>> pending monitor and then set it back to NULL, thus losing the fact the 
>> thread is actually pending on the ObjectMonitor. And of course while 
>> the pending monitor is the raw monitor that totally messes up the 
>> deadlock detection as the ObjectMonitor is missing from consideration. :(
> 
> If I remember correctly it is a scenario where this issue also comes to 
> the picture:
> https://bugs.openjdk.java.net/browse/JDK-8033399
> 
> I do not really understand how shared ParkEvent can be used/consumed by 
> both ObjectMonitor and RawMonitor on the same thread.
> But we observed and investigated this problem several years ago and Dan 
> finally filed this enhancement.

I still don't see how this is possible as you are not actually enqueued 
on the ObjectMonitor when the call out to the event callback is made. 
but that is a topic for another email thread. :)

Meanwhile what to do about broken deadlock detection ... :(

Thanks,
David

> Thanks,
> Serguei
> 
> 
> 
> 
>> This also probably means that you can have a pending raw monitor at 
>> the same time as you have a "Blocker" as I'm pretty sure there are 
>> various JVM TI event handlers that may execute between the Blocker 
>> being set and the actual park. So that would be an additional breakage 
>> in the existing code.
>>
>> Back to my code and I have two problems. The second, which is easy to 
>> address, is the deadlock printing code. I'll hoist the 
>> waitingToLockRawMonitor chunk to the top so it is executed independent 
>> of the waitingToLockMonitor value (which remains in an if/else 
>> relationship with the waitingToLockBlocker). But now that we might 
>> print two "records" at a time I have to make additional changes to get 
>> meaningful output for the current thread (which is handled as a common 
>> code after the if/else block to finish whichever record was being 
>> printed). Also I can no longer use "continue" as the 3 outcomes are 
>> not mutually exclusive - so this could get a bit messy. :(
>>
>> So definitely a v2 webrev on the way.
>>
>> But before that I need to solve my first problem - and I don't know 
>> how. Now that it is apparent that a thread can be blocked on both a 
>> raw monitor and an ObjectMonitor at the same time, I have no idea how 
>> to actually account for this in the deadlock detection code. That code 
>> has a while loop that expects to at most find either a locked 
>> ObjectMonitor or j.u.c Blocker, and it adds the owner thread to the 
>> cycle detection, then moves on. But now I can have two different owner 
>> threads in the same loop iteration. I don't know how to account for that.
>>
>> Given that it seems to me that the current code is already broken if 
>> we encounter these conditions, then perhaps all I can do is handle the 
>> other cases, where the blocking reasons are mutually exclusive, and 
>> not try to fix things? i.e. leave lines #434 to #440 as they are in 
>> webrev v1 - which implies no change to line #398 except the comment 
>> ... ??
>>
>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/RawMonitorWait/rawmnwait005/rawmnwait005.cpp 
>>>
>>>      No comments.
>>>
>>>
>>> Thumbs up! The only non-nit I have is the setting of 
>>> waitingToLockRawMonitor
>>> on L400 and the corresponding comment on L399. Everything else is a nit.
>>>
>>> I don't need to see a new webrev.
>>
>> If only that were true :(
>>
>> Thanks,
>> David
>>
>>> Thanks for tackling this disentangle issue!
>>>
>>> Dan
>>>
>>>
>>>>
>>>> The earlier attempt to rewrite JvmtiRawMonitor as a simple wrapper 
>>>> around PlatformMonitor proved not so simple and ultimately had too 
>>>> many issues due to the need to support Thread.interrupt.
>>>>
>>>> I'd previously stated in the bug report:
>>>>
>>>> "In the worst-case I suppose we could just copy ObjectMonitor to a 
>>>> new class and have JvmtiRawMonitor continue to extend that (with 
>>>> some additional minor adjustments) - or even just inline it all as 
>>>> needed."
>>>>
>>>> but hadn't looked at it in detail. Richard Reingruber did look at it 
>>>> and pointed out that it is actually quite simple - we barely use any 
>>>> actual code from ObjectMonitor, mainly just the state. So thanks 
>>>> Richard! :)
>>>>
>>>> So this change basically copies or moves anything needed by 
>>>> JvmtiRawMonitor from ObjectMonitor, breaking the connection between 
>>>> the two. We also copy and simplify ObjectWaiter, turning it into a 
>>>> QNode internal class. There is then a lot of cleanup that was 
>>>> applied (and a lot more that could still be done):
>>>>
>>>> - Removed the never implemented/used PROPER_TRANSITIONS ifdefs
>>>> - Fixed the disconnect between the types of non-JavaThreads expected 
>>>> by the upper layer code and lower layer code
>>>> - cleaned up and simplified return codes
>>>> - consolidated code that is identical for JavaThreads and 
>>>> non-JavaThreads (e.g. notify/notifyAll).
>>>> - removed used of TRAPS/THREAD where not appropriate and replaced 
>>>> with "Thread * Self" in the style of the rest of the code
>>>> - changed recursions to be int rather than intptr_t (a "fixme" in 
>>>> the ObjectMonitor code)
>>>>
>>>>
>>>> I have not changed the many style flaws with this code:
>>>> - Capitalized names
>>>> - extra spaces before ;
>>>> - ...
>>>>
>>>> but could do so if needed. I wanted to try and keep it more obvious 
>>>> that the fundamental functional code is actually unmodified.
>>>>
>>>> There is one aspect that requires further explanation: the notion of 
>>>> current pending monitor. The "current pending monitor" is stored in 
>>>> the Thread and used by a number of introspection APIs for things 
>>>> like finding monitors, doing deadlock detection, etc. The 
>>>> JvmtiRawMonitor code would also set/clear itself as "current pending 
>>>> monitor". Most uses of the current pending monitor actually, 
>>>> explicitly or implicitly, ignore the case when the monitor is a 
>>>> JvmtiRawMonitor (observed by the fact the mon->object() query 
>>>> returns NULL). The exception to that is deadlock detection where raw 
>>>> monitors are at least partially accounted for. To preserve that I 
>>>> added the notion of "current pending raw monitor" and updated the 
>>>> deadlock detection code to use that.
>>>>
>>>> The test:
>>>>
>>>>
>>>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/RawMonitorWait/rawmnwait005/rawmnwait005.cpp 
>>>>
>>>>
>>>> was updated because I'd noticed previously that it was the only test 
>>>> that used interrupt with raw monitors, but was in fact broken: the 
>>>> test thread is a daemon thread so the main thread could terminate 
>>>> the VM immediately after the interrupt() call, thus you would never 
>>>> know if the interruption actually worked as expected.
>>>>
>>>> Testing:
>>>>  - tiers 1 - 3
>>>>  - vmTestbase/nsk/monitoring/  (for deadlock detection**)
>>>>  - vmTestbase/nsk/jdwp
>>>>  - vmTestbase/nsk/jdb/
>>>>  - vmTestbase/nsk/jdi/
>>>>  - vmTestbase/nsk/jvmti/
>>>>  - serviceability/jvmti/
>>>>  - serviceability/jdwp
>>>>  - JDK: java/lang/management
>>>>
>>>> ** There are no existing deadlock related tests involving 
>>>> JvmtiRawMonitor. It would be interesting/useful to add them to the 
>>>> existing nsk/monitoring tests that cover synchronized and JNI 
>>>> locking. But it's a non-trivial enhancement that I don't really have 
>>>> time to do.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>
>