RFR: 8328866: Add raw monitor rank support to the debug agent.

Chris Plummer cjplummer at openjdk.org
Thu May 2 21:29:56 UTC 2024


On Thu, 2 May 2024 02:40:36 GMT, Alex Menkov <amenkov at openjdk.org> wrote:

>> This PR adds ranked monitor support to the debug agent. The debug agent has a large number of monitors, and it's really hard to know which order to grab them in, and for that matter which monitors might already be held at any given moment. By imposing a rank on each monitor, we can check to make sure they are always grabbed in the order of their rank. Having this in place when I was working on [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) would have made it much easier to detect a deadlock that was occuring, and the reason for it. That's what motivated me to do this work
>> 
>> There were 2 or 3 minor rank issues discovered as a result of these changes. I also learned a lot about some of the more ugly details of the locking support in the process.
>> 
>> Tested with the following on all supported platforms and with virtual threads:
>> 
>> com/sun/jdi
>> vmTestbase/nsk/jdi
>> vmTestbase/nsk/jdb
>> vmTestbase/nsk/jdwp 
>> 
>> Still need to run tier2 and tier5.
>> 
>> Details of the changes follow in the first comment.
>
> src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 656:
> 
>> 654:     commonRef_lock();
>> 655:     if (gdata->rankedMonitors) {
>> 656:         dbgRawMonitor_lock();
> 
> What is this call for? I think dbgRawMonitor_lock/dbgRawMonitor_unlock should not be used outside of util.c (I'd remove them from util.h)

After calling getLocks(), there may still be attempts to enter locks. The locks should already be held. I filed [JDK-8330193](https://bugs.openjdk.org/browse/JDK-8330193) to assert that. However, even when entering an already entered lock, we still need to grab dbgRawMonitor. I found that out the hard way when there were deadlocks on dbgRawMonitor because it was not entered it here.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1204:
> 
>> 1202:     // Need to lock during initialization so verifyMonitorRank() can be guaranteed that
>> 1203:     // if the monitor field is set, then the monitor if fully initialized. More specifically,
>> 1204:     // that the rank field has been set.
> 
> rank for all monitors can be set in `util_initialize()`:
> 
> for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
>   dbg_monitors[i]->rank = i;
> }

Ok.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1231:
> 
>> 1229:     }
>> 1230: 
>> 1231:     dbg_monitor->monitor = NULL;
> 
> I think it would be better to protect this with dbgRawMonitor

I don't see how that helps. Access to the field is not protected.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1243:
> 
>> 1241:     error = JVMTI_FUNC_PTR(gdata->jvmti,GetThreadInfo)
>> 1242:             (gdata->jvmti, thread, &info);
>> 1243:     return info.name;
> 
> Need to delete JNI reference info.thread_group and info.jthreadGroup (if not NULL)

Did you mean info.context_class_loader, not info.jthreadGroup?

It looks like elsewhere when we call GetThreadInfo it is bracketed with the WITH_LOCAL_REFS macro to automatically push/pop a local refs frame, although it is only setting the size to 1, which seems like a bug. I'll do the same here.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1262:
> 
>> 1260:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
>> 1261:         DebugRawMonitor* dbg_monitor = &dbg_monitors[i];
>> 1262:         if (dbg_monitor == NULL) {
> 
> Should be `dbg_monitor->monitor`

ok

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1279:
> 
>> 1277:         return; // This assert is not reliable if the VM is exiting
>> 1278:     }
>> 1279:     jthread current_thread = threadControl_currentThread();
> 
> Looks like all callers of the `assertIsCurrentThread()` have reference to the current thread. Maybe pass it as argument?

Ok. There were a few changes w.r.t. when this API is called, as was when the callers have access to the current thread, so I that's why I ended up not passing it in.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1281:
> 
>> 1279:     jthread current_thread = threadControl_currentThread();
>> 1280:     if (!isSameObject(env, thread, current_thread)) {
>> 1281:         tty_message("ERROR: Threads not the same: %p %p\n", thread, current_thread);
> 
> Not sure the addresses provide useful information. Maybe print thread names?

ok

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1289:
> 
>> 1287: 
>> 1288: static void
>> 1289: verifyMonitorRank(JNIEnv *env, DebugRawMonitorRank rank, jthread thread)
> 
> please add a comment that the function should be called under dbgRawMonitor lock (also for other functions which assume this)

ok. I was also thinking of setting a static that contains the jthread of the thread that holds the lock, and we can assert it is held.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1294:
> 
>> 1292:     // has a higher rank than the monitor we are about to enter.
>> 1293:     DebugRawMonitorRank i;
>> 1294:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
> 
> No need to check monitors from the beginning.
> Should be enough to check starting from `min(rank, FIRST_LEAF_DEBUG_RAW_MONITOR)`

I originally started at `rank`, but then when I added the 2nd of the two checks below I switched to 0, but your min suggestion should be optimal.

> src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1470:
> 
>> 1468: 
>> 1469:     // Assert that the current thread owns this monitor.
>> 1470:     assertIsCurrentThread(getEnv(), dbg_monitor->ownerThread);
> 
> reading ownerThread (and other) fields without dbgRawMonitor is not MT-safe (it's not atomic)

It is safe if you are the owner of the monitor.  So what that means is that this assert could potentially crash, but only if you are not the owner, which means if it didn't crash then it would have asserted. However, even when asserts are disabled, isSameObject() is still executed, and that's were the crash would happen. assertIsCurrentThread() really should be a no-op if asserts are not enabled. But even if that change is not made, it still would only crash if something bad was going on since the current thread should always own the monitor.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588360985
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588367723
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588371240
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588390369
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588391447
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588394746
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588397148
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588400113
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588407481
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588424198


More information about the serviceability-dev mailing list