RFR: 8328866: Add raw monitor rank support to the debug agent.

Alex Menkov amenkov at openjdk.org
Thu May 2 19:41:56 UTC 2024


On Wed, 1 May 2024 21:32:46 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> This PR adds ranked monitor support to the debug agent. The debug agent has a large number of monitors, and it's really hard to know which order to grab them in, and for that matter which monitors might already be held at any given moment. By imposing a rank on each monitor, we can check to make sure they are always grabbed in the order of their rank. Having this in place when I was working on [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) would have made it much easier to detect a deadlock that was occuring, and the reason for it. That's what motivated me to do this work
> 
> There were 2 or 3 minor rank issues discovered as a result of these changes. I also learned a lot about some of the more ugly details of the locking support in the process.
> 
> Tested with the following on all supported platforms and with virtual threads:
> 
> com/sun/jdi
> vmTestbase/nsk/jdi
> vmTestbase/nsk/jdb
> vmTestbase/nsk/jdwp 
> 
> Still need to run tier2 and tier5.
> 
> Details of the changes follow in the first comment.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 656:

> 654:     commonRef_lock();
> 655:     if (gdata->rankedMonitors) {
> 656:         dbgRawMonitor_lock();

What is this call for? I think dbgRawMonitor_lock/dbgRawMonitor_unlock should not be used outside of util.c (I'd remove them from util.h)

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1204:

> 1202:     // Need to lock during initialization so verifyMonitorRank() can be guaranteed that
> 1203:     // if the monitor field is set, then the monitor if fully initialized. More specifically,
> 1204:     // that the rank field has been set.

rank for all monitors can be set in `util_initialize()`:

for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
  dbg_monitors[i]->rank = i;
}

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1231:

> 1229:     }
> 1230: 
> 1231:     dbg_monitor->monitor = NULL;

I think it would be better to protect this with dbgRawMonitor

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1243:

> 1241:     error = JVMTI_FUNC_PTR(gdata->jvmti,GetThreadInfo)
> 1242:             (gdata->jvmti, thread, &info);
> 1243:     return info.name;

Need to delete JNI reference info.thread_group and info.jthreadGroup (if not NULL)

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1262:

> 1260:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
> 1261:         DebugRawMonitor* dbg_monitor = &dbg_monitors[i];
> 1262:         if (dbg_monitor == NULL) {

Should be `dbg_monitor->monitor`

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1279:

> 1277:         return; // This assert is not reliable if the VM is exiting
> 1278:     }
> 1279:     jthread current_thread = threadControl_currentThread();

Looks like all callers of the `assertIsCurrentThread()` have reference to the current thread. Maybe pass it as argument?

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1281:

> 1279:     jthread current_thread = threadControl_currentThread();
> 1280:     if (!isSameObject(env, thread, current_thread)) {
> 1281:         tty_message("ERROR: Threads not the same: %p %p\n", thread, current_thread);

Not sure the addresses provide useful information. Maybe print thread names?

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1289:

> 1287: 
> 1288: static void
> 1289: verifyMonitorRank(JNIEnv *env, DebugRawMonitorRank rank, jthread thread)

please add a comment that the function should be called under dbgRawMonitor lock (also for other functions which assume this)

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1294:

> 1292:     // has a higher rank than the monitor we are about to enter.
> 1293:     DebugRawMonitorRank i;
> 1294:     for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {

No need to check monitors from the beginning.
Should be enough to check starting from `min(rank, FIRST_LEAF_DEBUG_RAW_MONITOR)`

src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1470:

> 1468: 
> 1469:     // Assert that the current thread owns this monitor.
> 1470:     assertIsCurrentThread(getEnv(), dbg_monitor->ownerThread);

reading ownerThread (and other) fields without dbgRawMonitor is not MT-safe (it's not atomic)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586999913
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588259933
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586968454
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586970362
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586973059
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586983415
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586984623
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588206582
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586976698
PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588252771


More information about the serviceability-dev mailing list