RFR: 8328866: Add raw monitor rank support to the debug agent.

Wed May 1 21:42:11 UTC 2024

On Wed, 1 May 2024 21:32:46 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> This PR adds ranked monitor support to the debug agent. The debug agent has a large number of monitors, and it's really hard to know which order to grab them in, and for that matter which monitors might already be held at any given moment. By imposing a rank on each monitor, we can check to make sure they are always grabbed in the order of their rank. Having this in place when I was working on [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) would have made it much easier to detect a deadlock that was occuring, and the reason for it. That's what motivated me to do this work
> 
> There were 2 or 3 minor rank issues discovered as a result of these changes. I also learned a lot about some of the more ugly details of the locking support in the process.
> 
> Tested with the following on all supported platforms and with virtual threads:
> 
> com/sun/jdi
> vmTestbase/nsk/jdi
> vmTestbase/nsk/jdb
> vmTestbase/nsk/jdwp 
> 
> Still need to run tier2 and tier5.
> 
> Details of the changes follow in the first comment.

I was able to minimize changes for users of RawMonitor by keeping the same client wrapper APIs (eg. debugMonitorEnter), and for the most part passing in the same argument (the monitor to enter). The only change is that the client is passing a new type called DebugRawMonitor* instead of a jrawMonitorID. The DebugRawMonitor encapsulates the jrawMonitorID and contains other monitor info that needs to be tracked (like the rank and current owner).

Ranks were determined largely by trial and error by running tests and fixing the rank violations asserts as I hit them. In the enum that contains all the rankings, I explain the rational for each monitor's rank.

For the moment I have included a flag (rankedmonitors) to control whether or not ranked monitors should be used. This is just a safety net in case someone runs into an issue. The flag is not documented and will likely be removed at some point.

The ranked monitor supported ended up needing a RawMonitor itself (dbgRawMonitor_lock). This RawMonitor is therefore not part of the ranked monitor support. It could probably use a better name. Suggestions are welcomed.

There are a few issues I fixed while working on the ranked monitor support:

During debug agent initialization we need to call util_initialize() before commonRef_initialize() because the later creates a monitor, and that depends on util_initialize() having already been called.

getLocks() in threadControl.c wasn't grabbing locks in the right order. This could have lead to a deadlock, but seems the structure of how locks are used prevented one from happening.

In invoker_completeInvokeRequest() I had to add grabbing the stepLock in order to maintain proper lock order when getLocks() was eventually called (which also grabbed stepLock). Other than triggering a rank violation assert, this doesn't seem to have caused any real issue, but did leave the potential for a deadlock.

In handleInterrupt() I noticed that a local JNI ref was not being freed. I noticed because there were other places where I had added calls to threadControl_currentThread() that resulted in a noticeable JNI ref leak. When I fixed those, I fixed the one in handleInterrupt() too even tough it was not causing any problems.

The ranked monitor APIs needed some extra safeguards during VM exit that are not needed for the unranked monitor APIs. This is because the unranked APIs didn't have a problem with the current thread being NULL, but the ranked APIs require a valid current thread. That is why you see the following check:

    if (gdata->vmDead && current_thread == NULL) {
        return;
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19044#issuecomment-2089176402