RFR: 8338708: Don't create/destroy debug agent cmdQueueLock for each connection.

Fri Aug 30 20:43:18 UTC 2024

On Fri, 30 Aug 2024 17:09:58 GMT, Leonid Mesnik <lmesnik at openjdk.org> wrote:

>> Only allocate the cmdQueueLock once rather than allocate each time there is a new connection (and destroy when the connection is terminated).
>> 
>> Currently each time there is a new debug agent connection established, the cmdQueueLock is allocated. It is then destroyed when the connection is dropped, only to be recreated on the next connection. I looked closely at the use of this lock, and what happens when the connection is dropped, and I can't see any reason to create and then dispose of cmdQueueLock for each connection. Of the 18 debug agent raw monitors, this is the only one that gets destroyed and recreated. The other 17 stick around for the next connection. I'd like to get rid of this reallocation in order to support static initialization of all the debug agent raw monitors on startup. This will be done as part of the ranked monitor support ([JDK-8328866](https://bugs.openjdk.org/browse/JDK-8328866)).
>> 
>> As part of the testing for this change, I put an assert in place when we try to create cmdQueueLock for the 2nd time. I ran all our debugger tests, and unfortunately none of them failed. This means we don't currently test attaching to the debug agent more than once in any of our tests, so I did two things to better test out this change. The first was to write a test that attaches numerous times in sequence, and the 2nd was to manually attach and reattach using jdb. At the same time I stepped through the debug agent code in gdb just to help better understand the setup and use of cmdQueueLock. I didn't see anything that gave me concern about making this change.
>> 
>> Tested by running all jdi, jdwp, and jdb tests on all supported platforms.
>
> src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c line 90:
> 
>> 88:     /* We may be starting a new connection after an error */
>> 89:     cmdQueue = NULL;
>> 90:     if (cmdQueueLock == NULL) {
> 
> Doesn't it makes sense to check that lock is not locked here?
> Is there possibility that lock is left locked for disconnected process or any other similar timeout. Probably it is not covered by our tests.

The normal status of the cmdQueueLock is for the Listener thread to be waiting on it. When the Reader thread gets a packet and queues it up, it then does a notify. Any holding of the lock is short lived. I don't see how it could still be locked at this point, and if it was, it should be quickly released. I typed up the following notes for myself a while back when figuring this all out. They may be of some help to you in understanding whe this should not be a problem.

=========================================

The debug agent has two threads involved in dealing with the connection to the debugger. The first is the JDWP Transport Listener thread. It is initially created to listen for the debugger's connection request. Once the request comes in the connection is setup and the Listener thread spawns off the JDWP Command Reader thread. The Reader thread is responsible for reading in command packets and queuing them up for the Listener thread to process.

The Reader thread is usually waiting for the next JDWP command, which it will enqueue when it arrives:

    while (shouldListen) {
        jint rc;

        rc = transport_receivePacket(&packet);
        ...
        enqueue(&packet);  <-- does a notify on cmdQueueLock
        ...
    }

The Listener thread sits in a loop dequeuing commands until it gets a VM.Dispose command, at which point it will exit the loop, destroy the cmdQueueLock, and then exit the thread. Usually while dequeuing it is blocked here waiting for the next command:

    debugMonitorEnter(cmdQueueLock);
    while (!transportError && (cmdQueue == NULL)) {
        debugMonitorWait(cmdQueueLock);
    }
    debugMonitorExit(cmdQueueLock);

It will process enqueued packets in this manner until it gets a VM.Dispose command.

The cmdQueueLock is created by the Listener thread just before spawning the Reader thread. It is destroyed after it exits as described above. If VM.Dispose is being used, things are pretty orderly, but we have to also consider abnormal connection termination where no VM.Dispose is done. In that case the the above transport_receivePacket() call will fail and the Reader thread calls the following code:

    debugMonitorEnter(cmdQueueLock);
    transportError = JNI_TRUE;
    debugMonitorNotify(cmdQueueLock);
    debugMonitorExit(cmdQueueLock);

The notify wakes up the Listener thread, which is in the debugMonitorWait() trying to dequeue the next command. The dequeue fails, which results in the Listener thread shutting down the connection. It will destroy the cmdQueueLock just like it does in the orderly shutdown.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20755#discussion_r1739396992