RFR: 8269268: JDWP: Properly fix thread lookup assert in findThread()

Chris Plummer cjplummer at openjdk.java.net
Thu Jun 24 16:29:27 UTC 2021


On Thu, 24 Jun 2021 05:17:02 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

> Re-enable the assert that was disabled (with some overhead) by [JDK-8265683](https://bugs.openjdk.java.net/browse/JDK-8265683). Explanation is in the CR and also in comments included with the changes.
> 
> I tested by running `vmTestbase/nsk/jdb/suspend/suspend001/suspend001.java` and `vmTestbase/nsk/jdb/wherei/wherei001/wherei001.java` 100's of times, and did not see any failures. I also verified the original issue was still reproducible by temporarily not setting `gdata->handlingVMDeath = JNI_TRUE`, which did trigger the assert as expected.

There are two threads to consider here. The one that `VM_DEATH` was received on and the thread that reads JDWP commands from the debugger. For the latter, a `VM.Resume` command has been received. The debugger does this as the test wraps up, allowing all debuggee threads to resume and the debugeee to exit. `VM.resume()` calls `threadControl_resumeAll()`, which calls `commonResumeList()`, which walks the `runningThreads` list to build an array of jthreads that need to be resumed. This array is passed to JVMTI `ResumeThreadList()`. Any time after this point a `VM_DEATH` can be triggered since the debuggee main thread calls `System.exit()` and does not wait for all the threads it creates to exit first (no join is done). The `VM_DEATH` event leads to clearing all the JVMTI callbacks, so the debug agent is no longer notified of `THREAD_END` events as the other threads created by the debuggee exit. This leads to threads still being on `runningThreads`, even though they have exited and had t
 heir TLS cleared by JVMTI. Keep in mind we are still in `commonResumeList()` when all this is happening. After calling JVMTI `ResumeThreadList()`, it then walks that same array of jthreads that were resumed, doing a `findThread()` on each. This is done mainly so it can update the `suspendCount` the `ThreadNodes`. This is when the assert was getting triggered. Some of the debuggee threads had already exited and had their TLS cleared, but were still in `runningThreads` since the `THREAD_END` was never received. The fix is to not assert (and instead look up the thread in `runningThreads`) if we are dealing with a `VM_DEATH` event.

You mentioned `gdata->handlingVMDeath` should be volatile. I will fix that. I also think I will rename it to `gdata->jvmtiCallBacksCleared`. There is no need to ever set it back false, so I will remove that line of code. Once true, we know the vm is exiting anyway, and the callbacks will never be setup again. Also, I'm going to set it true before actually clearing the callbacks rather than after. Doing it early is fine. In fact if it was always true, `findThread()` would still work, but just be slower.

As far as any other synchronizing or memory barriers being needed, I'm a bit unclear on that part. The only thing that matters is that `gdata->jvmtiCallBacksCleared` being set true is visible to other threads before the callbacks get cleared. It seems the call to `SetEventCallbacks()` should ensure this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4580


More information about the serviceability-dev mailing list