RFR: 8305209: JDWP exit error AGENT_ERROR_INVALID_THREAD(203): missing entry in running thread table [v3]
Serguei Spitsyn
sspitsyn at openjdk.org
Thu Apr 6 06:34:15 UTC 2023
On Thu, 6 Apr 2023 03:35:17 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:
>> The real purpose of this PR is to add virtual thread support to ThreadMemoryLeakTest.java, but this exposed bugs in both the debug agent and in TestScaffold, so those are being fixed also (and the debug agent bug is the CR being used).
>>
>> The debug agent bug is due to a race condition during VM exit. The VM is in the process of shutting down. The debug agent has already disabled JVMTI callbacks and has sent the VMDeathEvent. At this point in time there are also threads exiting that the debug agent knows about, but it will not get a ThreadEndEvent for because of the callbacks being disabled. Thus these threads remain in the debug agent's list of known threads, even though they have exited. The debuggee receives the VMDeathEvent and does a VM.resume(). During the debug agent's handing of the VM.Resume command, it iterates over all known threads and needs to map each to its ThreadNode so it can be resumed, and this mapping requires accessing the JVMTI TLS for the thread. The problem is some of the threads may have exited already, and therefore no longer have TLS. This results in the assert in the debug agent. This debug agent issue was already addressed for platform threads, but not for virtual threads, which is why w
e started seeing this issue when this test was modified. The fix is to just replicate what is done for platform threads for virtual threads also.
>>
>> The TestScaffold bug is that if the debuggee crashes/asserts, this is likely to go unnoticed, especially if it happens during VM exit (and the test essentially has already completed). Because of this TestScaffold bug, the debug agent bug above did not result in a test failure. After fixing TestScaffold to check the exitCode of the debuggee process, the test started to appropriately fail until the debug agent was fixed.
>>
>> One other thing to point out is the OOME issue I started getting frequently when testing with virtual threads. Since virtual threads are created at a much higher rate than platform threads, their creation started to overwhelm the debugger (actually the JDI implementation). There is already a mechanism in place to do a VM.HoldEvents if JDI has queue up 10,000 events. The problem is that events are coming in so fast that even after doing the VM.HoldEvents, the number of queued events continues to go up for a while, and sometimes reaches 30,000 or more. This raises the peak memory usage of the test quite a bit. Since the test purposely uses a small heap so a memory leak is quickly and reliably detected, the large queue often results in an OOME. Because of this I make virtual threads sleep for 100ms instead of 50ms to slow down their creation, and this resolved the issue.
>>
>> I tested by running all of test/jdk/com/sun/jdi 25 times on each platform with and without virtual thread testing enabled.
>
> Chris Plummer has updated the pull request incrementally with one additional commit since the last revision:
>
> TestScaffold now waits indefinitely for process exit. Simpler coding of sleep time.
Marked as reviewed by sspitsyn (Reviewer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/13246#pullrequestreview-1374133992
More information about the serviceability-dev
mailing list