RFR: 8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails
Chris Plummer
cjplummer at openjdk.org
Fri Feb 3 21:48:37 UTC 2023
A number of tests that use JDI invokeMethod() support occasionally fail when run using virtual threads. The tests fail with:
# ERROR: TEST FAILED: wrong invocation:
# ERROR: invoking debuggee thread instance of java.lang.VirtualThread(name='invokemethod010tThr', id=272)
# ERROR: is not suspended again after the invocation
Although as explained later, this is a misleading message, and does not accurately reflect why the test is failing. More on that below.
The root cause of the failure is due to these tests using JDI invokeMethod support with the INVOKE_SINGLE_THREADED flag. This is not always going to work with virtual threads because the invoked method is doing a Thread.sleep(400), and that is at risk of blocking indefinitely if all other threads are blocked form making progress. Note that technically platform threads could fail in the same manner. However, the reason the failure only happens now with virtual threads is because the implementation of Thread.sleep() differs for virtual threads, and may require ownership of a monitor that sometimes is held by another thread.
Another issue is that the tests do not do a very good job of error handling when this happens, and give the misleading failure reason of the invoked thread not being suspended after the invoke completed. The reason it is not suspended is because the invoke has actually not completed. There was a timeout that the test did not properly note as the cause of the failure. The test (debugger side) spawns a thread to do the JDI invokeMethod with, and then waits for it with:
invThr.join(argHandler.getWaitTime()*60000);
This join() times out, but the test assumes once it returns the invoke is complete, even though the invoked thread is actually still in the middle of the invoke. So that is the reason debuggee invokemethod thread is not currently suspended. I've fixed this by having a test check if invThr is still alive after the join. If it is, then the test is made to fail at that point, rather than continuing on and checking the debuggee threads status. The failure then becomes:
nsk.share.TestFailure: TEST FAILED: invoke never completed
At that point a vm.resume() is done to allow the invoke to complete, and the test will exit with this failure.
As for avoiding the failure in the first place (the deadlock in the debuggee during the invoke), this is really a test bug for relying on INVOKE_SINGLE_THREADED and assuming that the invoked thread won't become deadlocked. Since there is a Thread.sleep() call in the invoked method, it can't make this assumption. From the ObjectReference.invoke() spec:
"By default, all threads in the target VM are resumed while the method is being invoked if they were previously suspended by an event or by VirtualMachine.suspend() or ThreadReference.suspend(). This is done to prevent the deadlocks that will occur if any of the threads own monitors that will be needed by the invoked method."
"The resumption of other threads during the invocation can be prevented by specifying the INVOKE_SINGLE_THREADED bit flag in the options argument; however, there is no protection against or recovery from the deadlocks described above, so this option should be used with great caution."
For platform threads, sleep() doesn't require any monitors, so these tests never ran into problems before. For virtual threads however there is some synchronization done, and potential reliance on other threads not being suspended. A way around this is to always use sleep(0), which will at least attempt to yield the thread. For platform threads an actual yield is likely. For a virtual thread it will not yield in this particular case because the virtual thread is pinned to the carrier thread due to the jvmti breakpoint callback that is currently in the call chain of the invoked thread. So for virtual threads this effectively the same as sitting in a spin loop with no yielding. This is ok. CPU wasting is not a concern.
-------------
Commit messages:
- Remove invokemethod011 from problem list.
- Fix invokemethod tests that can fail when using INVOKE_SINGLE_THREADED.
- Fix invokemethod tests that can fail when using INVOKE_SINGLE_THREADED.
Changes: https://git.openjdk.org/jdk/pull/12420/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12420&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8282379
Stats: 89 lines in 15 files changed: 66 ins; 2 del; 21 mod
Patch: https://git.openjdk.org/jdk/pull/12420.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12420/head:pull/12420
PR: https://git.openjdk.org/jdk/pull/12420
More information about the serviceability-dev
mailing list