RFR: 8353496: SuspendResume1.java and SuspendResume2.java timeout after JDK-8319447

Alan Bateman alanb at openjdk.org
Wed May 14 05:49:50 UTC 2025


On Mon, 12 May 2025 23:19:58 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The tests `SuspendResume1`, `SuspendResume2` and `SuspendResumeAll` intermittently fail with a timeout (deadlock). The tests run with `-Djdk.virtualThreadScheduler.maxPoolSize=1` so there is only one carrier. The short sleep in `TestedThread.run` isn't sufficient to make progress. This will happen if tasks pushed by the delayed scheduler are executed before the tasks for the newly started virtual thread. FJP won't search other submission queues until the queue it keeps going back to is empty or there is contention. These deadlocks can be reproduced more reliably if the sleep in `TestedThread.run` is made minimal (1 millisecond).
> The fix is to increase the sleep to 50 milliseconds and also to decrease the busy part of the busy loop.
> 
> Testing:
> - Mach5 test runs of the fixed tests

Marked as reviewed by alanb (Reviewer).

I see Fei Yang's comment confirming that this fixes the timeouts in their environment; that is useful to know.

The main lesson here is that the virtual thread scheduler is not fair. A virtual thread doing a short sleep, sleep(1) in one case here, may be continued and execute before other virtual threads that are queued to continue.
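
For anyone who wants to observe the ordering on their own machine, here is a minimal sketch, not taken from the tests or the fix. The class name `SleepFairnessSketch` and the method `runDemo` are made up for illustration. It starts a few virtual threads that each do a short sleep and records the order in which they finish. It assumes JDK 21 or later, and the order it prints is not guaranteed to be the start order, or to be the same from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical demo class, not part of the JDK tests discussed in this thread.
public class SleepFairnessSketch {

    // Starts `workers` virtual threads. Each one sleeps briefly and then
    // appends its name to a shared queue, so the queue records completion order.
    static List<String> runDemo(int workers, long sleepMillis) throws InterruptedException {
        ConcurrentLinkedQueue<String> order = new ConcurrentLinkedQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            String name = "vt-" + i;
            threads.add(Thread.ofVirtual().name(name).start(() -> {
                try {
                    Thread.sleep(sleepMillis); // short sleep, as in the sleep(1) case
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                order.add(name);
            }));
        }
        for (Thread t : threads) {
            t.join(); // wait for every virtual thread to finish
        }
        return new ArrayList<>(order);
    }

    public static void main(String[] args) throws InterruptedException {
        // The printed order may differ from the start order.
        System.out.println(runDemo(4, 1));
    }
}
```

Running it with `-Djdk.virtualThreadScheduler.maxPoolSize=1`, as the tests do, limits the scheduler to a single carrier thread.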

-------------

PR Review: https://git.openjdk.org/jdk/pull/25194#pullrequestreview-2838801041
PR Comment: https://git.openjdk.org/jdk/pull/25194#issuecomment-2878748617

