RFR: 8366659: ObjectMonitor::wait() can deadlock with a suspension request [v6]
Daniel D. Daugherty
dcubed at openjdk.org
Wed Nov 12 20:09:31 UTC 2025
On Wed, 12 Nov 2025 13:29:07 GMT, Anton Artemov <aartemov at openjdk.org> wrote:
>> Hi, please consider the following changes:
>>
>> If suspension is allowed when a thread is re-entering an object monitor (OM), then a deadlock is possible. There are two places where it can happen:
>>
>> 1) The waiting thread is made to be a successor and is unparked. Upon a suspension request, the thread will suspend itself whilst clearing the successor. The OM will be left unlocked (not grabbed by any thread), while the other threads are parked until a thread grabs the OM and then exits it. The suspended thread is on the entry-list and can be selected as a successor again. None of the other threads can be woken up to grab the OM until the suspended thread has been resumed and successfully releases the OM.
>>
>> 2) The race between suspension and retry: the thread could reacquire the OM and complete the wait() code in full, but then on return to Java it will be suspended while holding the OM.
>>
>> The issues are addressed by not allowing suspension in case 1, and by handling the suspension request at a later stage, after the thread has grabbed the OM in `reenter_internal()` in case 2. In case of a suspension request, the thread exits the OM and enters it again once resumed.
>>
>> The JVMTI `waited` event posting (2nd one) is postponed until the suspended thread is resumed and has entered the OM again. The `enter` to the OM (in case `ExitOnSuspend` did exit) is done without posting any events.
>>
>> Tests are added for both scenarios.
>>
>> Tested in tiers 1 - 7.
>
> Anton Artemov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed lines in tests.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Added a comment to a boolean arg for enter()
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed new lines.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Removed incorrect assert,
> - 8366659: Fixed merge conflict
> - ... and 10 more: https://git.openjdk.org/jdk/compare/400a83da...702880c6
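The fix summarized in the quoted description is to act on a pending suspension only after the monitor has been re-acquired and, if one is pending, to exit the monitor, stay suspended without owning it, and enter again once resumed. The sketch below is a user-level analogy of that ordering only, not the HotSpot change (which is C++ in `reenter_internal()` and `ExitOnSuspend`). `ReentrantLock` stands in for the ObjectMonitor, and the class and method names (`SuspendAwareMonitor`, `requestSuspend()`, `resume()`, `reenter()`) are hypothetical. JVMTI event posting is not modeled.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical user-level model, NOT HotSpot code. ReentrantLock stands in for
// the object monitor; suspendRequested stands in for a JVMTI suspend request.
public final class SuspendAwareMonitor {
    private final ReentrantLock monitor = new ReentrantLock();
    private final Object suspendLock = new Object();
    private boolean suspendRequested;          // guarded by suspendLock
    private volatile boolean parkedSuspended;  // true while "suspended" off the monitor

    public void requestSuspend() {
        synchronized (suspendLock) {
            suspendRequested = true;
        }
    }

    public void resume() {
        synchronized (suspendLock) {
            suspendRequested = false;
            suspendLock.notifyAll();
        }
    }

    // Returns holding the monitor with no suspend request pending at the time
    // of the check. A pending request is honored only after the monitor has
    // been dropped, so the "suspended" thread never owns it.
    public void reenter() throws InterruptedException {
        while (true) {
            monitor.lock();
            synchronized (suspendLock) {
                if (!suspendRequested) {
                    return;                    // owner, nothing pending
                }
            }
            monitor.unlock();                  // exit the monitor before suspending
            synchronized (suspendLock) {
                parkedSuspended = true;
                try {
                    while (suspendRequested) {
                        suspendLock.wait();    // "suspended" without the monitor
                    }
                } finally {
                    parkedSuspended = false;
                }
            }
            // Resumed: loop and compete for the monitor again.
        }
    }

    // Demo: returns true if another thread could take the monitor while the
    // re-entering thread was "suspended".
    public static boolean demo() throws InterruptedException {
        SuspendAwareMonitor m = new SuspendAwareMonitor();
        m.requestSuspend();
        Thread waiter = new Thread(() -> {
            try {
                m.reenter();
                m.monitor.unlock();            // done with the monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();
        while (!m.parkedSuspended) {
            Thread.sleep(1);
        }
        boolean othersCanEnter = m.monitor.tryLock();
        if (othersCanEnter) {
            m.monitor.unlock();
        }
        m.resume();
        waiter.join();
        return othersCanEnter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("monitor free while suspended: " + demo());
    }
}
```

The point of the ordering is the one from case 1 in the description: a thread that is suspended never sits on (or is selected to grab) the monitor, so the other contenders are not stuck until it is resumed.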
I've done another crawl-thru review. Thanks for making changes after the first
round of comments. In this crawl-thru, my deep focus was on the modified test.
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 297:
> 295: }
> 296:
> 297: // Notify the resumer while holding the threadLock
Nit: please add a period at the end of this sentence.
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 372:
> 370: // - a threadLock enter in the resumer thread
> 371: // - resumption of the waiter thread
> 372: // - a threadLock enter in the freshly resumed waiter thread
This list of test steps is identical to the list on L490 -> L493 and to the
original list on L256 -> L259.
This step comment:
`370: // - a threadLock enter in the resumer thread`
should be updated to something like:
// - a blocked threadLock enter in the resumer thread while the
// threadLock is held by the main thread.
This change of threadLock scope also requires this update from:
605: // - tries to grab the threadLock (should not block!)
to:
605: // - tries to grab the threadLock (should not block with doWork1!)
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 388:
> 386: try {
> 387: Thread.sleep(1000);
> 388: } catch(Exception e) {}
Why is this 1 second delay needed?
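If this sleep (and the identical ones at lines 454 and 509) is only meant to give the resumer thread time to block on threadLock, a fixed delay can be replaced by polling `Thread.getState()` with a deadline. A minimal sketch, assuming that is the intent; `AwaitState` and `awaitState` are hypothetical names, not helpers from this test or from any test library.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical helper: AwaitState and awaitState are illustrative names.
public final class AwaitState {
    // Polls t.getState() until it equals expected, failing after timeoutMillis
    // instead of assuming a fixed sleep was long enough.
    public static void awaitState(Thread t, Thread.State expected, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (t.getState() != expected) {
            if (System.nanoTime() - deadline > 0) {
                throw new TimeoutException(t.getName() + " did not reach " + expected
                        + "; last state: " + t.getState());
            }
            Thread.sleep(1);
        }
    }

    // Usage: the caller holds threadLock and waits until the "resumer"
    // thread is blocked trying to enter it.
    public static Thread.State demo() throws InterruptedException, TimeoutException {
        Object threadLock = new Object();
        Thread resumer = new Thread(() -> {
            synchronized (threadLock) {
                // Nothing to do; entering the monitor is the point.
            }
        }, "resumer");
        Thread.State observed;
        synchronized (threadLock) {
            resumer.start();
            awaitState(resumer, Thread.State.BLOCKED, 10_000);
            observed = resumer.getState();     // still BLOCKED: we hold threadLock
        }
        resumer.join();
        return observed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("resumer state: " + demo());
    }
}
```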
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 428:
> 426: // launch the waiter thread
> 427: synchronized (barrierLaunch) {
> 428: waiter = new SuspendWithObjectMonitorWaitWorker("waiter", 1);
This change to `wait` for `1` instead of `0` requires this comment to be updated from:
// TS_READY_TO_NOTIFY is set after the main thread has
// entered threadLock so a spurious wakeup can't get the
// waiter thread out of this threadLock.wait(0) call:
to:
// TS_READY_TO_NOTIFY is set after the main thread has
// entered threadLock so a spurious wakeup can't get the
// waiter thread out of this threadLock.wait(0) call in
// doWork1 or doWork2. doWork3 passes a one so that the
// wait() can terminate early and block on reentry.
I'm having trouble seeing why this third test case is necessary. We do a short `wait(1)`
in this test case instead of a `wait(0)` so we terminate the `wait(1)` with a timeout instead
of a `notify()` from the main thread.
In all worker test cases, the main thread grabs the threadLock when the "waiter" thread
calls `wait()`, the main thread does a `notify()`, the main thread waits until the worker
thread contends on threadLock and finally the main thread suspends the worker thread.
The only thing that I see that the `wait(1)` brings to the party is that the worker3 thread
might reach the re-entry block on threadLock via a timeout instead of a notify.
What am I missing here?
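For reference, the behaviour being discussed: while another thread keeps holding the lock, a `wait(1)` that times out cannot return; the waiter blocks on re-entry and `Thread.getState()` reports `BLOCKED` (the `Thread.State.BLOCKED` javadoc covers re-entering a synchronized block after `Object.wait`). The sketch below reproduces that with no `notify()` at all. Only the name `threadLock` mirrors the test; `TimedWaitReentry`, `waiterReady` and `mainHolds` are hypothetical, and the waiter loops on `wait(1)` until the main thread owns the lock, which the real test does not do.

```java
// Hypothetical sketch: a timed wait that expires while another thread owns
// the lock leaves the waiter BLOCKED on monitor re-entry.
public class TimedWaitReentry {
    private final Object threadLock = new Object();
    private boolean waiterReady; // guarded by threadLock
    private boolean mainHolds;   // guarded by threadLock

    public static Thread.State demo() throws InterruptedException {
        return new TimedWaitReentry().run();
    }

    private Thread.State run() throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (threadLock) {
                waiterReady = true;
                threadLock.notifyAll();        // wake main if it is already waiting
                try {
                    while (!mainHolds) {
                        threadLock.wait(1);    // short timed wait, no notify() expected
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "waiter");
        waiter.start();

        Thread.State observed;
        synchronized (threadLock) {
            while (!waiterReady) {
                threadLock.wait();
            }
            // We own threadLock and the waiter has already entered its block,
            // so it must be inside wait(1) with the lock released.
            mainHolds = true;
            // Once wait(1) times out, the waiter can only block on re-entry.
            // Thread.sleep() does not release threadLock.
            while ((observed = waiter.getState()) != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
        }
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while main holds threadLock: " + demo());
    }
}
```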
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 454:
> 452: try {
> 453: Thread.sleep(1000);
> 454: } catch(Exception e) {}
Why is this 1 second delay needed?
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 493:
> 491: // - a threadLock enter in the resumer thread
> 492: // - resumption of the waiter thread
> 493: // - a threadLock enter in the freshly resumed waiter thread
This list of test steps is identical to the list on L369 -> L372 and to the
original list on L256 -> L259.
This step comment:
491: // - a threadLock enter in the resumer thread
should be updated to something like:
// - a blocked threadLock enter in the resumer thread while the
// threadLock is held by the main thread.
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait.java line 509:
> 507: try {
> 508: Thread.sleep(1000);
> 509: } catch(Exception e) {}
Why is this 1 second delay needed?
-------------
Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/27040#pullrequestreview-3454633483
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519187101
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519214534
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519476007
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519522596
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519194484
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519205810
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2519483650