RFR: 8366659: ObjectMonitor::wait() can deadlock with a suspension request [v6]
Daniel D. Daugherty
dcubed at openjdk.org
Thu Nov 13 20:35:28 UTC 2025
On Wed, 12 Nov 2025 13:29:07 GMT, Anton Artemov <aartemov at openjdk.org> wrote:
>> Hi, please consider the following changes:
>>
>> If suspension is allowed when a thread is re-entering an object monitor (OM), then a deadlock is possible. There are two places where it can happen:
>>
>> 1) The waiting thread is made to be a successor and is unparked. Upon a suspension request, the thread will suspend itself whilst clearing the successor. The OM will be left unlocked (not grabbed by any thread), while the other threads are parked until a thread grabs the OM and then exits it. The suspended thread is on the entry-list and can be selected as a successor again. None of the other threads can be woken up to grab the OM until the suspended thread has been resumed and successfully releases the OM.
>>
>> 2) The race between suspension and retry: the thread could reacquire the OM and complete the wait() code in full, but then on return to Java it will be suspended while holding the OM.
>>
>> The issues are addressed by not allowing suspension in case 1, and by handling the suspension request at a later stage, after the thread has grabbed the OM in `reenter_internal()` in case 2. In case of a suspension request, the thread exits the OM and enters it again once resumed.
>>
>> The JVMTI `waited` event posting (2nd one) is postponed until the suspended thread is resumed and has entered the OM again. The `enter` to the OM (in case `ExitOnSuspend` did exit) is done without posting any events.
>>
>> Tests are added for both scenarios.
>>
>> Tested in tiers 1 - 7.
>
> Anton Artemov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed lines in tests.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Added a comment to a boolean arg for enter()
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed new lines.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Removed incorrect assert,
> - 8366659: Fixed merge conflict
> - ... and 10 more: https://git.openjdk.org/jdk/compare/400a83da...702880c6
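The case 2 handling in the quoted description (grab the OM in `reenter_internal()`, notice the pending suspension request, exit the OM, and enter it again only once resumed) can be modeled in plain Java to see why no thread stays suspended while owning the monitor. This is a simplified sketch under stated assumptions, not the HotSpot C++ code: a `ReentrantLock` stands in for the object monitor, suspension is cooperative and only checked right after the lock is acquired, and `ExitOnSuspendDemo`, `SuspendGate`, `ModelMonitor`, `reenter()`, `requestSuspend()`, and `resume()` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class ExitOnSuspendDemo {
    // Returns true if another thread could take the (modeled) monitor while the
    // re-entering thread was "suspended", and the re-entry completed after resume.
    static boolean run() throws InterruptedException {
        ModelMonitor monitor = new ModelMonitor();
        SuspendGate gate = new SuspendGate();
        gate.requestSuspend();              // request is pending before re-entry

        Thread waiter = new Thread(() -> {
            try {
                monitor.reenter(gate);
                monitor.lock.unlock();      // leave the monitor after re-entering
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();

        gate.awaitParked();                 // waiter has exited the lock and parked
        boolean acquired = monitor.lock.tryLock(5, TimeUnit.SECONDS);
        if (acquired) {
            monitor.lock.unlock();
        }
        gate.resume();                      // waiter may now re-enter
        waiter.join(5_000);
        return acquired && !waiter.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run() ? "monitor stayed available" : "unexpected");
    }
}

// Hypothetical cooperative "suspension" state shared by the model threads.
class SuspendGate {
    private boolean suspendRequested;
    private boolean parked;                 // true while a thread sits in awaitResume()

    synchronized void requestSuspend() { suspendRequested = true; }

    synchronized boolean isSuspendRequested() { return suspendRequested; }

    synchronized void resume() {
        suspendRequested = false;
        notifyAll();                        // wake the parked thread
    }

    // Park until resumed. Callers must not own the modeled monitor here.
    synchronized void awaitResume() throws InterruptedException {
        parked = true;
        notifyAll();                        // let awaitParked() callers proceed
        while (suspendRequested) {
            wait();
        }
        parked = false;
    }

    // Demo helper: block until a thread has parked in awaitResume().
    synchronized void awaitParked() throws InterruptedException {
        while (!parked) {
            wait();
        }
    }
}

// Hypothetical model: a ReentrantLock stands in for the object monitor.
class ModelMonitor {
    final ReentrantLock lock = new ReentrantLock();

    // Re-enter after wait(): if a suspension request is pending once the lock
    // is held, exit, park without owning the lock, and retry after resume.
    void reenter(SuspendGate gate) throws InterruptedException {
        while (true) {
            lock.lock();
            if (!gate.isSuspendRequested()) {
                return;                     // owned and not asked to suspend
            }
            lock.unlock();                  // exit so other threads are not blocked
            gate.awaitResume();             // "suspended" while NOT owning the lock
        }
    }
}
```

A real suspension request can arrive at any point, so the check-then-return window in `reenter()` is a simplification; the sketch only illustrates the exit, park, and re-enter ordering.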
The transaction diagram in SuspendWithObjectMonitorWait.java on L56 -> L77 is
for the `doWork1` test, so the comment should be modified to make that clear by
adding this above L56:
//
// doWork1 algorithm:
I've created a transaction diagram for doWork2:
//
// doWork2 algorithm:
//
// main              waiter              resumer
// ================= ==================  ===================
// launch waiter
// <launch returns>  waiter running
// launch resumer    enter threadLock
// <launch returns>  threadLock.wait()   resumer running
// enter threadLock  :                   wait for notify
// threadLock.notify wait finishes       :
// :                 reenter blocks      :
// suspend waiter    <suspended>         :
// <ready to test>   :                   :
// :                 :                   :
// notify resumer    :                   wait finishes
// delay 1-second    :                   :
// exit threadLock   :                   :
// join resumer      :                   enter threadLock
// :                 <resumed>           resume waiter
// :                 :                   exit threadLock
// :                 reenter threadLock  :
// <join returns>    :                   resumer exits
// join waiter       :
// <join returns>    waiter exits
//
// Note: The sleep(1-second) in main along with the delayed exit
//       of threadLock in main forces the resumer thread to reach
//       "enter threadLock" and block. This difference from doWork1
//       forces the resumer thread to contend for threadLock
//       while the waiter thread is in threadLock.wait(), increasing
//       stress on the monitor sub-system.
//
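The doWork2 ordering can be cross-checked against a plain-Java sketch of the same choreography. It is not the test's code and it suspends nothing: the `suspend waiter` and `resume waiter` steps are left as comments, on the assumption that the real test drives them through a JVMTI agent rather than java.lang.Thread. `DoWork2Sketch`, the `resumerGate` object, and the `waiterWaiting`, `waiterNotified`, and `resumerNotified` flags are hypothetical additions that keep the sketch from losing a notification or acting on a spurious wakeup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of the doWork2 choreography. The suspend and
// resume steps are placeholders (comments), so nothing is suspended here.
class DoWork2Sketch {
    final Object threadLock = new Object();
    final Object resumerGate = new Object();        // hypothetical object the resumer waits on
    boolean waiterWaiting;                          // guarded by threadLock
    boolean waiterNotified;                         // guarded by threadLock
    boolean resumerNotified;                        // guarded by resumerGate
    final List<String> events = new ArrayList<>();  // guarded by threadLock

    List<String> run() throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (threadLock) {                 // enter threadLock
                waiterWaiting = true;
                threadLock.notifyAll();                 // tell main we are about to wait
                try {
                    while (!waiterNotified) {
                        threadLock.wait();              // threadLock.wait()
                    }
                } catch (InterruptedException e) {
                    return;
                }
                events.add("waiter reentered");         // reenter threadLock completed
            }
        });
        Thread resumer = new Thread(() -> {
            try {
                synchronized (resumerGate) {            // wait for notify
                    while (!resumerNotified) {
                        resumerGate.wait();
                    }
                }
            } catch (InterruptedException e) {
                return;
            }
            synchronized (threadLock) {                 // enter threadLock (contends with waiter)
                // resume waiter: placeholder for the real test's resume call
                events.add("resumer entered");
            }                                           // exit threadLock
        });
        waiter.start();                                 // launch waiter
        resumer.start();                                // launch resumer
        synchronized (threadLock) {                     // enter threadLock
            while (!waiterWaiting) {
                threadLock.wait();                      // afterwards the waiter is in wait()
            }
            waiterNotified = true;
            threadLock.notify();                        // waiter's wait finishes; reenter blocks
            // suspend waiter: placeholder for the real test's suspend call
            synchronized (resumerGate) {                // notify resumer
                resumerNotified = true;
                resumerGate.notify();
            }
            Thread.sleep(1000);                         // delay 1-second while holding threadLock
        }                                               // exit threadLock
        resumer.join();                                 // join resumer
        waiter.join();                                  // join waiter
        synchronized (threadLock) {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new DoWork2Sketch().run());
    }
}
```

Because nothing is actually suspended, either the waiter or the resumer may win threadLock once main exits it, so the order of the two events can vary; in the diagram the suspended waiter only re-enters after the resumer has resumed it.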
I've created a transaction diagram for doWork3:
//
// doWork3 algorithm:
//
// main                waiter                  resumer
// =================== ======================  ===================
// launch waiter
// <launch returns>    waiter running
// launch resumer      enter threadLock
// <launch returns>    while !READY_TO_NOTIFY  resumer running
// delay 1-second        threadLock.wait(1)    wait for notify
// enter threadLock    :                       :
// set READY_TO_NOTIFY :                       :
// threadLock.notify   wait finishes           :
// :                   reenter blocks          :
// suspend waiter      <suspended>             :
// <ready to test>     :                       :
// :                   :                       :
// notify resumer      :                       wait finishes
// delay 1-second      :                       :
// exit threadLock     :                       :
// join resumer        :                       enter threadLock
// :                   <resumed>               resume waiter
// :                   :                       exit threadLock
// :                   reenter threadLock      :
// <join returns>      :                       resumer exits
// join waiter         :
// <join returns>      waiter exits
//
// Note: The sleep(1-second) in main along with the delayed exit
//       of threadLock in main forces the resumer thread to reach
//       "enter threadLock" and block. This difference from doWork1
//       forces the resumer thread to contend for threadLock
//       while the waiter thread is in the threadLock.wait(1) tight
//       loop, increasing stress on the monitor sub-system.
//
// Note: The first sleep(1-second) in main and the wait(1) in the waiter
//       thread allow the waiter thread to loop tightly here:
//           while !READY_TO_NOTIFY
//               threadLock.wait(1)
//
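What distinguishes doWork3 is the waiter's polling loop. A minimal sketch of just that loop follows, assuming a plain boolean guarded by threadLock stands in for READY_TO_NOTIFY and leaving out the suspend, resumer, and join-resumer steps; `DoWork3WaiterSketch`, `readyToNotify`, and `timedWaits` are hypothetical names, not the test's.

```java
// Hypothetical sketch of the doWork3 waiter's timed-wait loop (not the test's code).
class DoWork3WaiterSketch {
    private final Object threadLock = new Object();
    private boolean readyToNotify;   // READY_TO_NOTIFY in the diagram; guarded by threadLock
    private long timedWaits;         // number of threadLock.wait(1) calls; guarded by threadLock

    long run() throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (threadLock) {                  // enter threadLock
                try {
                    while (!readyToNotify) {             // while !READY_TO_NOTIFY
                        threadLock.wait(1);              //     threadLock.wait(1)
                        timedWaits++;                    // wait(1) returned: threadLock re-entered
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        waiter.start();                                  // launch waiter
        Thread.sleep(1000);                              // delay 1-second: waiter loops meanwhile
        synchronized (threadLock) {                      // enter threadLock
            readyToNotify = true;                        // set READY_TO_NOTIFY
            threadLock.notify();                         // threadLock.notify
        }                                                // suspend/resume steps omitted
        waiter.join();                                   // join waiter
        synchronized (threadLock) {
            return timedWaits;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timed waits: " + new DoWork3WaiterSketch().run());
    }
}
```

Each `wait(1)` returns only after the waiter has re-acquired threadLock, so every pass through the loop is another exit and re-entry of the monitor, which is what makes the loop "tight" from the monitor sub-system's point of view.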
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27040#issuecomment-3529595262
PR Comment: https://git.openjdk.org/jdk/pull/27040#issuecomment-3529599987
PR Comment: https://git.openjdk.org/jdk/pull/27040#issuecomment-3529602353
More information about the serviceability-dev mailing list