RFR: 8366659: ObjectMonitor::wait() liveness problem with a suspension request [v31]
Daniel D. Daugherty
dcubed at openjdk.org
Fri Jan 23 20:13:03 UTC 2026
On Fri, 23 Jan 2026 09:05:58 GMT, Anton Artemov <aartemov at openjdk.org> wrote:
>> Hi, please consider the following changes:
>>
>> If suspension is allowed when a thread is re-entering an object monitor (OM), then a following liveness issues can happen in the `ObjectMonitor::wait()` method.
>>
>> The waiting thread is made to be a successor and is unparked. Upon a suspension request, the thread will suspend itself whilst clearing the successor. The OM will be left unlocked (not grabbed by any thread), while the other threads are parked until a thread grabs the OM and the exits it. The suspended thread is on the entry-list and can be selected as a successor again. None of other threads can be woken up to grab the OM until the suspended thread has been resumed and successfully releases the OM.
>>
>> This can happen in three places where the successor could be suspended:
>>
>> 1:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1897
>>
>> 2:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1149
>>
>> 3:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1951
>>
>> The issues are addressed by not allowing suspension in case 1, and by handling the suspension request at a later stage, after the thread has grabbed the OM in `reenter_internal()` in case 2. In case of a suspension request, the thread exits the OM and enters it again once resumed.
>>
>> Case 3 is handled by not transferring a thread to the `entry_list` in `notify_internal()` in case the corresponding JVMTI event is allowed. Instead, a tread is unparked and let run. Since it is not on the `entry_list`, it will not be chosen as a successor and it is no harm to suspend it if needed when posting the event.
>>
>> Possible issue of posting a `waited` event while still be suspended is addressed by adding a suspension check just before the posting of event.
>>
>> Tests are added.
>>
>> Tested in tiers 1 - 7.
>
> Anton Artemov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 56 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Changed condition on when to post an event.
> - 8366659: Fixed whitespaces.
> - 8366659: Addressed reviewer's comments.
> - 8366659: Addressed reviewer's comments.
> - 8366659: Addressed reviewers' comments.
> - 8366659: Fixed whitespace.
> - 8366659: Addressed reviewer's comments.
> - 8366659: Addressed reviewers' comments, added comments, renamed tests.
> - 8366659: Modified the comment.
> - ... and 46 more: https://git.openjdk.org/jdk/compare/fa20391e...31779cad
I last reviewed v25 and I tried to use the "Changes since you last reviewed" link, but I
couldn't make sense of what I was seeing in the comparison from v25 -> v30. I'm gonna
have to go back and do another crawl thru review here.
Is it just me or has this patch (not counting the test refactoring) gotten really big? I realize
that what we're trying to fix here is extremely complicated hence the size of the corrections.
The bigger the patch, the harder it is to backport, but I don't know if we have any plans to
backport this at all. Also, I don't see a root cause analysis in the bug report so we don't
know how far back this issue goes.
I suppose we could morph the new tests into something standalone that doesn't depend
on anything in the current test suites and then work our way backward thru the LTS releases
to see how far back this goes...
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27040#issuecomment-3792172270
More information about the hotspot-runtime-dev
mailing list