RFR: 8366659: ObjectMonitor::wait() liveness problem with a suspension request [v20]
David Holmes
dholmes at openjdk.org
Mon Jan 19 01:57:36 UTC 2026
On Fri, 9 Jan 2026 12:05:37 GMT, Anton Artemov <aartemov at openjdk.org> wrote:
>> Hi, please consider the following changes:
>>
>> If suspension is allowed when a thread is re-entering an object monitor (OM), then a following liveness issues can happen in the `ObjectMonitor::wait()` method.
>>
>> The waiting thread is made to be a successor and is unparked. Upon a suspension request, the thread will suspend itself whilst clearing the successor. The OM will be left unlocked (not grabbed by any thread), while the other threads are parked until a thread grabs the OM and the exits it. The suspended thread is on the entry-list and can be selected as a successor again. None of other threads can be woken up to grab the OM until the suspended thread has been resumed and successfully releases the OM.
>>
>> This can happen in three places where the successor could be suspended:
>>
>> 1:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1897
>>
>> 2:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1149
>>
>> 3:
>> https://github.com/openjdk/jdk/blob/6322aaba63b235cb6c73d23a932210af318404ec/src/hotspot/share/runtime/objectMonitor.cpp#L1951
>>
>> The issues are addressed by not allowing suspension in case 1, and by handling the suspension request at a later stage, after the thread has grabbed the OM in `reenter_internal()` in case 2. In case of a suspension request, the thread exits the OM and enters it again once resumed.
>>
>> Case 3 is handled by not transferring a thread to the `entry_list` in `notify_internal()` in case the corresponding JVMTI event is allowed. Instead, a tread is unparked and let run. Since it is not on the `entry_list`, it will not be chosen as a successor and it is no harm to suspend it if needed when posting the event.
>>
>> Possible issue of posting a `waited` event while still be suspended is addressed by adding a suspension check just before the posting of event.
>>
>> Tests are added.
>>
>> Tested in tiers 1 - 7.
>
> Anton Artemov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 44 commits:
>
> - 8366659: Fixed year in the copyright.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Removed ClearSuccOnSuspend
> - 8366659: Fixed liveness problem.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed build problem.
> - 8366659: Fixed build issue.
> - 8366659: Changed the way how notify_internal works if JVMTI monitor waited event allowed.
> - Merge remote-tracking branch 'origin/master' into JDK-8366659-OM-wait-suspend-deadlock
> - 8366659: Fixed semi-broken tests
> - ... and 34 more: https://git.openjdk.org/jdk/compare/a01283a5...21b83214
I'm struggling to map code changes to the description of the changes. I think a lot more comments are needed to describe things more clearly.
The three testcases need a clearer description - and '1' '2' '3' are not great names. Given we are in the SuspendWithObjectMonitorWait directory we do not need that prefix on all the file names - we can use shorter but more descriptive names for what is actually being tested.
Thanks
src/hotspot/share/runtime/objectMonitor.cpp line 1907:
> 1905: // That is, we fail toward safety.
> 1906:
> 1907: was_notified = true;
This isn't quite right. If we would reach line 1882 then we were not notified, so we cannot unconditionally set this to true here. I think you need to initialize to true at line 1868 and then set to false at line 1882 (deleting the current comment).
src/hotspot/share/runtime/objectMonitor.cpp line 1917:
> 1915: }
> 1916:
> 1917: // The thread is now either on off-list (TS_RUN),
Suggestion:
// The thread is now either off-list (TS_RUN),
src/hotspot/share/runtime/objectMonitor.cpp line 1932:
> 1930: // (Don't cache naked oops over safepoints, of course).
> 1931:
> 1932: // post monitor waited event. Note that this is past-tense, we are done waiting.
Suggestion:
// Post monitor waited event. Note that this is past-tense, we are done waiting.
You need to expand the comment to explain the significance of checking for TS_ENTER.
src/hotspot/share/runtime/objectMonitor.cpp line 1967:
> 1965: // We can go to a safepoint at the end of this block. If we
> 1966: // do a thread dump during that safepoint, then this thread will show
> 1967: // as having "-locked" the monitor, but the OS and java.lang.Thread
To be clear, if it is a plain-old safepoint, we will show the monitor as locked because it is. But if it happens we were suspended, then we will release the monitor and so we wouldn't report locked.
src/hotspot/share/runtime/objectMonitor.cpp line 1973:
> 1971: // and set the OM as pending.
> 1972: }
> 1973: // Done waiting, post the corresponding event
What event? I expected to see `JvmtiExport::post_monitor_waited` here. ??
src/hotspot/share/runtime/objectMonitor.cpp line 2053:
> 2051: // Adding to _entry_list uses Atomic::cmpxchg() which already provides
> 2052: // a fence that prevents reordering of the stores.
> 2053: if (!JvmtiExport::should_post_monitor_waited()) {
You need to explain why this is now conditional.
src/hotspot/share/runtime/objectMonitor.cpp line 2063:
> 2061: if (!JvmtiExport::should_post_monitor_waited()) {
> 2062: add_to_entry_list(current, iterator);
> 2063: } else {
You need to explain what each of these paths is doing. At this point I'm lost.
src/hotspot/share/runtime/objectMonitor.cpp line 2069:
> 2067:
> 2068: oop vthread = nullptr;
> 2069: ParkEvent* Trigger;
I assume this is copied from elsewhere but we should use proper style rules in the new code i.e. local variables don't start with a capital. Thanks. Also "trigger" doesn't really convet anythng meaningful to me - `evt` for event seems fine.
src/hotspot/share/runtime/objectMonitor.cpp line 2277:
> 2275: node->_at_reenter = true;
> 2276:
> 2277: if (state == ObjectWaiter::TS_RUN) {
I'm finding the changes to `was_notified` versus use of explicit state checks very confusing. I keep asking myself which state means I was notified and which does not.
test/hotspot/jtreg/serviceability/jvmti/SuspendWithObjectMonitorWait/SuspendWithObjectMonitorWait1.java line 71:
> 69: public int doWork1(int timeMax, PrintStream out) {
> 70: SuspendWithObjectMonitorWaitWorker waiter; // waiter thread
> 71: SuspendWithObjectMonitorWaitWorker resumer; // resumer thread
Suggestion:
SuspendWithObjectMonitorWaitWorker waiter; // waiter thread
SuspendWithObjectMonitorWaitWorker resumer; // resumer thread
-------------
Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/27040#pullrequestreview-3675924710
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702902497
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702905495
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702909290
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702947656
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702958092
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702960742
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702965432
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702967605
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702974131
PR Review Comment: https://git.openjdk.org/jdk/pull/27040#discussion_r2702975301
More information about the hotspot-runtime-dev
mailing list