RFR: 8369944: Notification can be lost due to interrupt in Object.wait

David Holmes dholmes at openjdk.org
Fri Oct 17 03:09:02 UTC 2025


On Thu, 16 Oct 2025 22:03:51 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

> If a thread waiting in `Object.wait` is interrupted at the same time it is being notified, it could wake up, read `node.TState` as `TS_ENTER` but then read `node._notified` as false. There is no ordering in `notify_internal()` between setting `iterator->_notified` to true and setting `node.TState` to `TS_ENTER`. 
> This means that `WasNotified` will be false and the thread will throw IE. But we will then miss a notification as per the JLS 17.2.4 specs:
> 
> `Note that if a thread is both interrupted and woken via notify, and that thread returns from wait by throwing an InterruptedException, then some other thread in the wait set must be notified.`
> 
> There is no need for `_notified`, since the value of `node.TState` can be used to tell if the thread was notified or not.
> Testing in mach5 tiers1-5.
> 
> Related note: besides checking for `!WasNotified`, I think we should also check for `ret == OS_OK` before throwing IE. This could have been a timed-wait that timed-out, and only after that, the thread was interrupted while trying to reenter the monitor (the thread can be observed in the reenter part either with JVMTI events enabled or by reading the j.l.Thread.State).
> 
> Thanks,
> Patricio

Great find - but now I find myself second guessing a lot of the code I thought I understood.

src/hotspot/share/runtime/objectMonitor.cpp line 1885:

> 1883:       if (interrupted || HAS_PENDING_EXCEPTION) {
> 1884:         // Intentionally empty
> 1885:       } else if (node.TState == ObjectWaiter::TS_WAIT) {

I must have misread the code previously, as I thought this was read under the wait_set_lock, but it isn't. So what stops us from seeing a stale TS_WAIT here and continuing with the park even though we were already notified? I'm assuming the implicit barriers in the TBIVMP? Or is it much more subtle than that, and the successor protocol will unpark us once the monitor could be available?

src/hotspot/share/runtime/objectMonitor.cpp line 1925:

> 1923:     OrderAccess::loadload();
> 1924:     if (has_successor(current)) clear_successor();
> 1925:     WasNotified = node.TState == ObjectWaiter::TS_ENTER;

We don't really need the `WasNotified` local any more as it is only used once below (which saves me from asking you to rename it `was_notified`.)

src/hotspot/share/runtime/objectMonitor.hpp line 70:

> 68:   bool is_wait()            const { return _is_wait; }
> 69:   bool at_reenter()         const { return _at_reenter; }
> 70:   bool at_monitorenter()    const { return !_is_wait || TState != TS_WAIT; }

Why did we drop the check of `_at_reenter` here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27856#pullrequestreview-3347757143
PR Review Comment: https://git.openjdk.org/jdk/pull/27856#discussion_r2438090869
PR Review Comment: https://git.openjdk.org/jdk/pull/27856#discussion_r2438072112
PR Review Comment: https://git.openjdk.org/jdk/pull/27856#discussion_r2438064864


More information about the hotspot-runtime-dev mailing list