RFR: 8366659: ObjectMonitor::wait() can deadlock with a suspension request [v17]

Serguei Spitsyn sspitsyn at openjdk.org
Mon Nov 24 10:24:17 UTC 2025


On Fri, 21 Nov 2025 11:38:49 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> Thanks @sspitsyn for diving into the issue. 
>> 
>> With that definition of the deadlock and suspension logic I agree that it might not be a problem at all. With this being said, is the existing test `SuspendWithObjectMonitorWait` demonstrating a real-world scenario? @dcubed-ojdk, what do you think?
>
>> is the existing test SuspendWithObjectMonitorWait demonstrating a real-world scenario?
> 
> It does not look like one. There could still be some motivation for writing it, e.g. checking some invariants. At least, this test does not seem to enforce any strict rules on the OM implementation or on JVMTI events plus suspend/resume. :)
> The new tests do not allow the OM implementation to post the `MonitorWaited` event notification at the right spot. Otherwise, I would not object to them.

> @sspitsyn so your position is that it is okay for suspension to cause something to break as long as resuming the suspended thread then fixes things? Does it matter how much time passes?

Suspend/Resume is a debugging feature. It is normally used in a debugging session and is expected to cause a slowdown.
It is also known to be somewhat risky to use. So, my answer is yes. The length of the slowdown does not matter much, as it depends on how long the thread stays suspended.

> We have had a lot of discussion about this outside the PR and some of us at least feel there is a distinction between suspending a thread that clearly holds an application level resource (like a monitor) which then prevents other threads from continuing, versus suspending a thread holding a VM internal "resource" that prevents other threads from continuing.

Agreed

> The design of JVM TI thread suspension actively tries to minimise the ability to hold any internal VM resource whilst suspended, and the current problem seems a variant of that. If we suspend a thread that has not yet acquired a monitor, and inspection of the monitor would show it is not owned, then it seems a bug if other threads trying to acquire that monitor can not make progress.

You are most likely right that the current problem is a variant of that. But I disagree with qualifying this issue as a deadlock.
The thread was picked as a successor and then suspended. It feels like it has to be qualified the same way as a thread that owns the monitor and is suspended. The issue is that the real OM state and the JVMTI state bits do not reflect this.
I feel that changing the order of JVMTI OM events and giving up the symmetry between `timed-wait` and `notified-wait` is risky and may cause more problems than this bug fix is trying to solve. I wonder whether there is a way the thread could give up being the OM successor.

> Agreed the tests are completely artificial but there is no way to test this other than to do that.

Agree with this. But the deadlock is in the new tests, not in the OM implementation. And yes, constructing a deadlock was needed to demonstrate the problem. It is confusing. :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27040#issuecomment-3569971399

