RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)
David Holmes
david.holmes at oracle.com
Mon Nov 18 02:30:48 UTC 2019
Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
This was a very difficult bug to track down and I want to publicly
acknowledge and thank the jemalloc folk (users and developers) for
continuing to investigate this issue from their side. Without their
persistence this issue would have languished.
A thread's stack_base() is the first address above the thread's stack, so
it is not itself part of the stack. However, the "in stack" checks performed
by Thread::on_local_stack and Thread::is_in_stack accepted a checked address
equal to stack_base(), which is not correct. Here's how this manifests as
the bug:
- Let a JavaThread instance, T2, be allocated at the end of thread T1's
  stack, i.e. at T1->stack_base()
  [This placement seems to be why the bug only reproduced with jemalloc.]
- Let T2 lock an inflated monitor
- Let T1 try to lock the same monitor
- T1 would treat the _owner field value (T2) as an address in its own
  stack, and so conclude that the monitor is stack-locked by T1 itself
- And so both T1 and T2 would have ownership of the monitor, allowing
  the monitor state (and application state) to be corrupted. This results
  in a range of hangs and crashes depending on the exact interleaving.
Interestingly Thread::is_in_usable_stack does not have this bug.
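To make the boundary condition concrete, here is a minimal sketch of the
two forms of the range check. It is not the HotSpot code: StackBounds,
contains_inclusive, contains_exclusive, looks_stack_locked_by_self and
check_scenario are hypothetical names, and the downward-growing stack and
inclusive lower bound are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not HotSpot's Thread).
// Assumption: the stack grows down, so it occupies [base - size, base).
struct StackBounds {
    std::uintptr_t base;  // first address above the stack
    std::size_t    size;  // stack size in bytes

    // Lowest address that belongs to the stack.
    std::uintptr_t end() const { return base - size; }

    // Behaviour described in this mail before the fix: adr == base passes.
    bool contains_inclusive(std::uintptr_t adr) const {
        return adr <= base && adr >= end();
    }

    // Corrected form: base itself lies outside the stack.
    bool contains_exclusive(std::uintptr_t adr) const {
        return adr < base && adr >= end();
    }
};

// Hypothetical model of the owner test: would a thread with stack
// `self_stack` take `owner` for an address inside its own stack, and
// therefore treat the monitor as stack-locked by itself?
inline bool looks_stack_locked_by_self(const StackBounds& self_stack,
                                       std::uintptr_t owner,
                                       bool use_fixed_check) {
    return use_fixed_check ? self_stack.contains_exclusive(owner)
                           : self_stack.contains_inclusive(owner);
}

// Self-check of the scenario above: T2 is allocated exactly at T1's base.
inline void check_scenario() {
    StackBounds t1{0x7000, 0x1000};
    std::uintptr_t t2 = t1.base;  // address of the T2 object

    assert(looks_stack_locked_by_self(t1, t2, false));  // old check: confused
    assert(!looks_stack_locked_by_self(t1, t2, true));  // fixed check: not T1's
}
```

With base 0x7000 and size 0x1000, an owner value of 0x7000 passes the
inclusive check but fails the exclusive one, which is exactly the T1/T2
confusion in the list above.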
The bug can be traced all the way back to JDK-6699669 as explained in the bug
report. That issue also showed that the same bug existed in the SA
implementations of these "on stack" checks.
Testing:
- The reproducer from the bug report, using jemalloc, ran over 5000
times without failing in any way.
- tiers 1-3 on all Oracle platforms
- serviceability/sa tests
Thanks,
David
-----
More information about the hotspot-runtime-dev
mailing list