RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)

Mon Nov 18 02:30:48 UTC 2019

Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/

This was a very difficult bug to track down and I want to publicly 
acknowledge and thank the jemalloc folk (users and developers) for 
continuing to investigate this issue from their side. Without their 
persistence this issue would have languished.

A thread's stack_base() is the first address above the thread's stack, 
so it is not itself a valid stack address. However, the "in stack" 
checks performed by Thread::on_local_stack and Thread::is_in_stack 
allowed the checked address to be equal to stack_base(), which is not 
correct. Here's how this manifests as the bug:

- Let a JavaThread instance, T2, be allocated at the end of thread T1's
  stack, i.e. at T1->stack_base()
   [This seems to be why this only reproduced with jemalloc.]
- Let T2 lock an inflated monitor
- Let T1 try to lock the same monitor
   - T1 would consider the _owner field value (T2) as being in its
     stack and so consider the monitor stack-locked by T1
   - And so both T1 and T2 would have ownership of the monitor,
     allowing the monitor state (and application state) to be
     corrupted. This results in a range of hangs and crashes depending
     on the exact interleaving.
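
To make the off-by-one concrete, here is a minimal standalone sketch of 
the two forms of the bounds check. It is not the HotSpot code from the 
webrev: StackBounds, in_stack_inclusive and in_stack_exclusive are 
hypothetical names used only for illustration, and the addresses are 
plain integers rather than real stack pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not HotSpot's Thread).
struct StackBounds {
    std::uintptr_t base;  // first address ABOVE the stack
    std::size_t    size;  // stack size in bytes

    // Lowest address that belongs to the stack.
    std::uintptr_t end() const { return base - size; }

    // Buggy form: treats addr == base as being on the stack.
    bool in_stack_inclusive(std::uintptr_t addr) const {
        return addr <= base && addr >= end();
    }

    // Corrected form: the stack is the half-open range [end, base).
    bool in_stack_exclusive(std::uintptr_t addr) const {
        return addr < base && addr >= end();
    }
};
```

With T1's bounds set to, say, base = 0x200000 and size = 0x100000, an 
object allocated exactly at 0x200000 (playing the role of T2) passes the 
inclusive check but fails the exclusive one. Under the scenario above, 
that is the difference between T1 mistaking the _owner value for one of 
its own stack addresses and T1 correctly treating it as foreign.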

Interestingly Thread::is_in_usable_stack does not have this bug.

The bug can be traced all the way back to JDK-6699669, as explained in 
the bug report. That issue also showed that the same bug existed in the 
SA implementations of these "on stack" checks.

Testing:
   - The reproducer from the bug report, using jemalloc, ran over 5000 
times without failing in any way.
   - tiers 1-3 on all Oracle platforms
   - serviceability/sa tests

Thanks,
David