RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)

Mon Nov 18 11:07:37 UTC 2019

Looks good, thanks David!

/Robbin

On 11/18/19 3:30 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> 
> This was a very difficult bug to track down and I want to publicly acknowledge 
> and thank the jemalloc folk (users and developers) for continuing to investigate 
> this issue from their side. Without their persistence this issue would have 
> languished.
> 
> A thread's stack_base() is the first address above the thread's stack, so it 
> is not itself part of the stack. However, the "in stack" checks performed by 
> Thread::on_local_stack and Thread::is_in_stack allowed the checked address to 
> be equal to stack_base(), which is not correct. Here's how this manifests as 
> the bug:
> 
> - Let a JavaThread instance, T2, be allocated at the end of thread T1's stack 
> i.e. at T1->stack_base()
>    [This seems to be why this only reproduced with jemalloc.]
> - Let T2 lock an inflated monitor
> - Let T1 try to lock the same monitor
>    - T1 would consider the _owner field value (T2) as being in its stack and so 
> consider the monitor stack-locked by T1
>    - And so both T1 and T2 would have ownership of the monitor allowing the 
> monitor state (and application state) to be corrupted. This results in a range 
> of hangs and crashes depending on the exact interleaving.
> 
> Interestingly Thread::is_in_usable_stack does not have this bug.
> 
> The bug can be traced all the way back to JDK-6699669, as explained in the bug report. 
> That issue also showed that the same bug existed in the SA implementations of 
> these "on stack" checks.
> 
> Testing:
>    - The reproducer from the bug report, using jemalloc, ran over 5000 times 
> without failing in any way.
>    - tiers 1-3 on all Oracle platforms
>    - serviceability/sa tests
> 
> Thanks,
> David
> -----
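
For readers following along, the off-by-one described in the quoted message can be sketched roughly as below. This is only an illustration under assumptions: StackBounds, is_in_stack_inclusive and owner_looks_stack_locked are hypothetical names, the addresses are made up, and none of this is the actual HotSpot code or the change in the webrev. It assumes a downward-growing stack whose valid addresses form the half-open range [base - size, base).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not HotSpot's Thread class).
struct StackBounds {
    std::uintptr_t base;  // first address ABOVE the stack (stack grows down)
    std::size_t    size;  // stack size in bytes

    // Lowest address that belongs to the stack.
    std::uintptr_t end() const { return base - size; }

    // Buggy variant: also accepts adr == base, which lies outside the stack.
    bool is_in_stack_inclusive(std::uintptr_t adr) const {
        return adr <= base && adr >= end();
    }

    // Corrected variant: half-open range [end, base).
    bool is_in_stack(std::uintptr_t adr) const {
        return adr < base && adr >= end();
    }
};

// Simplified model of the ownership test: a thread treats a monitor as
// stack-locked by itself if the owner value points into its own stack.
bool owner_looks_stack_locked(const StackBounds& self, std::uintptr_t owner,
                              bool use_buggy_check) {
    return use_buggy_check ? self.is_in_stack_inclusive(owner)
                           : self.is_in_stack(owner);
}
```

With T1's stack modelled as StackBounds{0x200000, 0x100000} and T2 allocated exactly at 0x200000, the inclusive check makes T1 believe it already holds T2's monitor, while the half-open check rejects the address. Addresses strictly inside the stack are accepted by both variants.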