RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Tue Nov 19 04:34:39 UTC 2019


Hi David,

The fix looks good.
That is, apart from the platform-dependent code that Thomas flagged.

There may be similarly broken code on other platforms.
For instance, there is a suspicious spot in cpu/ppc/frame_ppc.cpp:

     // sender_fp must be within the stack and above (but not
     // equal) current frame's fp.
     if (sender_fp > thread->stack_base() || sender_fp <= fp) {
         return false;
     }
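
To make the boundary condition concrete, here is a minimal standalone sketch. It is not HotSpot code: StackBounds, in_stack_inclusive and in_stack_exclusive are made-up names for illustration, and the only assumption carried over is the one from the quoted message below, namely that stack_base() is the first address above the stack. Under that assumption, a test that only rejects values greater than the base lets a value equal to the base through, whereas a half-open range check does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not the HotSpot Thread class).
// The stack grows down: it occupies the half-open range [end(), base).
struct StackBounds {
    std::uintptr_t base;  // first address ABOVE the stack
    std::size_t    size;  // stack size in bytes

    std::uintptr_t end() const { return base - size; }

    // Inclusive upper bound: wrongly treats adr == base as being in the stack.
    bool in_stack_inclusive(std::uintptr_t adr) const {
        return adr >= end() && adr <= base;
    }

    // Half-open check: adr == base is correctly rejected.
    bool in_stack_exclusive(std::uintptr_t adr) const {
        return adr >= end() && adr < base;
    }
};

// Shows the two checks disagreeing only at the boundary address.
inline void check_boundary_example() {
    StackBounds t1{0x2000, 0x1000};          // illustrative addresses only
    std::uintptr_t at_base = t1.base;        // e.g. an object allocated right at the base
    assert(t1.in_stack_inclusive(at_base));  // misclassified as "on stack"
    assert(!t1.in_stack_exclusive(at_base)); // correctly outside the stack
    assert(t1.in_stack_exclusive(t1.base - 1));
}
```

If the same reasoning applies to the ppc check above, the comparison against thread->stack_base() would need >= rather than >, but that should be confirmed by someone familiar with that port.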

Thanks,
Serguei


On 11/17/19 18:30, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
>
> This was a very difficult bug to track down and I want to publicly 
> acknowledge and thank the jemalloc folk (users and developers) for 
> continuing to investigate this issue from their side. Without their 
> persistence this issue would have languished.
>
> The thread stack_base() is the first address above the thread's stack. 
> However, the "in stack" checks performed by Thread::on_local_stack and 
> Thread::is_in_stack allowed the checked address to be equal to the 
> stack_base() - which is not correct. Here's how this manifests as the 
> bug:
>
> - Let a JavaThread instance, T2, be allocated at the end of thread 
> T1's stack i.e. at T1->stack_base()
>   [This seems to be why this only reproduced with jemalloc.]
> - Let T2 lock an inflated monitor
> - Let T1 try to lock the same monitor
>   - T1 would consider the _owner field value (T2) as being in its 
> stack and so consider the monitor stack-locked by T1
>   - And so both T1 and T2 would have ownership of the monitor allowing 
> the monitor state (and application state) to be corrupted. This 
> results in a range of hangs and crashes depending on the exact 
> interleaving.
>
> Interestingly Thread::is_in_usable_stack does not have this bug.
>
> The bug can be tracked way back to JDK-6699669 as explained in the bug 
> report. That issue also showed that the same bug existed in the SA 
> implementations of these "on stack" checks.
>
> Testing:
>   - The reproducer from the bug report, using jemalloc, ran over 5000 
> times without failing in any way.
>   - tiers 1-3 on all Oracle platforms
>   - serviceability/sa tests
>
> Thanks,
> David
> -----



More information about the serviceability-dev mailing list