RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)
David Holmes
david.holmes at oracle.com
Tue Nov 19 04:37:24 UTC 2019
Hi Serguei,
On 19/11/2019 2:34 pm, serguei.spitsyn at oracle.com wrote:
> Hi David,
>
> The fix looks good.
Thanks for taking a look!
> That is aside from the platform-dependent code that Thomas flagged.
>
> There can be similar broken code on other platforms.
> For instance, there is a suspicious spot in cpu/ppc/frame_ppc.cpp:
>
>   // sender_fp must be within the stack and above (but not
>   // equal) current frame's fp.
>   if (sender_fp > thread->stack_base() || sender_fp <= fp) {
>     return false;
>   }
I have filed:
https://bugs.openjdk.java.net/browse/JDK-8234372
"Investigate use of Thread::stack_base() and queries for "in stack""
to look at all uses of stack_base().
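
To make the concern with the quoted ppc check concrete, here is a minimal sketch, not the actual frame_ppc.cpp code. The helper names sender_fp_ok_quoted and sender_fp_ok_strict are hypothetical, and addresses are modelled as plain integers. It assumes, as stated for this bug, that stack_base() is the first address above the stack; whether the ppc code really needs the stricter form is what the new issue is meant to establish.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the quoted check: rejects sender_fp only
// when it is strictly greater than stack_base, so sender_fp == stack_base
// is accepted.
inline bool sender_fp_ok_quoted(std::uintptr_t sender_fp,
                                std::uintptr_t fp,
                                std::uintptr_t stack_base) {
  if (sender_fp > stack_base || sender_fp <= fp) {
    return false;
  }
  return true;
}

// Hypothetical stricter variant: treats stack_base itself as outside the
// stack, consistent with stack_base being the first address above it.
inline bool sender_fp_ok_strict(std::uintptr_t sender_fp,
                                std::uintptr_t fp,
                                std::uintptr_t stack_base) {
  if (sender_fp >= stack_base || sender_fp <= fp) {
    return false;
  }
  return true;
}
```

The two variants differ only for sender_fp == stack_base, which is the same boundary value involved in 8215355.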
Thanks,
David
> Thanks,
> Serguei
>
>
> On 11/17/19 18:30, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
>> webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
>>
>> This was a very difficult bug to track down and I want to publicly
>> acknowledge and thank the jemalloc folk (users and developers) for
>> continuing to investigate this issue from their side. Without their
>> persistence this issue would have languished.
>>
>> The thread stack_base() is the first address above the thread's stack.
>> However, the "in stack" checks performed by Thread::on_local_stack and
>> Thread::is_in_stack allowed the checked address to be equal to the
>> stack_base() - which is not correct. Here's how this manifests as the
>> bug:
>>
>> - Let a JavaThread instance, T2, be allocated at the end of thread
>>   T1's stack i.e. at T1->stack_base()
>>   [This seems to be why this only reproduced with jemalloc.]
>> - Let T2 lock an inflated monitor
>> - Let T1 try to lock the same monitor
>> - T1 would consider the _owner field value (T2) as being in its
>>   stack and so consider the monitor stack-locked by T1
>> - And so both T1 and T2 would have ownership of the monitor allowing
>>   the monitor state (and application state) to be corrupted. This
>>   results in a range of hangs and crashes depending on the exact
>>   interleaving.
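
The scenario above can be sketched in a few lines. This is a simplified model under stated assumptions, not the HotSpot code: StackBounds and looks_stack_locked_by are hypothetical names, addresses are plain integers, and the stack is assumed to grow down from base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not the HotSpot
// Thread class).
struct StackBounds {
  std::uintptr_t base;  // first address above the stack
  std::uintptr_t size;  // stack size in bytes

  // Lowest address that belongs to the stack.
  std::uintptr_t end() const { return base - size; }

  // Inclusive upper bound: accepts adr == base, which is the faulty
  // behaviour described above.
  bool in_stack_inclusive(std::uintptr_t adr) const {
    return adr >= end() && adr <= base;
  }

  // Half-open range [end, base): base itself is outside the stack.
  bool in_stack_exclusive(std::uintptr_t adr) const {
    return adr >= end() && adr < base;
  }
};

// Simplified owner test: T1 treats the monitor as stack-locked by itself
// when the owner value lies within T1's own stack.
inline bool looks_stack_locked_by(const StackBounds& t1,
                                  std::uintptr_t owner,
                                  bool inclusive) {
  return inclusive ? t1.in_stack_inclusive(owner)
                   : t1.in_stack_exclusive(owner);
}
```

With T2 allocated exactly at T1's base, the inclusive test reports the monitor as stack-locked by T1, while the half-open test does not.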
>>
>> Interestingly Thread::is_in_usable_stack does not have this bug.
>>
>> The bug can be traced back to JDK-6699669, as explained in the bug
>> report. That issue also showed that the same bug existed in the SA
>> implementations of these "on stack" checks.
>>
>> Testing:
>> - The reproducer from the bug report, using jemalloc, ran over 5000
>>   times without failing in any way.
>> - tiers 1-3 on all Oracle platforms
>> - serviceability/sa tests
>>
>> Thanks,
>> David
>> -----
>
More information about the serviceability-dev
mailing list