RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Nov 18 13:31:13 UTC 2019
Hi David,
On Mon, Nov 18, 2019 at 2:26 PM David Holmes <david.holmes at oracle.com>
wrote:
> Hi Thomas,
>
> Thanks for taking a look.
>
> On 18/11/2019 9:58 pm, Thomas Stüfe wrote:
> > This is evil :)
> >
> > There might be more cases like this, e.g.
> >
> > frame_x86.cpp frame::is_interpreted_frame_valid():
> >
> > if (locals > thread->stack_base() || locals < (address) fp()) return
> false;
>
> Yes that might be a case where >= should be in use. I'll file another
> bug to check uses of stack_base().
>
>
Many of them could use Thread::in_usable_stack(), I assume.
> > Also, I would have thought the little alloca() dance we do at the start
> > of thread_native_entry() would push the first real frame down the stack.
>
> I know nothing of that code. :)
>
See os_linux.cpp:
...
// Try to randomize the cache line index of hot stack frames.
// This helps when threads of the same stack traces evict each other's
// cache lines. The threads can be either from the same JVM instance, or
// from different JVM instances. The benefit is especially true for
// processors with hyperthreading technology.
static int counter = 0;
int pid = os::current_process_id();
alloca(((pid ^ counter++) & 7) * 128);
> > The fix looks good.
>
> Thanks!
>
> David
> -----
>
>
Cheers, Thomas
> > Cheers, Thomas
> >
> >
> >
> > On Mon, Nov 18, 2019 at 3:31 AM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> > webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> >
> > This was a very difficult bug to track down and I want to publicly
> > acknowledge and thank the jemalloc folk (users and developers) for
> > continuing to investigate this issue from their side. Without their
> > persistence this issue would have languished.
> >
> > The thread stack_base() is the first address above the thread's
> stack.
> > However, the "in stack" checks performed by Thread::on_local_stack
> and
> > Thread::is_in_stack allowed the checked address to be equal to the
> > stack_base() - which is not correct. Here's how this manifests as
> > the bug:
> >
> > - Let a JavaThread instance, T2, be allocated at the end of thread
> T1's
> > stack i.e. at T1->stack_base()
> > [This seems to be why this only reproduced with jemalloc.]
> > - Let T2 lock an inflated monitor
> > - Let T1 try to lock the same monitor
> > - T1 would consider the _owner field value (T2) as being in its
> > stack
> > and so consider the monitor stack-locked by T1
> > - And so both T1 and T2 would have ownership of the monitor
> > allowing
> > the monitor state (and application state) to be corrupted. This
> results
> > in a range of hangs and crashes depending on the exact interleaving.
> >
> > Interestingly Thread::is_in_usable_stack does not have this bug.
> >
> > The bug can be tracked way back to JDK-6699669 as explained in the
> bug
> > report. That issue also showed that the same bug existed in the SA
> > implementations of these "on stack" checks.
> >
> > Testing:
> > - The reproducer from the bug report, using jemalloc, ran over
> 5000
> > times without failing in any way.
> > - tiers 1-3 on all Oracle platforms
> > - serviceability/sa tests
> >
> > Thanks,
> > David
> > -----
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191118/18ebca9b/attachment.html>
More information about the serviceability-dev
mailing list