RFR: 8215355: Object monitor deadlock with no threads holding the monitor (using jemalloc 5.1)

Mon Nov 18 13:31:13 UTC 2019

Hi David,

On Mon, Nov 18, 2019 at 2:26 PM David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> Thanks for taking a look.
>
> On 18/11/2019 9:58 pm, Thomas Stüfe wrote:
> > This is evil :)
> >
> > There might be more cases like this, e.g.
> >
> > frame_x86.cpp  frame::is_interpreted_frame_valid():
> >
> > if (locals > thread->stack_base() || locals < (address) fp()) return
> false;
>
> Yes that might be a case where >= should be in use. I'll file another
> bug to check uses of stack_base().
>
>
Many of them could use Thread::in_usable_stack(), I assume.

> > Also, I would have thought the little alloca() dance we do at the start
> > of thread_native_entry() would push the first real frame down the stack.
>
> I know nothing of that code. :)
>

See os_linux.cpp:
...
  // Try to randomize the cache line index of hot stack frames.
  // This helps when threads of the same stack traces evict each other's
  // cache lines. The threads can be either from the same JVM instance, or
  // from different JVM instances. The benefit is especially true for
  // processors with hyperthreading technology.
  static int counter = 0;
  int pid = os::current_process_id();
  alloca(((pid ^ counter++) & 7) * 128);

> > The fix looks good.
>
> Thanks!
>
> David
> -----
>
>
Cheers, Thomas

> > Cheers, Thomas
> >
> >
> >
> > On Mon, Nov 18, 2019 at 3:31 AM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> >     Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
> >     webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
> >
> >     This was a very difficult bug to track down and I want to publicly
> >     acknowledge and thank the jemalloc folk (users and developers) for
> >     continuing to investigate this issue from their side. Without their
> >     persistence this issue would have languished.
> >
> >     The thread stack_base() is the first address above the thread's
> stack.
> >     However, the "in stack" checks performed by Thread::on_local_stack
> and
> >     Thread::is_in_stack allowed the checked address to be equal to the
> >     stack_base() - which is not correct. Here's how this manifests as
> >     the bug:
> >
> >     - Let a JavaThread instance, T2, be allocated at the end of thread
> T1's
> >     stack i.e. at T1->stack_base()
> >         [This seems to be why this only reproduced with jemalloc.]
> >     - Let T2 lock an inflated monitor
> >     - Let T1 try to lock the same monitor
> >         - T1 would consider the _owner field value (T2) as being in its
> >     stack
> >     and so consider the monitor stack-locked by T1
> >         - And so both T1 and T2 would have ownership of the monitor
> >     allowing
> >     the monitor state (and application state) to be corrupted. This
> results
> >     in a range of hangs and crashes depending on the exact interleaving.
> >
> >     Interestingly Thread::is_in_usable_stack does not have this bug.
> >
> >     The bug can be tracked way back to JDK-6699669 as explained in the
> bug
> >     report. That issue also showed that the same bug existed in the SA
> >     implementations of these "on stack" checks.
> >
> >     Testing:
> >         - The reproducer from the bug report, using jemalloc, ran over
> 5000
> >     times without failing in any way.
> >         - tiers 1-3 on all Oracle platforms
> >         - serviceability/sa tests
> >
> >     Thanks,
> >     David
> >     -----
> >
>