RFR (S): 8137165: Tests fail in SR_Handler because thread is not VMThread or JavaThread

Robbin Ehn robbin.ehn at oracle.com
Mon Mar 14 13:34:42 UTC 2016


Hi David,

On 03/14/2016 07:46 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8137165
> Webrev: http://cr.openjdk.java.net/~dholmes/8137165/webrev/

This looks good to me.

/Robbin

>
> This isn't a fix per se but some additional diagnostic code to try and
> detect the conditions where the bug might manifest. The basic failure
> mode was:
>
> # Internal Error
> (/opt/jprt/T/P1/175841.hseigel/s/hotspot/src/os/linux/vm/os_linux.cpp:3950),
> pid=27906, tid=13248
> # assert(thread->is_VM_thread() || thread->is_Java_thread()) failed:
> Must be VMThread or JavaThread
>
> with a stack showing in part:
>
> #34 0xf6623ec0 in report_vm_error (
>      file=0xf71b6140
> "/scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp",
> line=3901,
>      error_msg=0xf71b62e0 "assert(thread->is_VM_thread() ||
> thread->is_Java_thread()) failed", detail_fmt=0xf71b62c0 "Must be
> VMThread or JavaThread")
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/utilities/debug.cpp:218
>
> #35 0xf6d21b3f in SR_handler (sig=12, siginfo=0xc1b58ccc,
> context=0xc1b58d4c)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:3901
>
> #36 <signal handler called>
> #37 0xf776b430 in __kernel_vsyscall ()
> #38 0xf773ccef in pthread_sigmask () from /lib/libpthread.so.0
> #39 0xf6d23e6c in os::free_thread (osthread=0xc20cf8b8)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:879
>
> #40 0xf6f6811d in Thread::~Thread (this=0xc20cd800, __in_chrg=<optimized
> out>)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:367
>
> #41 0xf6f6866f in JavaThread::~JavaThread (this=0xc20cd800,
>      __in_chrg=<optimized out>)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1611
>
> #42 0xf6f6877c in JavaThread::~JavaThread (this=0xc20cd800,
>      __in_chrg=<optimized out>)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1655
>
> #43 0xf6f74a38 in JavaThread::thread_main_inner (this=0xc20cd800)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1724
>
> #44 0xf6f74e12 in JavaThread::run (this=0xc20cd800)
>      at
> /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1698
>
> #45 0xf6d238ec in java_start (thread=0xc20cd800)
>
> What appears to be happening is that the thread has blocked SR_signum
> (SIGUSR2) at some point (though there is no code that does this), and
> the signal has become pending on the thread due to the event sampling
> logic. The thread terminates, executing well into the destructor until
> it gets to os::free_thread which restores the original signal mask for
> the thread - that signal mask has SR_signum unblocked and so the signal
> is delivered immediately and we enter the SR_handler. For some reason
> this triggers the assertion failure - though why exactly is unclear as
> we have not released the thread memory as yet, nor done anything that
> should invalidate that call. Whatever the reason the state of the thread
> causes secondary failures in the error reporting code as well.
>
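
The mechanism described above (a blocked signal becomes pending and is delivered as soon as the saved mask is restored) can be reproduced in isolation. This is only a minimal sketch, not HotSpot code: demo_handler and pending_signal_delivered_on_unblock are hypothetical names, SIGUSR2 stands in for SR_signum, and the saved mask is forced to have the signal unblocked, which is the situation the report describes for os::free_thread.

```cpp
#include <cstddef>
#include <pthread.h>
#include <signal.h>

// Written by the handler; volatile sig_atomic_t is safe to set from a signal handler.
static volatile sig_atomic_t handler_ran = 0;

static void demo_handler(int) { handler_ran = 1; }

// Returns true if the signal stayed pending while blocked and the handler
// ran only once the saved mask was restored.
bool pending_signal_delivered_on_unblock() {
  handler_ran = 0;

  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sa.sa_handler = demo_handler;
  if (sigaction(SIGUSR2, &sa, NULL) != 0) return false;

  sigset_t block_set, saved_mask;
  sigemptyset(&block_set);
  sigaddset(&block_set, SIGUSR2);
  sigemptyset(&saved_mask);
  if (pthread_sigmask(SIG_BLOCK, &block_set, &saved_mask) != 0) return false;

  // Assumption for the demo: the "original" mask has the signal unblocked.
  sigdelset(&saved_mask, SIGUSR2);

  // The signal is blocked, so it is left pending on this thread.
  raise(SIGUSR2);
  bool ran_while_blocked = (handler_ran != 0);

  // Restoring the mask unblocks the signal; POSIX requires a pending,
  // now-unblocked signal to be delivered before pthread_sigmask returns.
  if (pthread_sigmask(SIG_SETMASK, &saved_mask, NULL) != 0) return false;

  return !ran_while_blocked && handler_ran != 0;
}
```

Calling pending_signal_delivered_on_unblock() from a test's main should return true on a POSIX system: the handler runs inside the mask-restoring call, just as SR_handler does under os::free_thread in the stack above.
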
> Attempts to reproduce this bug have been unsuccessful (so maybe we had a
> random memory stomp on the thread state - who knows.)
>
> So what I am doing is simply adding an additional assertion to try and
> catch, during regular testing, any occurrence of SR_signum being
> blocked while a thread is terminating.
>
> In addition a couple of minor cleanups in the signal related code:
> - strictly speaking SR_handler needs to use
> Thread::current_or_null_safe() because it needs to use library-based TLS
> in a signal context.
> - sigsets should (POSIX recommendation) be explicitly emptied/filled
> before being set via pthread_sigmask
> - change 0 to NULL in call to pthread_sigmask
>
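
For illustration, a check of the kind described above, combined with the two sigset cleanups, might look roughly like this. It is a sketch under assumptions and not the code in the webrev: kSuspendResumeSignal, is_signal_blocked and assert_suspend_signal_unblocked are hypothetical names, and SIGUSR2 stands in for SR_signum.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <signal.h>

// Hypothetical stand-in for HotSpot's SR_signum (SIGUSR2 in the report).
static const int kSuspendResumeSignal = SIGUSR2;

// Returns true if `sig` is blocked in the calling thread's signal mask.
bool is_signal_blocked(int sig) {
  sigset_t current;
  sigemptyset(&current);  // explicitly initialise the set before use
  // A NULL new-set makes this a pure query: the mask is not modified and
  // the current mask is written to `current`.
  pthread_sigmask(SIG_BLOCK, NULL, &current);
  return sigismember(&current, sig) == 1;
}

// A check a terminating thread could run before its original mask is restored.
void assert_suspend_signal_unblocked() {
  assert(!is_signal_blocked(kSuspendResumeSignal) &&
         "suspend/resume signal is blocked while the thread is terminating");
}
```

The intent is that a debug build fails at the point where the signal is found blocked, rather than later inside the signal handler when the mask is restored.
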
> Testing: - JPRT, original failing testcase
>
> Thanks,
> David


More information about the serviceability-dev mailing list