RFR (S): 8137165: Tests fail in SR_Handler because thread is not VMThread or JavaThread
David Holmes
david.holmes at oracle.com
Tue Mar 15 00:29:10 UTC 2016
Thanks Robbin.
Can I have a Reviewer too, please?
David
On 14/03/2016 11:34 PM, Robbin Ehn wrote:
> Hi David,
>
> On 03/14/2016 07:46 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8137165
>> Webrev: http://cr.openjdk.java.net/~dholmes/8137165/webrev/
>
> This looks good to me.
>
> /Robbin
>
>>
>> This isn't a fix per se, but some additional diagnostic code to try to
>> detect the conditions where the bug might manifest. The basic failure
>> mode was:
>>
>> # Internal Error (/opt/jprt/T/P1/175841.hseigel/s/hotspot/src/os/linux/vm/os_linux.cpp:3950), pid=27906, tid=13248
>> # assert(thread->is_VM_thread() || thread->is_Java_thread()) failed: Must be VMThread or JavaThread
>>
>> with a stack showing in part:
>>
>> #34 0xf6623ec0 in report_vm_error (file=0xf71b6140 "/scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp", line=3901, error_msg=0xf71b62e0 "assert(thread->is_VM_thread() || thread->is_Java_thread()) failed", detail_fmt=0xf71b62c0 "Must be VMThread or JavaThread")
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/utilities/debug.cpp:218
>> #35 0xf6d21b3f in SR_handler (sig=12, siginfo=0xc1b58ccc, context=0xc1b58d4c)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:3901
>> #36 <signal handler called>
>> #37 0xf776b430 in __kernel_vsyscall ()
>> #38 0xf773ccef in pthread_sigmask () from /lib/libpthread.so.0
>> #39 0xf6d23e6c in os::free_thread (osthread=0xc20cf8b8)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:879
>> #40 0xf6f6811d in Thread::~Thread (this=0xc20cd800, __in_chrg=<optimized out>)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:367
>> #41 0xf6f6866f in JavaThread::~JavaThread (this=0xc20cd800, __in_chrg=<optimized out>)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1611
>> #42 0xf6f6877c in JavaThread::~JavaThread (this=0xc20cd800, __in_chrg=<optimized out>)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1655
>> #43 0xf6f74a38 in JavaThread::thread_main_inner (this=0xc20cd800)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1724
>> #44 0xf6f74e12 in JavaThread::run (this=0xc20cd800)
>>     at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1698
>> #45 0xf6d238ec in java_start (thread=0xc20cd800)
>>
>> What appears to be happening is that the thread has blocked SR_signum
>> (SIGUSR2) at some point (though there is no code that does this), and
>> the signal has become pending on the thread due to the event sampling
>> logic. The thread terminates, executing well into the destructor until
>> it gets to os::free_thread which restores the original signal mask for
>> the thread - that signal mask has SR_signum unblocked and so the signal
>> is delivered immediately and we enter the SR_handler. For some reason
>> this triggers the assertion failure - though why exactly is unclear as
>> we have not released the thread memory as yet, nor done anything that
>> should invalidate that call. Whatever the reason, the state of the thread
>> causes secondary failures in the error reporting code as well.
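
The delivery-on-unmask behaviour described above can be reproduced outside the VM. The following is a minimal standalone sketch, not HotSpot code: it assumes SIGUSR2 stands in for SR_signum, and the names kDemoSRSignal, demo_sr_handler, DeliveryCounts and demo_pending_delivery are hypothetical. It blocks the signal, raises it on the current thread so that it becomes pending, and then restores a mask in which the signal is unblocked.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <signal.h>

// Assumption: SIGUSR2 stands in for SR_signum, as named in the message above.
static const int kDemoSRSignal = SIGUSR2;

// Counts handler invocations; sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_demo_handler_runs = 0;

static void demo_sr_handler(int) {
  g_demo_handler_runs = g_demo_handler_runs + 1;
}

struct DeliveryCounts {
  int while_blocked;  // handler runs seen while the signal was still blocked
  int after_restore;  // handler runs seen right after the mask was restored
};

DeliveryCounts demo_pending_delivery() {
  // Install a counting handler first so the default action never fires.
  struct sigaction sa = {};
  struct sigaction old_sa = {};
  sa.sa_handler = demo_sr_handler;
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(kDemoSRSignal, &sa, &old_sa);
  assert(rc == 0);

  sigset_t sr_set, caller_mask, original_mask;
  sigemptyset(&sr_set);
  sigaddset(&sr_set, kDemoSRSignal);
  sigemptyset(&caller_mask);
  sigemptyset(&original_mask);

  // Start from a known state: signal unblocked, caller's mask remembered.
  rc = pthread_sigmask(SIG_UNBLOCK, &sr_set, &caller_mask);
  assert(rc == 0);
  g_demo_handler_runs = 0;

  // Block the signal. original_mask (signal unblocked) plays the role of
  // the mask that gets restored when the thread terminates.
  rc = pthread_sigmask(SIG_BLOCK, &sr_set, &original_mask);
  assert(rc == 0);

  // Thread-directed signal: held pending because it is currently blocked.
  rc = pthread_kill(pthread_self(), kDemoSRSignal);
  assert(rc == 0);

  DeliveryCounts counts;
  counts.while_blocked = g_demo_handler_runs;

  // Restoring the mask unblocks the signal, so the handler runs before
  // pthread_sigmask returns.
  rc = pthread_sigmask(SIG_SETMASK, &original_mask, NULL);
  assert(rc == 0);
  counts.after_restore = g_demo_handler_runs;

  // Put the caller's handler and mask back.
  sigaction(kDemoSRSignal, &old_sa, NULL);
  pthread_sigmask(SIG_SETMASK, &caller_mask, NULL);
  (void)rc;
  return counts;
}
```

Calling demo_pending_delivery() from an ordinary thread should show no handler run while the signal is blocked and one run as soon as the mask is restored, which matches frames #36 to #39 of the stack above, where the handler is entered from inside pthread_sigmask.
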
>>
>> Attempts to reproduce this bug have been unsuccessful (so maybe we had a
>> random memory stomp on the thread state - who knows.)
>>
>> So what I am doing is simply adding an additional assertion to try and
>> catch, during regular testing, any occurrence of SR_signum being
>> blocked while a thread is terminating.
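
A check of this kind can be sketched with plain POSIX calls. This is an illustration under assumptions, not the code in the webrev: SIGUSR2 stands in for SR_signum, and sr_signal_is_blocked and check_sr_signal_not_blocked are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <signal.h>

// Assumption: SIGUSR2 stands in for SR_signum.
static const int kAssumedSRSignum = SIGUSR2;

// Returns true if 'sig' is blocked in the calling thread's signal mask.
bool sr_signal_is_blocked(int sig) {
  sigset_t current;
  sigemptyset(&current);
  // With a NULL 'set' argument the mask is left unchanged and the current
  // mask is written to 'current'; the 'how' argument is ignored.
  int rc = pthread_sigmask(SIG_BLOCK, NULL, &current);
  assert(rc == 0);
  (void)rc;
  return sigismember(&current, sig) == 1;
}

// Hypothetical diagnostic to run on a terminating thread, before its
// original signal mask is restored.
void check_sr_signal_not_blocked() {
  assert(!sr_signal_is_blocked(kAssumedSRSignum) &&
         "SR signal should not be blocked while a thread is terminating");
}
```

In a debug build the assertion fires at the point where the mask is already wrong, rather than later inside the signal handler, which is the diagnostic value described above.
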
>>
>> In addition a couple of minor cleanups in the signal related code:
>> - strictly speaking SR_handler needs to use
>> Thread::current_or_null_safe() because it needs to use library-based TLS
>> in a signal context.
>> - sigsets should (POSIX recommendation) be explicitly emptied/filled
>> before being set via pthread_sigmask
>> - change 0 to NULL in call to pthread_sigmask
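
The two sigset cleanups in the list above amount to a pattern like the following. This is a hedged sketch with hypothetical helper names (save_thread_sigmask, restore_thread_sigmask), not the changed HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: capture the calling thread's current signal mask.
void save_thread_sigmask(sigset_t* saved) {
  // Explicitly initialize the set before pthread_sigmask writes into it.
  sigemptyset(saved);
  // Pass NULL (not 0) for the unused 'set' pointer; the mask is unchanged.
  int rc = pthread_sigmask(SIG_BLOCK, NULL, saved);
  assert(rc == 0);
  (void)rc;
}

// Hypothetical helper: reinstate a mask captured by save_thread_sigmask.
void restore_thread_sigmask(const sigset_t* saved) {
  // The previous mask is not needed here, so the out-parameter is NULL.
  int rc = pthread_sigmask(SIG_SETMASK, saved, NULL);
  assert(rc == 0);
  (void)rc;
}
```

A thread would call save_thread_sigmask once at start-up and restore_thread_sigmask on exit; neither change alters behaviour, they only make the intent explicit.
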
>>
>> Testing: - JPRT, original failing testcase
>>
>> Thanks,
>> David