RFR (XS): 7129715: MAC: SIGBUS in nsk stress test
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Jun 14 12:59:55 PDT 2012
On 6/14/12 8:31 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/7129715/webrev.00/
>
> StackOverflowError exceptions may get lost on OSX. The changes that were made to the signal handler to check for SIGSEGV or SIGBUS rather than only SIGSEGV or only SIGBUS in several places broke the logic of the handler.
>
> With the test of the CR, the thread catches a SIGSEGV or SIGBUS because it's exhausting its stack. The stub to return to is set:
> stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::STACK_OVERFLOW);
> but it is overwritten in:
> #if defined(__APPLE__)
> // 32-bit Darwin reports a SIGBUS for nearly all memory access exceptions.
> // 64-bit Darwin may also use a SIGBUS (seen with compressed oops).
> // Catching SIGBUS here prevents the implicit SIGBUS NULL check below from
> // being called, so only do so if the implicit NULL check is not necessary.
> } else if (sig == SIGBUS&& MacroAssembler::needs_explicit_null_check((intptr_t)info->si_addr)) {
> #else
> } else if (sig == SIGBUS /*&& info->si_code == BUS_OBJERR */) {
> #endif
> // BugId 4454115: A read from a MappedByteBuffer can fault
> // here if the underlying file has been truncated.
> // Do not crash the VM in such a case.
> CodeBlob* cb = CodeCache::find_blob_unsafe(pc);
> nmethod* nm = cb->is_nmethod() ? (nmethod*)cb : NULL;
> if (nm != NULL&& nm->has_unsafe_access()) {
> stub = StubRoutines::handler_for_unsafe_access();
> }
>
> so the stack overflow exception is not thrown and the thread continues growing the stack leading to a fatal error.
>
> Roland.
Roland,
Thanks for tackling such nasty code...
Just trying to understand this one... These checks:
476 if (sig == SIGSEGV || sig == SIGBUS) {
480 if (addr < thread->stack_base() &&
481 addr >= thread->stack_base() - thread->stack_size()) {
483 if (thread->in_stack_yellow_zone(addr)) {
485 if (thread->thread_state() == _thread_in_Java) {
tell us that we took a SIGSEGV or SIGBUS while running Java code
in the yellow zone of our stack... so stack overflow... which gets
us to this setting of "stub":
488 stub =
SharedRuntime::continuation_for_implicit_exception(thread, pc,
SharedRuntime::STACK_OVERFLOW);
This line:
519 if (thread->thread_state() == _thread_in_Java) {
gets us into another block of "stub" setting code, but it
currently doesn't care that "stub" was already set. That's
the code you're trying to fix with this new line:
519 if (thread->thread_state() == _thread_in_Java && stub == NULL) {
Just to be complete, I'm trying to understand which of the
many places that set "stub" is clobbering the existing value.
I think it is this code block:
603 } else if ((sig == SIGSEGV || sig == SIGBUS) &&
604
!MacroAssembler::needs_explicit_null_check((intptr_t)info->si_addr)) {
605 // Determination of interpreter/vtable stub/compiled
code null exception
606 stub =
SharedRuntime::continuation_for_implicit_exception(thread, pc,
SharedRuntime::IMPLICIT_NULL);
607 }
With the new code in place, "stub == NULL" is false and we hit this
code block:
608 } else if (thread->thread_state() == _thread_in_vm &&
609 sig == SIGBUS && /* info->si_code == BUS_OBJERR
&& */
610 thread->doing_unsafe_access()) {
611 stub = StubRoutines::handler_for_unsafe_access();
612 }
613
614 // jni_fast_Get<Primitive>Field can trap at certain pc's if
a GC kicks in
615 // and the heap gets shrunk before the field access.
616 if ((sig == SIGSEGV) || (sig == SIGBUS)) {
617 address addr = JNI_FastGetField::find_slowcase_pc(pc);
618 if (addr != (address)-1) {
619 stub = addr;
620 }
621 }
We already know that the thread is _thread_in_Java so we don't take
the branch on 608. However, it looks like if the signal is SIGBUS,
then we'll make the call to JNI_FastGetField::find_slowcase_pc()
where we'll search some list of cached PCs (I think). I'm guessing
that since we're in _thread_in_Java that our PC won't match anything
on that list, but... I'm going to guess that the find_slowcase_pc()
call could be better guarded, but that's a different problem.
We'll bypass this check:
627 if ((sig == SIGSEGV || sig == SIGBUS) &&
628 os::is_memory_serialize_page(thread, (address)
info->si_addr)) {
because we're not the serialization page...
We by-pass a huge block of #ifndef AMD64 code (I hope :-))...
And I think with the fixed version of line 519, we land here:
714 if (stub != NULL) {
715 // save all thread context in case we need to restore it
716 if (thread != NULL) thread->set_saved_exception_pc(pc);
717
718 uc->context_pc = (intptr_t)stub;
719 return true;
720 }
Of course, with all the #ifdef and #ifndef stuff, I could be all
wet and don't really understand what this code is doing at all...
Thumbs up, I think... now my head hurts... :-)
Dan
More information about the hotspot-compiler-dev
mailing list