RFR (XS): 7129715: MAC: SIGBUS in nsk stress test

Thu Jun 14 12:59:55 PDT 2012

On 6/14/12 8:31 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/7129715/webrev.00/
>
> StackOverflowError exceptions may get lost on OSX. The changes that were made to the signal handler to check for SIGSEGV or SIGBUS rather than only SIGSEGV or only SIGBUS in several places broke the logic of the handler.
>
> With the test of the CR, the thread catches a SIGSEGV or SIGBUS because it's exhausting its stack. The stub to return to is set:
> stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::STACK_OVERFLOW);
> but it is overwritten in:
> #if defined(__APPLE__)
>        // 32-bit Darwin reports a SIGBUS for nearly all memory access exceptions.
>        // 64-bit Darwin may also use a SIGBUS (seen with compressed oops).
>        // Catching SIGBUS here prevents the implicit SIGBUS NULL check below from
>        // being called, so only do so if the implicit NULL check is not necessary.
>        } else if (sig == SIGBUS && MacroAssembler::needs_explicit_null_check((intptr_t)info->si_addr)) {
> #else
>        } else if (sig == SIGBUS /* && info->si_code == BUS_OBJERR */) {
> #endif
>          // BugId 4454115: A read from a MappedByteBuffer can fault
>          // here if the underlying file has been truncated.
>          // Do not crash the VM in such a case.
>          CodeBlob* cb = CodeCache::find_blob_unsafe(pc);
>          nmethod* nm = cb->is_nmethod() ? (nmethod*)cb : NULL;
>          if (nm != NULL && nm->has_unsafe_access()) {
>            stub = StubRoutines::handler_for_unsafe_access();
>          }
>
> so the stack overflow exception is not thrown and the thread continues growing the stack leading to a fatal error.
>
> Roland.

Roland,

Thanks for tackling such nasty code...
Just trying to understand this one... These checks:

     476     if (sig == SIGSEGV || sig == SIGBUS) {
     480       if (addr < thread->stack_base() &&
     481           addr >= thread->stack_base() - thread->stack_size()) {
     483         if (thread->in_stack_yellow_zone(addr)) {
     485           if (thread->thread_state() == _thread_in_Java) {

tell us that we took a SIGSEGV or SIGBUS while running Java code
in the yellow zone of our stack... so stack overflow... which gets
us to this setting of "stub":

     488             stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::STACK_OVERFLOW);

This line:

     519     if (thread->thread_state() == _thread_in_Java) {

gets us into another block of "stub" setting code, but it
currently doesn't care that "stub" was already set. That's
the code you're trying to fix with this new line:

519     if (thread->thread_state() == _thread_in_Java && stub == NULL) {
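
A minimal standalone sketch of that clobbering, assuming a much simplified handler. This is not the HotSpot code: Stub, Fault, select_stub() and the later_branch_matches flag are hypothetical stand-ins, and the sketch does not claim to know which later branch does the overwriting.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>

// Hypothetical stand-ins for stub addresses; the values are arbitrary.
enum Stub : std::uintptr_t {
  kNoStub = 0,
  kStackOverflowStub = 0x1000,
  kLaterStub = 0x2000
};

// Hypothetical summary of what the handler knows about the fault.
struct Fault {
  int sig;                    // SIGSEGV or SIGBUS
  bool in_java;               // models thread_state() == _thread_in_Java
  bool in_yellow_zone;        // models in_stack_yellow_zone(addr)
  bool later_branch_matches;  // models any later branch that also sets stub
};

// guard_stub == false models the old line 519,
// guard_stub == true models the proposed "&& stub == NULL" version.
inline Stub select_stub(const Fault& f, bool guard_stub) {
  Stub stub = kNoStub;
  const bool mem_fault = (f.sig == SIGSEGV || f.sig == SIGBUS);

  // First block (lines 476-488): stack overflow detection.
  if (mem_fault && f.in_yellow_zone && f.in_java) {
    stub = kStackOverflowStub;
  }

  // Second block (line 519): without the guard it runs even when
  // stub is already set, so a matching branch overwrites it.
  if (f.in_java && (!guard_stub || stub == kNoStub)) {
    if (mem_fault && f.later_branch_matches) {
      stub = kLaterStub;
    }
  }
  return stub;
}
```

Calling select_stub() with guard_stub set to false for a yellow-zone SIGBUS returns kLaterStub, which models the lost StackOverflowError; with guard_stub set to true the same fault returns kStackOverflowStub.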

Just to be complete, I'm trying to understand which of the
many places that set "stub" is clobbering the existing value.
I think it is this code block:

     603       } else if ((sig == SIGSEGV || sig == SIGBUS) &&
     604                  !MacroAssembler::needs_explicit_null_check((intptr_t)info->si_addr)) {
     605           // Determination of interpreter/vtable stub/compiled code null exception
     606           stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_NULL);
     607       }

With the new code in place, "stub == NULL" is false, so we skip the
block that starts at line 519 and move on to this code block:

     608     } else if (thread->thread_state() == _thread_in_vm &&
     609                sig == SIGBUS && /* info->si_code == BUS_OBJERR && */
     610                thread->doing_unsafe_access()) {
     611         stub = StubRoutines::handler_for_unsafe_access();
     612     }
     613
     614     // jni_fast_Get<Primitive>Field can trap at certain pc's if a GC kicks in
     615     // and the heap gets shrunk before the field access.
     616     if ((sig == SIGSEGV) || (sig == SIGBUS)) {
     617       address addr = JNI_FastGetField::find_slowcase_pc(pc);
     618       if (addr != (address)-1) {
     619         stub = addr;
     620       }
     621     }

We already know that the thread is _thread_in_Java so we don't take
the branch on 608. However, since the signal is SIGSEGV or SIGBUS,
it looks like we'll still make the call to
JNI_FastGetField::find_slowcase_pc() on 617, where we'll search some
list of cached PCs (I think). I'm guessing that since we're in
_thread_in_Java our PC won't match anything on that list. The
find_slowcase_pc() call could probably be better guarded, but that's
a different problem.
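
A rough sketch of what such a lookup could look like, assuming the cached PCs are kept as pairs of fast-path pc and slow-path pc. PcPair, find_slowcase_pc_model() and apply_slowcase() are hypothetical names for illustration only; the real JNI_FastGetField data structure may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uintptr_t address_t;  // hypothetical stand-in for HotSpot's "address"

// Sentinel meaning "no match", mirroring the (address)-1 check on line 618.
const address_t kNotFound = static_cast<address_t>(-1);

// Hypothetical table entry: a pc that may trap and the pc to resume at.
struct PcPair {
  address_t fast_pc;
  address_t slow_pc;
};

// Linear search over the cached pcs; returns kNotFound when pc is not listed.
inline address_t find_slowcase_pc_model(const PcPair* table, std::size_t count,
                                        address_t pc) {
  assert(table != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    if (table[i].fast_pc == pc) {
      return table[i].slow_pc;
    }
  }
  return kNotFound;
}

// Models lines 616-621: stub is replaced only when the lookup finds a match,
// so a pc that is not in the table leaves an already selected stub alone.
inline address_t apply_slowcase(address_t stub, const PcPair* table,
                                std::size_t count, address_t pc) {
  const address_t addr = find_slowcase_pc_model(table, count, pc);
  return (addr != kNotFound) ? addr : stub;
}
```

Under this model a pc from compiled Java code would not be in the table, so apply_slowcase() would hand back the stack overflow stub unchanged.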

We'll bypass this check:

  627     if ((sig == SIGSEGV || sig == SIGBUS) &&
  628         os::is_memory_serialize_page(thread, (address) info->si_addr)) {

because the faulting address is not the serialization page...

We bypass a huge block of #ifndef AMD64 code (I hope :-))...

And I think with the fixed version of line 519, we land here:

  714   if (stub != NULL) {
  715     // save all thread context in case we need to restore it
  716     if (thread != NULL) thread->set_saved_exception_pc(pc);
  717
  718     uc->context_pc = (intptr_t)stub;
  719     return true;
  720   }
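
The same final step as a standalone sketch, assuming simplified context and thread types. FakeContext, FakeThread and redirect_to_stub() are hypothetical stand-ins, not the real ucontext or JavaThread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the signal context; only the pc is modeled.
struct FakeContext {
  std::intptr_t context_pc;
};

// Hypothetical stand-in for the thread; only the saved pc is modeled.
struct FakeThread {
  std::uintptr_t saved_exception_pc;
};

// Models lines 714-720: when a stub was selected, remember the faulting pc,
// point the context at the stub and report the signal as handled.
inline bool redirect_to_stub(FakeContext* uc, FakeThread* thread,
                             std::uintptr_t pc, std::uintptr_t stub) {
  assert(uc != nullptr);
  if (stub == 0) {
    return false;  // nothing selected; the caller continues with other handling
  }
  if (thread != nullptr) {
    thread->saved_exception_pc = pc;
  }
  uc->context_pc = static_cast<std::intptr_t>(stub);
  return true;
}
```

In this model, returning true with context_pc rewritten is what lets execution resume in the stub rather than at the faulting instruction.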

Of course, with all the #ifdef and #ifndef stuff, I could be all
wet and not really understand what this code is doing at all...

Thumbs up, I think... now my head hurts... :-)

Dan