JDK-8009302 (infinite recursion on AppKit thread / stack overflow)

Gerard Ziemski gerard.ziemski at oracle.com
Wed Apr 10 06:38:54 PDT 2013


hi David,

I have already analyzed the stack layouts (using Mac's own vmmap tool) 
and, except for a much larger stack size and larger native guard pages 
for the AppKit thread, I did not see any differences. HotSpot's own 
guard pages were installed correctly. I admit, however, that I looked 
at this when I was first learning about stacks and guard pages, so I 
might have missed something. I will have another look.

But you are right, the question of why it works for all the other 
threads except the AppKit one has not been answered, which is why I'm 
not presenting this as an official fix with a webrev, but just as my 
findings so far. I absolutely agree that this has to be answered before 
we commit to a particular fix.

My point here, however, is that using an alt stack also helps in cases 
where the recursion is in native code as opposed to Java code (or where 
any other problem in user, or our, native code leads to a corrupted 
stack). With an alt stack, HotSpot itself will receive the signal 
resulting from the corrupted native stack, and the problem will be 
reported to the user as detected by the Java Runtime Environment with a 
nice info dump, as opposed to the current behavior where the Java 
process simply terminates (at least on Mac) and all we get is a native 
crash report.
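
To illustrate the point about native recursion, here is a minimal 
standalone sketch (not HotSpot code). It forks a child that recurses 
without bound and compares what happens with and without SA_ONSTACK. 
The names on_fault, install_fault_handler, recurse, run_overflow_child, 
HANDLED_EXIT_CODE and ALT_STACK_BYTES are made up for this example. It 
also assumes a fixed 64 KiB alternate stack rather than SIGSTKSZ, and 
it installs the handler for SIGBUS as well as SIGSEGV, on the 
assumption that a platform may raise either one for a guard page hit.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaltstack/SA_ONSTACK in strict modes */
#endif
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HANDLED_EXIT_CODE 42          /* hypothetical marker for this sketch */
#define ALT_STACK_BYTES   (64 * 1024) /* assumption; the proposal used SIGSTKSZ */

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_fault(int sig) {
    static const char msg[] = "fault handled on alternate stack\n";
    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(HANDLED_EXIT_CODE);
}

/* Installs on_fault for SIGSEGV and SIGBUS; returns 0 on success, -1 on error. */
static int install_fault_handler(int use_altstack) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (use_altstack) {
        stack_t ss;
        memset(&ss, 0, sizeof ss);
        ss.ss_size = ALT_STACK_BYTES;
        ss.ss_sp = malloc(ss.ss_size);
        if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) == -1) return -1;
        sa.sa_flags |= SA_ONSTACK;   /* deliver the signal on the alt stack */
    }
    if (sigaction(SIGSEGV, &sa, NULL) == -1) return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Unbounded native recursion; the volatile buffer forces a real frame
 * per call and the addition after the call prevents tail-call elimination. */
static int recurse(int depth) {
    volatile char pad[1024];
    pad[0] = (char)depth;
    return recurse(depth + 1) + pad[0];
}

/* Forks a child that overflows its stack. Returns the child's exit code,
 * or -1 if the child was killed by the signal (the handler never ran). */
int run_overflow_child(int use_altstack) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_fault_handler(use_altstack) != 0) _exit(1);
        _exit(recurse(0) & 1);       /* not reached: the recursion faults */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_overflow_child(1) should return HANDLED_EXIT_CODE, because 
the handler gets a clean stack to run on. With run_overflow_child(0) 
the kernel has nowhere to build the signal frame, so I would expect the 
child to be killed by the signal itself, which is the "process simply 
terminates" case described above.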

Of course I missed all the discussion about the problems with alt stack, 
so I will have to go back to the archives to search for it - would that 
be just hotspot-runtime-dev at openjdk.java.net or is there another place 
this was discussed as well?


cheers

On 4/9/2013 7:55 PM, David Holmes wrote:
> Hi Gerard,
>
> Switching to alt-signal-stack is not a trivial change and has been 
> discussed before (please search archives).
>
> As I wrote in the CR I still don't see anything showing exactly why 
> the problem is arising. I'm less interested at this stage in what 
> avoids the problem, than what causes the problem: why is the stack of 
> the appkit thread laid out differently and what is that layout?
>
> Cheers,
> David
>
> On 10/04/2013 5:56 AM, Gerard Ziemski wrote:
>> hi guys,
>>
>> After learning about signals, stacks and threads I found one possible
>> solution that involves setting up an alternate stack (sigaltstack) and
>> asking signals to use it (SA_ONSTACK).
>>
>> This would require us to modify 2 files:
>>
>> - os_bsd.cpp, method os::Bsd::set_signal_handler needs:
>>
>> #if __APPLE__
>>      // needed by AppKit thread
>>      if (sig == SIGSEGV) {
>>          sigAct.sa_flags |= SA_ONSTACK;
>>      }
>> #endif
>>
>> - jni.cpp , method attach_current_thread (though it should probably live
>> somewhere in thread.cpp instead) needs:
>>
>> #if __APPLE__
>>      // create alternate stack for SIGSEGV
>>      stack_t sigstack = {0};
>>      sigstack.ss_flags = 0;
>>      sigstack.ss_size = SIGSTKSZ;
>>      sigstack.ss_sp = valloc(sigstack.ss_size); // page aligned memory
>>      if (sigstack.ss_sp != NULL) {
>>          if (sigaltstack(&sigstack, NULL) == -1) {
>>              fprintf(stderr, "sigaltstack err\n");
>>          }
>>      } else {
>>          fprintf(stderr, "valloc err\n");
>>      }
>> #endif
>>
>> This would require us to slightly increase memory use per app, as the
>> alternate stack requires some extra memory (SIGSTKSZ, i.e. 128K on Mac),
>> though that value could possibly be lowered.
>>
>> Another advantage here is that if recursion happens even somewhere in
>> native code, it will also be caught and a proper Java stack dump produced,
>> as opposed to the Mac OS X CrashReporter popping up.
>>
>> During this morning's meeting it was mentioned that we used to create
>> an alternate stack, but that it wasn't reliable.
>>
>> What exactly was the problem with alt stack? Was it for a specific
>> platform?
>>
>> The fix here would only apply to Mac OS X.
>>
>> It's possible we can use XNU instead (just like
>> http://www.gnu.org/software/libsigsegv/) to catch the kernel level
>> signal and make sure our POSIX signal handler gets it, but what about
>> stack corruption if the thread that gets it was the one that had its
>> stack corrupted? (not sure how XNU handles this - I will look into this)
>> Seems we must have a guaranteed clean stack somehow in this case?
>>
>>
>> I have attached the updated test cases and source code (the RecursionC
>> and Signals should be platform independent - though they have Xcode
>> projects provided, RecursionAppkit requires Mac and Xcode)
>>
>> BTW If someone could check out my "Signals" code and tell me why the
>> signals seem to be ignored with dedicated signal thread, that would be
>> great.
>>
>> https://jbs.oracle.com/bugs/browse/JDK-8009302
>>
>>
>> cheers
>


