JDK-8009302 (infinite recursion on AppKit thread / stack overflow)
Coleen Phillimore
coleen.phillimore at oracle.com
Thu Apr 11 07:16:36 PDT 2013
Hi,
This thread was one of the few I've saved because I think we could apply
this patch that he provided. I can't find any reason why we
discontinued alternate signal stacks except that there might have been
VM code that used the sp addresses to see which thread they were on. I
think there was some sort of check added and that we can't do this in
signal handling code (or there should be one; it's in thread.cpp). We
discontinued alternate signal stacks a long time ago when the VM was
really different than it is today.
The other problem that I dug up from this thread was that the alternate
signal stacks used more memory. Gerard, does your patch only add the
alternate signal stack for the appkit thread and not all of the threads?
I really think it is a good idea to have some way to optionally catch
native stack overflows. We get complaints about this all the time!
It would need testing to make sure that the hs_err file clearly states
that the overflow is native code. I think we should do it.
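To make the mechanism concrete, here is a minimal standalone sketch (not
HotSpot code, and not Gerard's patch) of a SIGSEGV handler that runs on an
alternate stack. The names handler_runs_on_alt_stack, segv_handler and
ALT_STACK_BYTES are made up for illustration, the 128K size just mirrors
the SIGSTKSZ figure quoted for Mac, and the signal is generated with
raise() rather than by a real overflow so that the handler can safely
return.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaction/sigaltstack under strict -std modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size: 128K, the SIGSTKSZ value quoted for Mac in this thread. */
#define ALT_STACK_BYTES ((size_t)128 * 1024)

static char *alt_base;                      /* start of the alternate stack */
static volatile sig_atomic_t ran_on_alt;    /* set by the handler */

static void segv_handler(int sig) {
    volatile char probe = 0;                /* lives in the handler's own frame */
    uintptr_t p  = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)sig;
    /* If SA_ONSTACK took effect, this frame lies inside the alternate stack. */
    ran_on_alt = (p >= lo && p < lo + ALT_STACK_BYTES) ? 1 : 0;
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it did not,
 * and -1 if setup failed. The alternate stack is per-thread. */
int handler_runs_on_alt_stack(void) {
    stack_t ss;
    struct sigaction sa, old_sa;

    alt_base = malloc(ALT_STACK_BYTES);
    if (alt_base == NULL) return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) { free(alt_base); return -1; }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;               /* deliver SIGSEGV on the alt stack */
    if (sigaction(SIGSEGV, &sa, &old_sa) == -1) { free(alt_base); return -1; }

    ran_on_alt = 0;
    raise(SIGSEGV);                         /* software-generated, so returning is fine */

    /* Restore the previous disposition and disable the alternate stack. */
    sigaction(SIGSEGV, &old_sa, NULL);
    ss.ss_flags = SS_DISABLE;
    sigaltstack(&ss, NULL);
    free(alt_base);
    return ran_on_alt;
}
```

Calling handler_runs_on_alt_stack() is expected to return 1 on a POSIX
system. A handler for a real overflow could not simply return like this;
it would have to report the problem and abort.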
I don't understand why the appkit thread is so different and why the
segv is not being reported to it from stack banging. It sounds like the
OS signal delivery is not what we expect? So if you do printf in
JVM_handle_bsd_signal, there is no printf for the stack bang?
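One way to answer the "where did it fault" part is to compare the
faulting address with the stack layout that vmmap reports. Below is a
rough sketch of that comparison, assuming a downward-growing stack with
guard pages at the low end. The stack_layout struct, the fault_kind enum
and classify_fault are hypothetical names for illustration, not anything
in HotSpot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one thread's stack mapping. */
typedef struct {
    uintptr_t low;        /* lowest address of the mapping */
    size_t    guard_size; /* bytes of guard pages at the low end */
    size_t    size;       /* total bytes, guard pages included */
} stack_layout;

typedef enum {
    FAULT_OUTSIDE_STACK,
    FAULT_IN_GUARD,
    FAULT_IN_USABLE_STACK
} fault_kind;

/* Classify a faulting address relative to the given stack layout. */
fault_kind classify_fault(const stack_layout *s, uintptr_t addr) {
    if (addr < s->low || addr - s->low >= s->size) {
        return FAULT_OUTSIDE_STACK;   /* not this thread's stack at all */
    }
    if (addr - s->low < s->guard_size) {
        return FAULT_IN_GUARD;        /* what a stack bang would be expected to hit */
    }
    return FAULT_IN_USABLE_STACK;     /* a fault here points at something else */
}
```

If the AppKit thread's fault lands outside the range we think is its
stack, that would suggest the layout we assume is not the layout it has.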
Is there a way to make the code not so conditional?
Thanks! This is a thorny first bug, Gerard.
Coleen
On 04/11/2013 12:25 AM, David Holmes wrote:
> On 10/04/2013 11:38 PM, Gerard Ziemski wrote:
>> hi David,
>>
>> I have already analyzed the stack layouts and except for a much larger
>> stack size and larger native guard pages for AppKit thread I did not see
>> any differences (used Mac's own vmmap tool) - Hotspot's own guard pages
>> were installed correctly. I admit, however, that I looked at that when I
>> was first learning about stacks and guard pages, so I might have missed
>> something - I will have another look.
>
> So where was the faulting address in relation to that stack layout?
>
>> But you are right, the question of why it works for all the other
>> threads except the AppKit one has not been answered, which is why I'm
>> not presenting this as an official fix with webrev, but just my findings
>> so far. I absolutely agree that this has to be answered before we commit
>> to a particular fix.
>>
>> My point here, however, is that using alt stack does help also in cases
>> where there is a recursion in native code as opposed to Java code (or
>> any other problem in user, or our, native code that leads to a corrupted
>> stack). With alt stack, Hotspot itself will receive the signal resulting
>> from a corrupted native stack, and the failure will be reported to the
>> user as detected by the Java Runtime Environment with a nice info dump,
>> as opposed to the current behavior where the Java process simply
>> terminates (at least on Mac) and all we get is a native crash report.
>>
>> Of course I missed all the discussion about the problems with alt stack,
>> so I will have to go back to the archives to search for it - would that
>> be just hotspot-runtime-dev at openjdk.java.net or is there another place
>> this was discussed as well?
>
> See this thread:
>
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2011-August/002354.html
>
>
> I was left feeling that our stackoverflow logic wasn't necessarily
> working as might be expected. But I've not been in a position to dig
> into this.
>
> David
> -----
>
>>
>> cheers
>>
>> On 4/9/2013 7:55 PM, David Holmes wrote:
>>> Hi Gerard,
>>>
>>> Switching to alt-signal-stack is not a trivial change and has been
>>> discussed before (please search archives).
>>>
>>> As I wrote in the CR I still don't see anything showing exactly why
>>> the problem is arising. I'm less interested at this stage in what
>>> avoids the problem, than what causes the problem: why is the stack of
>>> the appkit thread laid out differently and what is that layout?
>>>
>>> Cheers,
>>> David
>>>
>>> On 10/04/2013 5:56 AM, Gerard Ziemski wrote:
>>>> hi guys,
>>>>
>>>> After learning about signals, stacks and threads I found one possible
>>>> solution that involves setting up an alternate stack (sigaltstack) and
>>>> asking signals to use it (SA_ONSTACK).
>>>>
>>>> This would require us to modify 2 files:
>>>>
>>>> - os_bsd.cpp, method os::Bsd::set_signal_handler needs:
>>>>
>>>> #if __APPLE__
>>>>    // needed by AppKit thread
>>>>    if (sig == SIGSEGV) {
>>>>      sigAct.sa_flags |= SA_ONSTACK;
>>>>    }
>>>> #endif
>>>>
>>>> - jni.cpp, method attach_current_thread (though it should probably live
>>>> somewhere in thread.cpp instead) needs:
>>>>
>>>> #if __APPLE__
>>>>    // create alternate stack for SIGSEGV
>>>>    stack_t sigstack = {0};
>>>>    sigstack.ss_flags = 0;
>>>>    sigstack.ss_size = SIGSTKSZ;
>>>>    sigstack.ss_sp = valloc(sigstack.ss_size); // page aligned memory
>>>>    if (sigstack.ss_sp != NULL) {
>>>>      if (sigaltstack(&sigstack, NULL) == -1) {
>>>>        fprintf(stderr, "sigaltstack err\n");
>>>>      }
>>>>    } else {
>>>>      fprintf(stderr, "valloc err\n");
>>>>    }
>>>> #endif
>>>>
>>>> This would slightly increase memory use per app, since the alternate
>>>> stack requires some extra memory (SIGSTKSZ, i.e. 128K on Mac), though
>>>> that value could possibly be lowered.
>>>>
>>>> Another advantage here is that if recursion happens even somewhere in
>>>> native code, it will also be caught and a proper Java stack dump
>>>> produced, as opposed to Mac OS X CrashReporter popping up.
>>>>
>>>> During this morning meeting it was mentioned that we used to create
>>>> alternate stack, but that it wasn't reliable.
>>>>
>>>> What exactly was the problem with alt stack? Was it for a specific
>>>> platform?
>>>>
>>>> The fix here would only apply to Mac OS X.
>>>>
>>>> It's possible we can use XNU instead (just like
>>>> http://www.gnu.org/software/libsigsegv/) to catch the kernel level
>>>> signal and make sure our POSIX signal handler gets it, but what about
>>>> stack corruption if the thread that gets it was the one that had its
>>>> stack corrupted? (not sure how XNU handles this - I will look into this)
>>>> Seems we must have a guaranteed clean stack somehow in this case?
>>>>
>>>>
>>>> I have attached the updated test cases and source code (the RecursionC
>>>> and Signals should be platform independent - though they have Xcode
>>>> projects provided, RecursionAppkit requires Mac and Xcode)
>>>>
>>>> BTW If someone could check out my "Signals" code and tell me why the
>>>> signals seem to be ignored with dedicated signal thread, that would be
>>>> great.
>>>>
>>>> https://jbs.oracle.com/bugs/browse/JDK-8009302
>>>>
>>>>
>>>> cheers
>>>
>>