JDK-8009302 (infinite recursion on AppKit thread / stack overflow)

David Holmes david.holmes at oracle.com
Wed Apr 10 21:25:09 PDT 2013


On 10/04/2013 11:38 PM, Gerard Ziemski wrote:
> hi David,
>
> I have already analyzed the stack layouts and, except for a much larger
> stack size and larger native guard pages for the AppKit thread, I did not
> see any differences (I used Mac's own vmmap tool) - Hotspot's own guard
> pages were installed correctly. I admit, however, that I looked at that
> when I was first learning about stacks and guard pages, so I might have
> missed something - I will have another look.

So where was the faulting address in relation to that stack layout?

> But you are right, the question of why it works for all the other
> threads except the AppKit one has not been answered, which is why I'm
> not presenting this as an official fix with webrev, but just my findings
> so far. I absolutely agree that this has to be answered before we commit
> to a particular fix.
>
> My point here, however, is that using an alt stack also helps in cases
> where there is a recursion in native code as opposed to Java code (or
> any other problem in user, or our, native code that leads to a corrupted
> stack). With an alt stack, Hotspot itself will receive the signal
> resulting from the corrupted native stack, and the failure will be
> reported to the user as an error detected by the Java Runtime
> Environment with a nice info dump, as opposed to the current behavior
> where the Java process simply terminates (at least on Mac) and all we
> get is a native crash report.
>
> Of course I missed all the discussion about the problems with alt stack,
> so I will have to go back to the archives to search for it - would that
> be just hotspot-runtime-dev at openjdk.java.net or is there another place
> this was discussed as well?

See this thread:

http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2011-August/002354.html

I was left feeling that our stack overflow logic wasn't necessarily
working as might be expected. But I've not been in a position to dig
into this.

David
-----

>
> cheers
>
> On 4/9/2013 7:55 PM, David Holmes wrote:
>> Hi Gerard,
>>
>> Switching to alt-signal-stack is not a trivial change and has been
>> discussed before (please search archives).
>>
>> As I wrote in the CR I still don't see anything showing exactly why
>> the problem is arising. I'm less interested at this stage in what
>> avoids the problem, than what causes the problem: why is the stack of
>> the appkit thread laid out differently and what is that layout?
>>
>> Cheers,
>> David
>>
>> On 10/04/2013 5:56 AM, Gerard Ziemski wrote:
>>> hi guys,
>>>
>>> After learning about signals, stacks and threads I found one possible
>>> solution that involves setting up an alternate stack (sigaltstack) and
>>> asking signals to use it (SA_ONSTACK).
>>>
>>> This would require us to modify 2 files:
>>>
>>> - os_bsd.cpp, method os::Bsd::set_signal_handler needs:
>>>
>>> #if __APPLE__
>>>      // needed by AppKit thread
>>>      if (sig == SIGSEGV) {
>>>          sigAct.sa_flags |= SA_ONSTACK;
>>>      }
>>> #endif
>>>
>>> - jni.cpp, method attach_current_thread (though it should probably live
>>> somewhere in thread.cpp instead) needs:
>>>
>>> #if __APPLE__
>>>      // create alternate stack for SIGSEGV
>>>      stack_t sigstack = {0};
>>>      sigstack.ss_flags = 0;
>>>      sigstack.ss_size = SIGSTKSZ;
>>>      sigstack.ss_sp = valloc(sigstack.ss_size); // page aligned memory
>>>      if (sigstack.ss_sp != NULL) {
>>>          if (sigaltstack(&sigstack, NULL) == -1) {
>>>              fprintf(stderr, "sigaltstack err\n");
>>>          }
>>>      } else {
>>>          fprintf(stderr, "valloc err\n");
>>>      }
>>> #endif
>>>
>>> This would slightly increase memory use per app, as the alternate
>>> stack requires some extra memory (SIGSTKSZ, i.e. 128K on Mac), though
>>> that value could possibly be lowered.
>>>
>>> Another advantage here is that if recursion happens even somewhere in
>>> native code, it will also be caught and a proper Java stack dump
>>> produced, as opposed to the Mac OS X CrashReporter popping up.
>>>
>>> During this morning's meeting it was mentioned that we used to create
>>> an alternate stack, but that it wasn't reliable.
>>>
>>> What exactly was the problem with alt stack? Was it for a specific
>>> platform?
>>>
>>> The fix here would only apply to Mac OS X.
>>>
>>> It's possible we can use XNU instead (just like
>>> http://www.gnu.org/software/libsigsegv/) to catch the kernel level
>>> signal and make sure our POSIX signal handler gets it, but what about
>>> stack corruption if the thread that gets it was the one that had its
>>> stack corrupted? (not sure how XNU handles this - I will look into this)
>>> Seems we must have a guaranteed clean stack somehow in this case?
>>>
>>>
>>> I have attached the updated test cases and source code (the RecursionC
>>> and Signals should be platform independent - though they have Xcode
>>> projects provided, RecursionAppkit requires Mac and Xcode)
>>>
>>> BTW, if someone could check out my "Signals" code and tell me why the
>>> signals seem to be ignored with a dedicated signal thread, that would
>>> be great.
>>>
>>> https://jbs.oracle.com/bugs/browse/JDK-8009302
>>>
>>>
>>> cheers
>>
>
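
For readers following the thread, the sigaltstack/SA_ONSTACK idea in the
quoted proposal can be tried outside Hotspot with a small standalone
program. The following is only a sketch under stated assumptions, not
Hotspot code: the names run_overflow_demo, on_fault and recurse are
hypothetical, the 64K alternate stack size is arbitrary (the quoted patch
uses SIGSTKSZ), and the handler is installed for SIGBUS as well as SIGSEGV
on the assumption that a guard page hit may be delivered as either signal
depending on the platform. Recovering with siglongjmp is for the demo only;
a real runtime would report the error and abort.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t handler_on_alt_stack = 0;
static char *alt_base = NULL;          /* start of the alternate stack */
static size_t alt_size = 64 * 1024;    /* arbitrary size for this sketch */

/* Runs on the alternate stack because of SA_ONSTACK. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    char marker; /* lives on whichever stack the handler is running on */
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)sig; (void)info; (void)ctx;
    handler_on_alt_stack = (p >= lo && p < lo + alt_size);
    siglongjmp(recover_point, 1);      /* demo only: unwind out of the fault */
}

/* Unbounded native recursion; the volatile buffer keeps each frame alive. */
static int recurse(int depth) {
    volatile char pad[1024];
    pad[0] = (char)depth;
    return recurse(depth + 1) + pad[0];
}

/* Returns 1 if the fault was handled on the alternate stack, 0 if it was
 * handled elsewhere, and -1 if setup failed. */
int run_overflow_demo(void) {
    stack_t ss;
    struct sigaction sa;

    alt_base = malloc(alt_size);
    if (alt_base == NULL) return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) == -1) return -1;
    if (sigaction(SIGBUS, &sa, NULL) == -1) return -1;

    if (sigsetjmp(recover_point, 1) == 0) {
        recurse(0);                    /* overflows the thread stack */
        return -1;                     /* not expected to be reached */
    }
    return handler_on_alt_stack;
}

#ifdef DEMO_MAIN
int main(void) {
    printf("handler ran on alternate stack: %d\n", run_overflow_demo());
    return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a runnable program. Without SA_ONSTACK the
kernel has to build the signal frame on the already exhausted thread stack,
so the handler typically never runs and the process is simply killed, which
matches the behavior described in the quoted messages.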

