Can't get hs_err log on native stack overflow on Linux
Coleen Phillimore
coleen.phillimore at oracle.com
Tue Aug 9 02:47:33 PDT 2011
To handle large native stacks, you have to increase the StackShadowPages
so that they cover the estimated size of the native stacks.
StackRedPages and StackYellowPages should stay the same. That's how the
design is supposed to work, and it should work correctly on Linux x86
and ARM. If you have an infinite recursion on native frames you should
see that in a core file, as you would in a C or C++ implementation. The
JVM is only trying to handle Java stack overflows and tolerate native
code mixed in.
That said, I don't know why these Linux alternate signal stacks were so
buggy or which versions of Linux they were buggy on. Maybe it is worth
having this change if we can resolve it.
Coleen
On 8/9/2011 4:46 AM, Yasumasa Suenaga wrote:
> Hi, David,
>
> Thank you for checking the history.
>
>> What I can say is that the stack-banging that we do with the guard pages
>> was considered generally more reliable, and could be applied the same
>> way across all platforms. (The Solaris version also dropped all use of
>> alternate signal stacks for other reasons.)
> I understand the history now.
> I guess the related option is "-XX:AltStackSize".
> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>
>
> However, the VM stack guard page (red zone: -XX:StackRedPages) does not
> work in the current implementation, at least on Linux x86 / AMD64. So I think
> we should fix this problem so that the feature works.
>
>
> Thanks,
>
> Yasumasa
>
> (2011/08/09 17:16), David Holmes wrote:
>> Well I was right about there being history and wrong about the nature of
>> the history. Seems we used alternate signal stacks on Linux up till 1.5
>> when it was explicitly dropped:
>>
>> 4852809: Linux: do not use alternate signal stack
>>
>> Unfortunately that bug is not public so I can't divulge the reasoning
>> behind the change.
>>
>> What I can say is that the stack-banging that we do with the guard pages
>> was considered generally more reliable, and could be applied the same
>> way across all platforms. (The Solaris version also dropped all use of
>> alternate signal stacks for other reasons.)
>>
>> David
>>
>> Yasumasa Suenaga said the following on 08/09/11 17:26:
>>> Hi, David,
>>> Thank you for replying.
>>>
>>> (2011/08/09 15:51), David Holmes wrote:
>>>> Hi,
>>>>
>>>> I could be mistaken here but I believe the intent/hope is that any
>>>> stackoverflow will be caught when the guard pages set up by the VM are
>>>> accessed. In that way we haven't run out of true native stack and so we
>>>> can still process the signal that indicates the stack overflow. This is
>>>> not a perfect mechanism of course and there may be situations where you
>>>> can jump over the guard pages and truly exhaust the stack.
>>> Yes, I agree.
>>>
>>>> I also believe there is a bit of bad history here, where we had problems
>>>> trying to use alternative signal stacks on Linux. It will take me a bit
>>>> of archaeology to dig up relevant info on that.
>>> If you dig up any relevant info, please let me know.
>>>
>>>
>>> BTW, my patch provides a new VM option, "UseAlternateSignalStack".
>>> If this option is set to false, the patch (sigaltstack) has no effect.
>>>
>>> From a troubleshooting standpoint, I want this feature.
>>> If I can get an hs_err log on native stack overflow, I can confidently suggest
>>> expanding the stack area (-Xss).
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>> David Holmes
>>>>
>>>> Yasumasa Suenaga said the following on 08/09/11 16:06:
>>>>> Hi,
>>>>>
>>>>> I encountered a native stack overflow in JNI code on Linux (Fedora 15 and Ubuntu 11).
>>>>> I got a core dump, but I could not get an hs_err log.
>>>>>
>>>>> In the case of SIGSEGV, the hs_err log is generated in the signal handler. If a
>>>>> native stack overflow occurs, Linux cannot use the stack area, so the SIGSEGV
>>>>> handler (JVM_handle_linux_signal) is never called.
>>>>>
>>>>> manpage of sigaltstack(2):
>>>>> /****************/
>>>>> NOTES
>>>>> The most common usage of an alternate signal stack is to handle the SIGSEGV
>>>>> signal that is generated if the space available for the normal process stack is
>>>>> exhausted: in this case, a signal handler for SIGSEGV cannot be invoked on the
>>>>> process stack; if we wish to handle it, we must use an alternate signal stack.
>>>>> /****************/
>>>>>
>>>>>
>>>>> With this patch applied, we can get an hs_err log on native stack overflow, as follows:
>>>>>
>>>>> /****************/
>>>>> #
>>>>> # SIGSEGV (0xb) at pc=0x00007fb23f1265f7, pid=25748, tid=140403650643712
>>>>> # java.lang.StackOverflowError: Native stack
>>>>> #
>>>>> # JRE version: 8.0
>>>>> # Java VM: OpenJDK 64-Bit Server VM (22.0-b01 mixed mode linux-amd64 compressed oops)
>>>>> # Problematic frame:
>>>>> # C [liboverflow.so+0x5f7] Java_Main_doStackOverflow+0x3b
>>>>> /****************/
>>>>>
>>>>>
>>>>> I've attached the patch and a test case to this email. Please take a look.
>>>>>
>>>>>
>>>>> I would like to contribute this patch, and I hope it can be applied to
>>>>> JDK 6 / 7 / 8.
>>>>>
>>>>>
>>>>> I would appreciate your help.
>>>>>
>>>>> Best regards,
>>>>> Yasumasa
>>>>>
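
For readers following the thread, the sigaltstack(2) behaviour quoted in the
original report can be sketched in plain C. This is only an illustration of
the general mechanism, not the attached patch and not HotSpot code. The names
run_overflow_probe, install_handler, on_segv, ALT_STACK_SIZE and
HANDLED_EXIT_CODE, the 1 MiB stack limit, and the exit code 42 are all
assumptions made up for this example. The overflow is provoked in a forked
child so that the calling process survives.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical constants for this sketch; not taken from the patch. */
#define ALT_STACK_SIZE (64 * 1024)
#define HANDLED_EXIT_CODE 42

static volatile sig_atomic_t keep_recursing = 1;

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_segv(int sig) {
    static const char msg[] = "SIGSEGV handled on alternate signal stack\n";
    (void)sig;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) { /* nothing to do */ }
    _exit(HANDLED_EXIT_CODE);
}

/* Unbounded recursion; the volatile buffer keeps each frame on the stack. */
static int recurse(int depth) {
    volatile char pad[4096];
    int r;
    pad[0] = (char)depth;
    if (!keep_recursing) return 0;   /* never true here; avoids a tail call */
    r = recurse(depth + 1);
    pad[1] = (char)r;
    return r + pad[0];
}

/* Register an alternate stack and a SIGSEGV handler that runs on it. */
static int install_handler(void) {
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = malloc(ALT_STACK_SIZE);
    if (ss.ss_sp == NULL) return -1;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;        /* deliver SIGSEGV on the alternate stack */
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Overflow the stack in a child; return the child's exit code, or -1. */
int run_overflow_probe(void) {
    int status;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit rl;
        if (install_handler() != 0) _exit(1);
        /* Assumption: shrink the stack limit so the overflow happens fast. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = 1024 * 1024;
            (void)setrlimit(RLIMIT_STACK, &rl);
        }
        recurse(0);
        _exit(2);                    /* not reached if the overflow occurs */
    }
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_overflow_probe() from a main() should return HANDLED_EXIT_CODE
when the handler ran on the alternate stack. If SA_ONSTACK is dropped from
sa_flags, the kernel has no room to build the signal frame, the child should
be killed by SIGSEGV without the handler running, and the function returns -1.
That is the situation described in the original report, where only a core
file is produced.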
More information about the hotspot-runtime-dev mailing list