Can't get hs_err log on native stack overflow on Linux
Coleen Phillimore
coleen.phillimore at oracle.com
Tue Aug 9 02:59:11 PDT 2011
To answer my own question, alternate signal stacks consumed more memory
and decreased the number of threads that can be created (if I'm reading
this correctly).
Coleen
On 8/9/2011 5:47 AM, Coleen Phillimore wrote:
> To handle large native stacks, you have to increase the StackShadowPages
> so that they cover the estimated size of the native stacks.
> StackRedPages and StackYellowPages should stay the same. That's how the
> design is supposed to work, and it should work correctly on linux x86
> and arm. If you have an infinite recursion on native frames you should
> see that in a core file, as you would in a C or C++ implementation. The
> JVM is only trying to handle Java stack overflows and tolerate native
> code mixed in.
>
> That said, I don't know why these linux alternate signal stacks were so
> buggy or what versions of linux they were buggy on. Maybe it is worth
> having this change if we can resolve it.
>
> Coleen
>
> On 8/9/2011 4:46 AM, Yasumasa Suenaga wrote:
>> Hi, David,
>>
>> Thank you for checking the history.
>>
>>> What I can say is that the stack-banging that we do with the guard pages
>>> was considered generally more reliable, and could be applied the same
>>> way across all platforms. (The Solaris version also dropped all use of
>>> alternate signal stacks for other reasons.)
>> I've understood the history.
>> I guess that refers to "-XX:AltStackSize".
>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>>
>>
>> However, at least the VM stack guard page (red zone: -XX:StackRedPages) does not
>> work in the current implementation (on Linux x86 / AMD64). So I think we should
>> fix this problem so that this function works.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>> (2011/08/09 17:16), David Holmes wrote:
>>> Well I was right about there being history and wrong about the nature of
>>> the history. Seems we used alternate signal stacks on Linux up till 1.5
>>> when it was explicitly dropped:
>>>
>>> 4852809: Linux: do not use alternate signal stack
>>>
>>> Unfortunately that bug is not public so I can't divulge the reasoning
>>> behind the change.
>>>
>>> What I can say is that the stack-banging that we do with the guard pages
>>> was considered generally more reliable, and could be applied the same
>>> way across all platforms. (The Solaris version also dropped all use of
>>> alternate signal stacks for other reasons.)
>>>
>>> David
>>>
>>> Yasumasa Suenaga said the following on 08/09/11 17:26:
>>>> Hi, David,
>>>> Thank you for replying.
>>>>
>>>> (2011/08/09 15:51), David Holmes wrote:
>>>>> Hi,
>>>>>
>>>>> I could be mistaken here but I believe the intent/hope is that any
>>>>> stackoverflow will be caught when the guard pages set up by the VM are
>>>>> accessed. In that way we haven't run out of true native stack and so we
>>>>> can still process the signal that indicates the stack overflow. This is
>>>>> not a perfect mechanism of course and there may be situations where you
>>>>> can jump over the guard pages and truly exhaust the stack.
>>>> Yes, I agree.
>>>>
>>>>> I also believe there is a bit of bad history here, where we had problems
>>>>> trying to use alternative signal stacks on Linux. It will take me a bit
>>>>> of archaeology to dig up relevant info on that.
>>>> If you dig up relevant info, please tell me.
>>>>
>>>>
>>>> BTW, my patch provides a new VM option, "UseAlternateSignalStack".
>>>> If this option is set to false, the sigaltstack code in this patch is disabled.
>>>>
>>>> From a troubleshooting viewpoint, I want this function.
>>>> If I can get an hs_err log on native stack overflow, I can confidently suggest
>>>> expanding the stack area (-Xss).
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>> David Holmes
>>>>>
>>>>> Yasumasa Suenaga said the following on 08/09/11 16:06:
>>>>>> Hi,
>>>>>>
>>>>>> I encountered a native stack overflow in JNI code on Linux (Fedora 15 and Ubuntu 11).
>>>>>> I got a core dump image; however, I could not get an hs_err log.
>>>>>>
>>>>>> In the case of SIGSEGV, the hs_err log is generated in the signal handler. If a native
>>>>>> stack overflow occurs, Linux can't use the stack area, so the SIGSEGV handler
>>>>>> (JVM_handle_linux_signal) is never called.
>>>>>>
>>>>>> manpage of sigaltstack(2):
>>>>>> /****************/
>>>>>> NOTES
>>>>>> The most common usage of an alternate signal stack is to handle the SIGSEGV signal
>>>>>> that is generated if the space available for the normal process stack is
>>>>>> exhausted: in this case, a signal handler for SIGSEGV cannot be invoked on the
>>>>>> process stack; if we wish to handle it, we must use an alternate signal stack.
>>>>>> /****************/
>>>>>>
>>>>>>
>>>>>> If this patch is applied, we can get an hs_err log on native stack overflow as follows:
>>>>>>
>>>>>> /****************/
>>>>>> #
>>>>>> # SIGSEGV (0xb) at pc=0x00007fb23f1265f7, pid=25748, tid=140403650643712
>>>>>> # java.lang.StackOverflowError: Native stack
>>>>>> #
>>>>>> # JRE version: 8.0
>>>>>> # Java VM: OpenJDK 64-Bit Server VM (22.0-b01 mixed mode linux-amd64 compressed oops)
>>>>>> # Problematic frame:
>>>>>> # C [liboverflow.so+0x5f7] Java_Main_doStackOverflow+0x3b
>>>>>> /****************/
>>>>>>
>>>>>>
>>>>>> I've attached the patch and a test case to this email. Please review them.
>>>>>>
>>>>>>
>>>>>> I would like to contribute this patch, and I hope it can be applied to
>>>>>> JDK 6 / 7 / 8.
>>>>>>
>>>>>>
>>>>>> I would appreciate your cooperation.
>>>>>>
>>>>>> Best regards,
>>>>>> Yasumasa
>>>>>>
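
For readers who want to see the mechanism quoted from sigaltstack(2) in isolation, here is a minimal sketch in plain C. It is not the patch attached to this thread and it does not touch HotSpot code. The names probe_native_stack_overflow, install_alt_stack_handler and on_fault, the exit code 42, the 64 KiB alternate stack and the 1 MiB stack limit are all assumptions chosen for illustration. A forked child overflows its stack by recursion; because the handler is registered with SA_ONSTACK, it runs on the alternate stack and can still report the fault.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* expose sigaltstack() and SA_ONSTACK under strict -std modes */
#endif
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define OVERFLOW_EXIT_CODE 42            /* hypothetical marker value */

static char alt_stack[64 * 1024];        /* assumed size for the alternate stack */
static volatile int keep_going = 1;      /* keeps the recursion opaque to the compiler */

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    static const char msg[] = "fault handled on alternate signal stack\n";
    (void)sig; (void)info; (void)ctx;
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(OVERFLOW_EXIT_CODE);
}

/* Non-tail recursion with a 1 KiB frame, so the stack is exhausted quickly. */
static int recurse(int depth) {
    volatile char pad[1024];
    pad[0] = (char)depth;
    if (!keep_going) return depth;
    return recurse(depth + 1) + pad[0];
}

static int install_alt_stack_handler(void) {
    stack_t ss;
    struct sigaction sa;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;   /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Returns the child's exit status, or -1 if it was killed by the signal. */
int probe_native_stack_overflow(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit rl;
        /* Cap the child's stack at 1 MiB so the overflow happens promptly. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = 1024 * 1024;
            setrlimit(RLIMIT_STACK, &rl);
        }
        if (install_alt_stack_handler() != 0) _exit(2);
        recurse(0);
        _exit(1);   /* not expected to be reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling probe_native_stack_overflow() from a small main() on Linux should return the marker value when the handler ran. If SA_ONSTACK is dropped from sa_flags, the handler has no stack to run on, the kernel terminates the child with SIGSEGV, and the function returns -1, which matches the missing hs_err log described above. The sketch covers a single thread only; in a multi-threaded process each thread needs its own alternate stack, which is the memory cost mentioned at the top of this message.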