Can't get hs_err log on native stack overflow on Linux
Yasumasa Suenaga
suenaga.yasumasa at oss.ntt.co.jp
Tue Aug 9 20:06:58 PDT 2011
Hi, Coleen,
I found the following information. Is this what you meant?
http://us.generation-nt.com/patch-fix-sigaltstack-corruption-among-cloned-threads-help-180626641.html
If the "Linux bug" refers to this, I think we can approach it in two ways.
1. Prepare an alternate signal stack for each thread
We call sigaltstack(2) in the pthread entry point ( static void *java_start(Thread *thread) ),
and register a routine that frees that memory with pthread_cleanup_push() / pthread_cleanup_pop().
In this way, an alternate signal stack is available in every thread; however, we need more
memory.
2. Signal mask setting
In the current implementation, the SIGSEGV handler is registered with sa_mask set to a full set
by sigfillset(3). So, when the SIGSEGV handler is invoked, all other signals are blocked.
However, JVM_handle_linux_signal() unblocks the current signal (including SIGSEGV), as quoted below.
Thus, if we remove the sigprocmask(2) call, the alternate signal stack works fine (no conflicting
use of the alternate stack).
/************************/
// unmask current signal
sigset_t newset;
sigemptyset(&newset);
sigaddset(&newset, sig);
sigprocmask(SIG_UNBLOCK, &newset, NULL);
VMError err(t, sig, pc, info, ucVoid);
err.report_and_die();
ShouldNotReachHere();
/************************/
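The two options above can be combined in one small standalone sketch. It is not HotSpot code: thread_start() merely stands in for java_start(), and ALT_STACK_SIZE, THREAD_STACK_SIZE, recurse(), free_alt_stack() and run_altstack_demo() are hypothetical names; the 64 KB alternate stack size is an assumption. Each thread installs its own alternate stack and frees it through a pthread_cleanup_push() / pthread_cleanup_pop() pair (option 1), and the SIGSEGV handler is registered with SA_ONSTACK and a full sa_mask and never unblocks the signal (option 2). Deliberate recursion on a small thread stack overflows into the thread's guard page, and the handler records whether it ran on the alternate stack and whether SIGSEGV was still masked. The siglongjmp() out of the handler only keeps the demo self-contained; a VM would write hs_err and abort instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE    (64 * 1024)  /* assumption: well above MINSIGSTKSZ */
#define THREAD_STACK_SIZE (256 * 1024) /* small stack so the overflow comes quickly */

static sigjmp_buf recover_point;       /* demo only: where the handler jumps back to */
static volatile sig_atomic_t handled_on_altstack = 0;
static volatile sig_atomic_t segv_blocked_in_handler = 0;
static volatile int never_true = 0;    /* keeps the compiler from proving the recursion endless */

static void segv_handler(int sig, siginfo_t *info, void *ucontext) {
    stack_t cur;
    sigset_t mask;
    (void)info;
    (void)ucontext;
    /* SS_ONSTACK is reported while we are executing on the alternate stack. */
    if (sigaltstack(NULL, &cur) == 0 && (cur.ss_flags & SS_ONSTACK))
        handled_on_altstack = 1;
    /* Option 2: nothing here unblocks SIGSEGV, so it must still be masked. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &mask) == 0 && sigismember(&mask, sig) == 1)
        segv_blocked_in_handler = 1;
    /* A VM would write hs_err and abort here; the demo jumps back instead. */
    siglongjmp(recover_point, 1);
}

/* Each frame keeps a 512-byte buffer live across the call (noinline is GCC/Clang-specific). */
static __attribute__((noinline)) int recurse(int depth) {
    volatile char pad[512];
    pad[0] = (char)depth;
    if (never_true)
        return depth;
    return recurse(depth + 1) + pad[0];
}

/* Option 1 cleanup: stop using the alternate stack, then free its memory. */
static void free_alt_stack(void *mem) {
    stack_t off;
    memset(&off, 0, sizeof off);
    off.ss_flags = SS_DISABLE;
    sigaltstack(&off, NULL);
    free(mem);
}

/* Hypothetical stand-in for java_start(): install a per-thread alternate stack. */
static void *thread_start(void *arg) {
    stack_t ss;
    (void)arg;
    memset(&ss, 0, sizeof ss);
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return NULL;
    }
    pthread_cleanup_push(free_alt_stack, ss.ss_sp);
    if (sigsetjmp(recover_point, 1) == 0)
        recurse(0);                     /* overflows into the thread's guard page */
    pthread_cleanup_pop(1);             /* 1 = run free_alt_stack() now */
    return NULL;
}

/* Returns 1 if the overflow's SIGSEGV was handled on the alternate stack. */
int run_altstack_demo(void) {
    struct sigaction sa;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* deliver on the thread's alternate stack */
    sigfillset(&sa.sa_mask);               /* block every signal while handling */
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return 0;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);
    rc = pthread_create(&tid, &attr, thread_start, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return 0;
    pthread_join(tid, NULL);
    return handled_on_altstack;
}
```

Calling run_altstack_demo() from main() should return 1 on Linux. If SA_ONSTACK or the per-thread sigaltstack() call is removed, the kernel cannot build the signal frame on the exhausted stack and the process is killed by SIGSEGV without the handler running, which matches the missing hs_err log in the original report. Option 1 costs ALT_STACK_SIZE bytes per thread, which is the memory concern raised in the reply.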
Thanks,
Yasumasa
(2011/08/09 18:59), Coleen Phillimore wrote:
> To answer my own question, alternate signal stacks consumed more memory
> and decreased the number of threads that can be created (if I'm reading
> this correctly).
>
> Coleen
>
> On 8/9/2011 5:47 AM, Coleen Phillimore wrote:
>> To handle large native stacks, you have to increase StackShadowPages
>> so that the shadow pages cover the estimated size of the native stacks.
>> StackRedPages and StackYellowPages should stay the same. That's how the
>> design is supposed to work, and it should work correctly on Linux x86
>> and ARM. If you have an infinite recursion in native frames, you should
>> see that in a core file, as you would in a C or C++ implementation. The
>> JVM is only trying to handle Java stack overflows and tolerate native
>> code mixed in.
>>
>> That said, I don't know why these Linux alternate signal stacks were so
>> buggy or what versions of linux they were buggy on. Maybe it is worth
>> having this change if we can resolve it.
>>
>> Coleen
>>
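The sizing rule in the advice above is a round-up: the shadow zone has to span at least the native stack depth expected below a Java frame, counted in whole pages. A minimal sketch of that arithmetic; pages_to_cover() and shadow_pages_for() are hypothetical helpers, not HotSpot code, and the native stack estimate itself has to come from your own measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* ceil(native_bytes / page_size): pages needed to cover a byte estimate. */
size_t pages_to_cover(size_t native_bytes, size_t page_size) {
    assert(page_size > 0);
    return (native_bytes + page_size - 1) / page_size;
}

/* Same, using this system's page size (falls back to 4096 if unknown). */
size_t shadow_pages_for(size_t native_bytes) {
    long ps = sysconf(_SC_PAGESIZE);
    return pages_to_cover(native_bytes, ps > 0 ? (size_t)ps : 4096);
}
```

For example, an estimated 40 KB of native stack on 4 KB pages gives pages_to_cover(40 * 1024, 4096) == 10, the smallest shadow-page count (-XX:StackShadowPages) that covers that estimate.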
>> On 8/9/2011 4:46 AM, Yasumasa Suenaga wrote:
>>> Hi, David,
>>>
>>> Thank you for checking the history.
>>>
>>>> What I can say is that the stack-banging that we do with the guard pages
>>>> was considered generally more reliable, and could be applied the same
>>>> way across all platforms. (The Solaris version also dropped all use of
>>>> alternate signal stacks for other reasons.)
>>> I understand the history now.
>>> I guess that is what "-XX:AltStackSize" refers to.
>>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>>>
>>>
>>> However, at least the VM stack guard pages (red zone: -XX:StackRedPages) do not
>>> work in the current implementation (on Linux x86 / AMD64). So, I think that we should
>>> fix this problem so that this feature works.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>> (2011/08/09 17:16), David Holmes wrote:
>>>> Well I was right about there being history and wrong about the nature of
>>>> the history. Seems we used alternate signal stacks on Linux up till 1.5
>>>> when it was explicitly dropped:
>>>>
>>>> 4852809: Linux: do not use alternate signal stack
>>>>
>>>> Unfortunately that bug is not public so I can't divulge the reasoning
>>>> behind the change.
>>>>
>>>> What I can say is that the stack-banging that we do with the guard pages
>>>> was considered generally more reliable, and could be applied the same
>>>> way across all platforms. (The Solaris version also dropped all use of
>>>> alternate signal stacks for other reasons.)
>>>>
>>>> David
>>>>
>>>> Yasumasa Suenaga said the following on 08/09/11 17:26:
>>>>> Hi, David,
>>>>> Thank you for replying.
>>>>>
>>>>> (2011/08/09 15:51), David Holmes wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I could be mistaken here but I believe the intent/hope is that any
>>>>>> stack overflow will be caught when the guard pages set up by the VM are
>>>>>> accessed. In that way we haven't run out of true native stack and so we
>>>>>> can still process the signal that indicates the stack overflow. This is
>>>>>> not a perfect mechanism of course and there may be situations where you
>>>>>> can jump over the guard pages and truly exhaust the stack.
>>>>> Yes, I agree.
>>>>>
>>>>>> I also believe there is a bit of bad history here, where we had problems
>>>>>> trying to use alternate signal stacks on Linux. It will take me a bit
>>>>>> of archaeology to dig up relevant info on that.
>>>>> If you dig up any relevant info, please let me know.
>>>>>
>>>>>
>>>>> BTW, my patch provides a new VM option, "UseAlternateSignalStack".
>>>>> If this option is set to false, the patch (sigaltstack) is disabled.
>>>>>
>>>>> From a troubleshooting point of view, I want this feature.
>>>>> If I can get an hs_err log on a native stack overflow, I can confidently suggest
>>>>> expanding the stack size (-Xss).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>> David Holmes
>>>>>>
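The guard-page mechanism described above (the overflow is detected when a thread touches a protected page while real stack still remains for the handler) can be illustrated in user space. A minimal sketch, not HotSpot's implementation: an mmap'd region stands in for a thread stack, its lowest page is made PROT_NONE with mprotect(), and the SIGSEGV handler classifies the fault by checking whether si_addr falls inside that page. run_guard_demo(), guard_lo and guard_hi are hypothetical names, and siglongjmp() out of the handler is only there to keep the demo self-contained.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf escape;                 /* demo only: resume point after the fault */
static char *guard_lo, *guard_hi;         /* guard page occupies [guard_lo, guard_hi) */
static volatile sig_atomic_t hit_guard = 0;

static void on_segv(int sig, siginfo_t *info, void *ucontext) {
    uintptr_t addr = (uintptr_t)info->si_addr;
    (void)sig;
    (void)ucontext;
    /* Classify the fault: an address inside the guard page means "overflow". */
    if (addr >= (uintptr_t)guard_lo && addr < (uintptr_t)guard_hi)
        hit_guard = 1;
    siglongjmp(escape, 1);
}

/* Returns 1 if walking down a simulated stack region trips its guard page. */
int run_guard_demo(void) {
    long ps = sysconf(_SC_PAGESIZE);
    size_t len = 8 * (size_t)ps;          /* 7 usable pages + 1 guard page */
    struct sigaction sa, old;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return 0;

    /* Stacks grow downward, so the guard page is the lowest page. */
    guard_lo = region;
    guard_hi = region + ps;
    if (mprotect(guard_lo, (size_t)ps, PROT_NONE) != 0) {
        munmap(region, len);
        return 0;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(escape, 1) == 0) {
        /* Touch one byte per page from the top down, like a growing stack. */
        for (volatile char *p = region + len - 1; ; p -= ps)
            *p = 1;
    }

    sigaction(SIGSEGV, &old, NULL);       /* restore the previous handler */
    munmap(region, len);
    return hit_guard;
}
```

Because the simulated stack is a separate mapping, the handler runs on the normal thread stack, which is exactly what the guard-page design relies on. The one-page stride matters: a stride (or native frame) larger than the guard region could skip it entirely, which is the "jump over the guard pages" case mentioned above.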
>>>>>> Yasumasa Suenaga said the following on 08/09/11 16:06:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I encountered a native stack overflow in JNI code on Linux (Fedora 15 and Ubuntu 11).
>>>>>>> I got a core dump, but I could not get an hs_err log.
>>>>>>>
>>>>>>> In the case of SIGSEGV, the hs_err log is generated in the signal handler. If a
>>>>>>> native stack overflow occurs, there is no stack space left on which the kernel can
>>>>>>> deliver the signal, so the SIGSEGV handler (JVM_handle_linux_signal) is never called.
>>>>>>>
>>>>>>> manpage of sigaltstack(2):
>>>>>>> /****************/
>>>>>>> NOTES
>>>>>>> The most common usage of an alternate signal stack is to handle the SIGSEGV signal
>>>>>>> that is generated if the space available for the normal process stack is
>>>>>>> exhausted: in this case, a signal handler for SIGSEGV cannot be invoked on the
>>>>>>> process stack; if we wish to handle it, we must use an alternate signal stack.
>>>>>>> /****************/
>>>>>>>
>>>>>>>
>>>>>>> With this patch applied, we get an hs_err log on a native stack overflow, as follows:
>>>>>>>
>>>>>>> /****************/
>>>>>>> #
>>>>>>> # SIGSEGV (0xb) at pc=0x00007fb23f1265f7, pid=25748, tid=140403650643712
>>>>>>> # java.lang.StackOverflowError: Native stack
>>>>>>> #
>>>>>>> # JRE version: 8.0
>>>>>>> # Java VM: OpenJDK 64-Bit Server VM (22.0-b01 mixed mode linux-amd64 compressed oops)
>>>>>>> # Problematic frame:
>>>>>>> # C [liboverflow.so+0x5f7] Java_Main_doStackOverflow+0x3b
>>>>>>> /****************/
>>>>>>>
>>>>>>>
>>>>>>> I've attached the patch and a test case to this email. Please review them.
>>>>>>>
>>>>>>>
>>>>>>> I would like to contribute this patch, and I hope it can be applied to
>>>>>>> JDK 6 / 7 / 8.
>>>>>>>
>>>>>>>
>>>>>>> I would appreciate your cooperation.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Yasumasa
>>>>>>>
More information about the hotspot-runtime-dev
mailing list