Can't get hs_err log on native stack overflow on Linux
Yasumasa Suenaga
suenaga.yasumasa at oss.ntt.co.jp
Tue Aug 9 17:54:21 PDT 2011
Hi,
I agree with David.
BTW, I would like to explain the trouble my customer had.
My customer runs a J2EE application (pure Java) on JBoss.
The java process (on RHEL5 x86_64) which runs JBoss suddenly disappeared.
I requested the hs_err log and a core image. However, the customer
couldn't find an hs_err log.
I got the core image and syslog (/var/log/messages), and checked the
Java-level stack trace with jstack.
/***************/
Thread 11548: (state = IN_NATIVE)
- java.net.SocketOutputStream.socketWrite0(java.io.FileDescriptor, byte[], int, int) @bci=0 (Interpreted frame)
- java.net.SocketOutputStream.socketWrite(byte[], int, int) @bci=44, line=92 (Interpreted frame)
- java.net.SocketOutputStream.write(byte[], int, int) @bci=4, line=136 (Interpreted frame)
- oracle.net.ns.DataPacket.send(int) @bci=144, line=199 (Interpreted frame)
- oracle.net.ns.NetOutputStream.flush() @bci=15, line=211 (Interpreted frame)
- oracle.net.ns.NetInputStream.getNextPacket() @bci=41, line=227 (Interpreted frame)
:
/***************/
This thread has 397 Java frames!
In the core image, the crashed instruction is a "MOV" whose destination
operand uses the RSP register. The value of RSP points to a memory region
which has no access permission (note the empty Flags column below).
/***************/
Program Headers:
  Type   Offset             VirtAddr           PhysAddr
         FileSiz            MemSiz             Flags  Align
  :
  LOAD   0x0000000001fba000 0x0000000042e2e000 0x0000000000000000
         0x0000000000003000 0x0000000000003000        1000
:
/***************/
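A LOAD segment with no flags is what an inaccessible (PROT_NONE) page,
such as a stack guard page, looks like in a core file. As a minimal sketch
of that behaviour, and not code taken from HotSpot, the following C
fragment protects one page and touches it in a child process. The names
probe_guard_page and on_segv are hypothetical; the sketch assumes Linux
with glibc. The handler reports whether the kernel delivered SIGSEGV with
si_code SEGV_ACCERR, which is the code checked by the signal handler
quoted later in this thread.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: exit 1 if the fault was a permission error. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    _exit(info->si_code == SEGV_ACCERR ? 1 : 2);
}

/* Returns 1 if touching a PROT_NONE page raised SIGSEGV/SEGV_ACCERR,
 * 2 for another si_code, 3 on setup failure, 0 if no fault occurred,
 * and -1 if the child could not be created or did not exit normally. */
int probe_guard_page(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);   /* avoid leaving core files */

        long page = sysconf(_SC_PAGESIZE);
        char *region = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* The lower page becomes the "guard": mapped, but no permission. */
        if (region == MAP_FAILED || mprotect(region, (size_t)page, PROT_NONE) != 0)
            _exit(3);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        region[page] = 1;                          /* accessible page: no fault */
        ((volatile char *)region)[page - 1] = 1;   /* guard page: faults */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here the handler can run because the thread's own stack is still usable;
only the protected page is inaccessible.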
Thus I was convinced that this crash was caused by native stack overflow.
I suggested expanding the stack size (-Xss), and the customer has not
reproduced this trouble since.
My customer had set "-Xss128k" to reduce physical memory usage (for native
thread stacks, not the Java heap) because JBoss may create thousands of
threads.
In this case, frankly speaking, the Java application is bad :-p
and I think that this is an unusual case.
However, the Java class library has JNI implementations, such as network I/O.
So this problem can happen anywhere, even in a pure Java application.
Thus I made a patch and posted it, and I think that we should fix this
problem so that the hs_err log is generated in this case.
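
To illustrate the mechanism outside the VM, here is a minimal C sketch of
handling SIGSEGV on an alternate signal stack. It is not the posted patch,
and the names probe_overflow, on_segv, recurse and HANDLER_RAN_EXIT are
hypothetical. It assumes Linux with glibc. A child process lowers its stack
limit, recurses until the stack is exhausted, and the parent reports
whether the handler was reached.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HANDLER_RAN_EXIT 42            /* hypothetical marker exit code */

static char alt_stack[64 * 1024];      /* memory for the alternate signal stack */

/* Uses only async-signal-safe calls: write(2) and _exit(2). */
static void on_segv(int sig) {
    static const char msg[] = "SIGSEGV handler reached\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    (void)sig;
    _exit(HANDLER_RAN_EXIT);
}

/* Unbounded recursion; the volatile buffer keeps every frame alive. */
static int recurse(int depth) {
    volatile char pad[512];
    pad[0] = (char)depth;
    return recurse(depth + 1) + pad[0];
}

/* Overflows the stack in a child process.
 * Returns the child's exit code if it exited, -(signal number) if it was
 * killed by a signal, and -1000 on fork/wait failure. */
int probe_overflow(int use_alt_stack) {
    pid_t pid = fork();
    if (pid < 0) return -1000;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        struct rlimit small_stack = {1 << 20, 1 << 20};
        setrlimit(RLIMIT_CORE, &no_core);       /* avoid leaving core files */
        setrlimit(RLIMIT_STACK, &small_stack);  /* overflow after about 1 MiB */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_segv;
        sigemptyset(&sa.sa_mask);
        if (use_alt_stack) {
            stack_t ss;
            ss.ss_sp = alt_stack;
            ss.ss_size = sizeof alt_stack;
            ss.ss_flags = 0;
            sigaltstack(&ss, NULL);
            sa.sa_flags = SA_ONSTACK;           /* run the handler on alt_stack */
        }
        sigaction(SIGSEGV, &sa, NULL);
        _exit(recurse(0) & 1);                  /* never returns normally */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1000;
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return -WTERMSIG(status);
    return -1000;
}
```

With use_alt_stack set to 1, the handler runs and the child exits with
HANDLER_RAN_EXIT. With 0, the kernel has no stack space on which to invoke
the handler, so the child is killed by the signal and the function returns
a negative value. That second case corresponds to the missing hs_err log.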
Thanks,
Yasumasa
(2011/08/09 19:38), David Holmes wrote:
> Dmitry Samersoff said the following on 08/09/11 19:30:
>> Yasumasa,
>>
>> Try to increase stack guard size by -XX:StackShadowPages=...
>> It should work since 6u25 (hs20); see 6983240 for details.
>
> Changing the number of shadow pages has no effect here. It seems that when native code consumes all the stack the VM does not trap it or report it:
>
> // Handle ALL stack overflow variations here
> if (sig == SIGSEGV && info->si_code == SEGV_ACCERR) {
> address addr = (address) info->si_addr;
> if (thread->in_stack_yellow_zone(addr)) {
> thread->disable_stack_yellow_zone();
> if (thread->thread_state() == _thread_in_Java) {
> // Throw a stack overflow exception. Guard pages will be reenabled
> // while unwinding the stack.
> stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::STACK_OVERFLOW);
> } else {
> // Thread was in the vm or native code. Return and try to finish.
> return true;
> }
> } else if (thread->in_stack_red_zone(addr)) {
> // Fatal red zone violation. Disable the guard pages and fall through
> // to handle_unexpected_exception way down below.
> thread->disable_stack_red_zone();
> tty->print_raw_cr("An irrecoverable stack overflow has occurred.");
> }
>
> If we hit the yellow zone while in native the signal handler just returns. If we hit the red zone then we should enter fatal error handling but that doesn't seem to happen. I'd need to trace through the signal code to see exactly where we end up.
>
> David
>
>> -Dmitry
>>
>>
>> On 2011-08-09 12:46, Yasumasa Suenaga wrote:
>>> Hi, David,
>>>
>>> Thank you for checking the history.
>>>
>>>> What I can say is that the stack-banging that we do with the guard pages
>>>> was considered generally more reliable, and could be applied the same
>>>> way across all platforms. (The Solaris version also dropped all use of
>>>> alternate signal stacks for other reasons.)
>>>
>>> I've understood the history.
>>> I guess that is "-XX:AltStackSize" .
>>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>>>
>>>
>>> However, at least the VM stack guard page (RedZone: -XX:StackRedPages) does not
>>> work in the current implementation (on Linux x86 / AMD64). So, I think that we
>>> should fix this problem so that this function works.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>> (2011/08/09 17:16), David Holmes wrote:
>>>> Well I was right about there being history and wrong about the nature of
>>>> the history. Seems we used alternate signal stacks on Linux up till 1.5
>>>> when it was explicitly dropped:
>>>>
>>>> 4852809: Linux: do not use alternate signal stack
>>>>
>>>> Unfortunately that bug is not public so I can't divulge the reasoning
>>>> behind the change.
>>>>
>>>> What I can say is that the stack-banging that we do with the guard pages
>>>> was considered generally more reliable, and could be applied the same
>>>> way across all platforms. (The Solaris version also dropped all use of
>>>> alternate signal stacks for other reasons.)
>>>>
>>>> David
>>>>
>>>> Yasumasa Suenaga said the following on 08/09/11 17:26:
>>>>> Hi, David,
>>>>> Thank you for replying.
>>>>>
>>>>> (2011/08/09 15:51), David Holmes wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I could be mistaken here but I believe the intent/hope is that any
>>>>>> stackoverflow will be caught when the guard pages set up by the VM are
>>>>>> accessed. In that way we haven't run out of true native stack and so we
>>>>>> can still process the signal that indicates the stack overflow. This is
>>>>>> not a perfect mechanism of course and there may be situations where you
>>>>>> can jump over the guard pages and truly exhaust the stack.
>>>>>
>>>>> Yes, I agree.
>>>>>
>>>>>> I also believe there is a bit of bad history here, where we had problems
>>>>>> trying to use alternative signal stacks on Linux. It will take me a bit
>>>>>> of archaeology to dig up relevant info on that.
>>>>>
>>>>> If you've dug up relevant info, please tell me.
>>>>>
>>>>>
>>>>> BTW, my patch provides a new VM option, "UseAlternateSignalStack".
>>>>> If this option is set to false, this patch (sigaltstack) has no effect.
>>>>>
>>>>> From a troubleshooting viewpoint, I want this function.
>>>>> If I can get an hs_err log on native stack overflow, I can confidently
>>>>> suggest expanding the stack area (-Xss).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>> David Holmes
>>>>>>
>>>>>> Yasumasa Suenaga said the following on 08/09/11 16:06:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I encountered native stack overflow at JNI code on Linux (Fedora 15 and Ubuntu 11).
>>>>>>> I got coredump image, however, I could not get hs_err log.
>>>>>>>
>>>>>>> In the case of SIGSEGV, hs_err log is generated in signal handler. If native
>>>>>>> stack overflow occurred, Linux can't use stack area. So, SIGSEGV handler
>>>>>>> (JVM_handle_linux_signal) is never called.
>>>>>>>
>>>>>>> manpage of sigaltstack(2):
>>>>>>> /****************/
>>>>>>> NOTES
>>>>>>> The most common usage of an alternate signal stack is to handle the SIGSEGV
>>>>>>> signal that is generated if the space available for the normal process stack
>>>>>>> is exhausted: in this case, a signal handler for SIGSEGV cannot be invoked on
>>>>>>> the process stack; if we wish to handle it, we must use an alternate signal
>>>>>>> stack.
>>>>>>> /****************/
>>>>>>>
>>>>>>>
>>>>>>> If this patch is applied, we can get hs_err log on native stack overflow as follows:
>>>>>>>
>>>>>>> /****************/
>>>>>>> #
>>>>>>> # SIGSEGV (0xb) at pc=0x00007fb23f1265f7, pid=25748, tid=140403650643712
>>>>>>> # java.lang.StackOverflowError: Native stack
>>>>>>> #
>>>>>>> # JRE version: 8.0
>>>>>>> # Java VM: OpenJDK 64-Bit Server VM (22.0-b01 mixed mode linux-amd64 compressed oops)
>>>>>>> # Problematic frame:
>>>>>>> # C [liboverflow.so+0x5f7] Java_Main_doStackOverflow+0x3b
>>>>>>> /****************/
>>>>>>>
>>>>>>>
>>>>>>> I've attached this patch and testcase in this email. Please check it.
>>>>>>>
>>>>>>>
>>>>>>> I would like to contribute this patch, and I hope to apply this patch to
>>>>>>> JDK 6 / 7 / 8.
>>>>>>>
>>>>>>>
>>>>>>> Please cooperate.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Yasumasa
>>>>>>>
>>>>>
>>
>>