8065585: Change ShouldNotReachHere() to never return
Mikael Gerdin
mikael.gerdin at oracle.com
Fri Apr 17 14:55:22 UTC 2015
On 2015-04-17 14:52, Stefan Karlsson wrote:
>
>
> On 2015-04-17 13:49, Mikael Gerdin wrote:
>> On 2015-04-16 15:32, Stefan Karlsson wrote:
>>> On 2015-04-16 14:33, David Holmes wrote:
>>>> Hi Stefan,
>>>>
>>>> trimming ...
>>>>
>>>> On 16/04/2015 10:07 PM, Stefan Karlsson wrote:
>>>>> On 2015-04-16 04:23, David Holmes wrote:
>>>>>> Second, more important question: have you examined how this attribute
>>>>>> affects the ability to walk the stack? We have already seen issues on
>>>>>> some platforms where library functions, like abort(), have the
>>>>>> noreturn attribute and as a result the call is optimized in a way
>>>>>> that
>>>>>> prevents the stack from being walked - see eg:
>>>>>>
>>>>>> https://git.matricom.net/Firmware/bionic/commit/5f32207a3db0bea3ca1c7f4b2b563c11b895f276
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> though this:
>>>>>>
>>>>>> https://www.raspberrypi.org/forums/viewtopic.php?t=60540&p=451729
>>>>>>
>>>>>> suggests that problem may have been addressed by the libc folk.
>>>>>> But it
>>>>>> still raises the question as to how our own noreturn functions
>>>>>> will be
>>>>>> handled and how they will affect stacktrace generation in hs_err logs
>>>>>> or via gdb.
>>>>>
>>>>> I added a call to fatal(...) in the GC code. I get correct stacktraces
>>>>> in gdb, but the stacktraces in the hs_err files are broken with
>>>>> fastdebug and product builds:
>>>>
>>>> Which platforms?
>>>
>>> On Linux x86 and x86_64.
>>>
>>>>
>>>>> Stack: [0x00007f12518d2000,0x00007f12519d3000], sp=0x00007f12519d0eb0,
>>>>> free space=1019k
>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>> C=native code)
>>>>> V [libjvm.so+0x11db44a] VMError::report_and_die()+0x1ba
>>>>> V [libjvm.so+0x7efb80] report_vm_error(char const*, int, char
>>>>> const*,
>>>>> char const*)+0x90
>>>>> V [libjvm.so+0x7efc49] report_vm_error_noreturn(char const*, int,
>>>>> char
>>>>> const*, char const*)+0x9
>>>>> V [libjvm.so+0x7efc63]
>>>>> V [libjvm.so+0xfd7937]
>>>>> V [libjvm.so+0xfeec51]
>>>>> ...
>>>>
>>>> So what is the plan: try to get hs_err working again? Or file this
>>>> under "well it seemed like a good idea"? ;-)
>>>
>>> I'm leaning towards "seemed like a good idea", unless someone has an
>>> easy fix for these problems.
>>
>> I've been looking a bit at this. It's not the stack trace per se that
>> is broken, but the decoding of the function names is not working for
>> some of the callers of the noreturn functions.
>>
>> I tried this with report_fatal using -XX:ErrorHandlerTest=5 and got
>> the following:
>>
>> 0x7fb71ccd98d0 <report_fatal>: push %rbp
>> 0x7fb71ccd98d1 <report_fatal+1>: mov %rdx,%rcx
>> 0x7fb71ccd98d4 <report_fatal+4>: lea 0x9b4b34(%rip),%rdx
>> 0x7fb71ccd98db <report_fatal+11>: mov %rsp,%rbp
>> 0x7fb71ccd98de <report_fatal+14>: callq 0x7fb71ccd98c0
>> 0x7fb71ccd98e3: data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
>>
>> So the report_fatal frame has ...98e3 as its return address, but that
>> is actually outside the function and this causes dladdr() to return
>> NULL in dli_saddr and dli_sname.
>>
>> The JVM then attempts to decode using Decoder::decode but I wasn't
>> able to follow that code to understand why that fails.
>>
>> The same appears to happen for the caller of report_fatal
>> (controlled_crash in my case) but there I can't explain why dladdr
>> returns NULL values there.
>>
>> After these two functions the rest of the stack trace appears to be
>> correctly decoded.
>>
>> One approach could be to attempt to inject a "nop" at the end of
>> functions which call a "noreturn" function. This would hopefully make
>> the instruction after the call to the noreturn function part of the
>> caller and would make symbol decoding work.
>
> I found this bug report:
> https://sourceware.org/bugzilla/show_bug.cgi?id=6522
>
> which blames the -fcrossjumping optimization.
>
> I recompiled hotspot with OPT_CFLAGS/debug.o=-fno-crossjumping, and now
> I get correct stack traces with fastdebug on Linux 64 bits.

I did a more thorough investigation into this on a slowdebug build. The
symbols appear to go missing because of how the JVM's ELF Decoder handles
a failed lookup: when a return PC points to a nop in between two symbols
(because the frame has just called a noreturn function), the Decoder
cannot decode it, sets m_status to FileInvalid and then refuses to decode
any more symbols.

If I comment out the code that sets the fail status I get a fairly normal
hs_err stack trace:

V [libjvm.so+0xf184c8] VMError::report(outputStream*)+0x133c
V [libjvm.so+0xf19865] VMError::report_and_die()+0x411
V [libjvm.so+0x7876de] report_vm_error(char const*, int, char const*,
char const*)+0xba
V [libjvm.so+0x7877d7] report_vm_error_noreturn(char const*, int, char
const*, char const*)+0x3d
V [libjvm.so+0x78781b] report_should_not_call(char const*, int)+0x0
V [libjvm.so+0x92bfeb]
V [libjvm.so+0x6e10ff] GenCollectorPolicy::mem_allocate_work(unsigned
long, bool, bool*)+0x283
V [libjvm.so+0x92c049] GenCollectedHeap::mem_allocate(unsigned long,
bool*)+0x5d
V [libjvm.so+0x45dbe5]
CollectedHeap::common_mem_allocate_noinit(KlassHandle, unsigned long,
Thread*)+0x103
V [libjvm.so+0x45dda2]
CollectedHeap::common_mem_allocate_init(KlassHandle, unsigned long,
Thread*)+0x4e
V [libjvm.so+0x45e034] CollectedHeap::array_allocate(KlassHandle, int,
int, Thread*)+0xac
V [libjvm.so+0xed2f04] TypeArrayKlass::allocate_common(int, bool,
Thread*)+0xf0
V [libjvm.so+0x44ae3e] TypeArrayKlass::allocate(int, Thread*)+0x3e
V [libjvm.so+0xcef2d5] oopFactory::new_typeArray(BasicType, int,
Thread*)+0x55
V [libjvm.so+0x9c5aa9] InterpreterRuntime::newarray(JavaThread*,
BasicType, int)+0x147
j alloc.AllocArrays.main([Ljava/lang/String;)V+237
v ~StubRoutines::call_stub
V [libjvm.so+0x9df121] JavaCalls::call_helper(JavaValue*,
methodHandle*, JavaCallArguments*, Thread*)+0x6b1
V [libjvm.so+0xd091d7] os::os_exception_wrapper(void (*)(JavaValue*,
methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*,
JavaCallArguments*, Thread*)+0x41
V [libjvm.so+0x9dea5a] JavaCalls::call(JavaValue*, methodHandle,
JavaCallArguments*, Thread*)+0x86
V [libjvm.so+0xa42306] jni_invoke_static(JNIEnv_*, JavaValue*,
_jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x200
V [libjvm.so+0xa5964a] jni_CallStaticVoidMethod+0x353
C [libjli.so+0x86ed] JavaMain+0x93c
C [libpthread.so.0+0x80a5] start_thread+0xc5
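
The Decoder behaviour described before the stack trace can be modelled in
a few lines. This is a simplified sketch and not the actual HotSpot ELF
decoder: ToySym, ToyDecoder and count_decoded are hypothetical names, and
the sticky flag only stands in for setting m_status to FileInvalid. It
shows how a single undecodable pc makes every later lookup fail when the
failure state is sticky.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol record covering the half-open range [start, start + size).
struct ToySym {
  std::uintptr_t start;
  std::uintptr_t size;
  std::string name;
};

// Toy model of a decoder whose failure state may be sticky.
class ToyDecoder {
 public:
  ToyDecoder(std::vector<ToySym> syms, bool sticky_failure)
      : syms_(std::move(syms)), sticky_failure_(sticky_failure) {}

  // Returns true and fills *name if pc lies inside a known symbol.
  bool decode(std::uintptr_t pc, std::string* name) {
    if (invalid_) {
      return false;  // an earlier failure poisoned the decoder
    }
    for (const ToySym& s : syms_) {
      if (pc >= s.start && pc - s.start < s.size) {
        *name = s.name;
        return true;
      }
    }
    if (sticky_failure_) {
      invalid_ = true;  // stand-in for marking the whole file invalid
    }
    return false;
  }

 private:
  std::vector<ToySym> syms_;
  bool sticky_failure_;
  bool invalid_ = false;
};

// Counts how many of the given pcs the decoder manages to name.
int count_decoded(ToyDecoder& decoder, const std::vector<std::uintptr_t>& pcs) {
  int decoded = 0;
  std::string name;
  for (std::uintptr_t pc : pcs) {
    if (decoder.decode(pc, &name)) {
      ++decoded;
    }
  }
  return decoded;
}
```

Feeding the same list of pcs, one of which falls in the padding between two
symbols, to a sticky and a non-sticky ToyDecoder gives different counts:
the sticky one stops naming frames at the first gap, the other only loses
the one frame.
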

One problem is the line

V [libjvm.so+0x78781b] report_should_not_call(char const*, int)+0x0

I actually added a call to fatal(), but since fatal() calls a noreturn
function, the return pc of that frame points to the first instruction of
the next function, which happens to be report_should_not_call.

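
The off-by-one in that frame can be reproduced in isolation. The following
is a minimal sketch, not HotSpot code: Symbol, lookup and
symbolize_return_pc are hypothetical names, and the table is a hand-written
stand-in for the ELF symbol table. It shows why looking up the raw return
pc names the wrong function (or nothing) after a call to a noreturn
function, and how looking up the byte just before the return pc, which
lies inside the call instruction, names the real caller. Whether such an
adjustment would be appropriate in the hs_err code is an assumption, not
something verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol record covering the half-open range [start, start + size).
struct Symbol {
  std::uintptr_t start;
  std::uintptr_t size;
  std::string name;
};

// Returns the name of the symbol containing addr, or "" if addr falls in
// padding between symbols.
std::string lookup(const std::vector<Symbol>& table, std::uintptr_t addr) {
  for (const Symbol& s : table) {
    if (addr >= s.start && addr - s.start < s.size) {
      return s.name;
    }
  }
  return "";
}

// A return pc is the address right after the call instruction. If the call
// was the last instruction of the caller, that address is already outside
// the caller, so look up return_pc - 1 instead.
std::string symbolize_return_pc(const std::vector<Symbol>& table,
                                std::uintptr_t return_pc) {
  if (return_pc == 0) {
    return "";
  }
  return lookup(table, return_pc - 1);
}
```

For example, with a table in which fatal_caller occupies [0x100, 0x110) and
next_function starts at 0x110, lookup(table, 0x110) yields next_function,
while symbolize_return_pc(table, 0x110) yields fatal_caller.
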
I wonder if this could be fixed by forcing gcc to emit a nop after the
call to report_vm_error_noreturn in report_fatal and friends.

__asm__ __volatile__ ("nop" : : :);

appears not to be enough. GCC is very aggressive with noreturn, even
with -O0.

/Mikael
>
> StefanK
>>
>> /Mikael
>>
>>>
>>> Thanks,
>>> StefanK
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> StefanK
>>>>>>>
>>>>>
>>>
>>
>