8065585: Change ShouldNotReachHere() to never return

Stefan Karlsson stefan.karlsson at oracle.com
Fri Apr 17 15:03:36 UTC 2015


On 2015-04-17 16:55, Mikael Gerdin wrote:
> On 2015-04-17 14:52, Stefan Karlsson wrote:
>>
>>
>> On 2015-04-17 13:49, Mikael Gerdin wrote:
>>> On 2015-04-16 15:32, Stefan Karlsson wrote:
>>>> On 2015-04-16 14:33, David Holmes wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> trimming ...
>>>>>
>>>>> On 16/04/2015 10:07 PM, Stefan Karlsson wrote:
>>>>>> On 2015-04-16 04:23, David Holmes wrote:
>>>>>>> Second, more important question: have you examined how this
>>>>>>> attribute affects the ability to walk the stack? We have already
>>>>>>> seen issues on some platforms where library functions, like
>>>>>>> abort(), have the noreturn attribute and as a result the call is
>>>>>>> optimized in a way that prevents the stack from being walked -
>>>>>>> see eg:
>>>>>>>
>>>>>>> https://git.matricom.net/Firmware/bionic/commit/5f32207a3db0bea3ca1c7f4b2b563c11b895f276
>>>>>>>
>>>>>>> though this:
>>>>>>>
>>>>>>> https://www.raspberrypi.org/forums/viewtopic.php?t=60540&p=451729
>>>>>>>
>>>>>>> suggests that problem may have been addressed by the libc folk.
>>>>>>> But it still raises the question as to how our own noreturn
>>>>>>> functions will be handled and how they will affect stacktrace
>>>>>>> generation in hs_err logs or via gdb.
>>>>>>
>>>>>> I added a call to fatal(...) in the GC code. I get correct
>>>>>> stacktraces in gdb, but the stacktraces in the hs_err files are
>>>>>> broken with fastdebug and product builds:
>>>>>
>>>>> Which platforms?
>>>>
>>>> On Linux x86 and x86_64.
>>>>
>>>>>
>>>>>> Stack: [0x00007f12518d2000,0x00007f12519d3000], sp=0x00007f12519d0eb0, free space=1019k
>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>>>> V  [libjvm.so+0x11db44a]  VMError::report_and_die()+0x1ba
>>>>>> V  [libjvm.so+0x7efb80]  report_vm_error(char const*, int, char const*, char const*)+0x90
>>>>>> V  [libjvm.so+0x7efc49]  report_vm_error_noreturn(char const*, int, char const*, char const*)+0x9
>>>>>> V  [libjvm.so+0x7efc63]
>>>>>> V  [libjvm.so+0xfd7937]
>>>>>> V  [libjvm.so+0xfeec51]
>>>>>> ...
>>>>>
>>>>> So what is the plan: try to get hs_err working again? Or file this
>>>>> under "well it seemed like a good idea"? ;-)
>>>>
>>>> I'm leaning towards "seemed like a good idea", unless someone has an
>>>> easy fix for these problems.
>>>
>>> I've been looking a bit at this. It's not the stack trace per se that
>>> is broken, but the decoding of the function names is not working for
>>> some of the callers of the noreturn functions.
>>>
>>> I tried this with report_fatal using -XX:ErrorHandlerTest=5 and got
>>> the following:
>>>
>>> 0x7fb71ccd98d0 <report_fatal>:    push   %rbp
>>> 0x7fb71ccd98d1 <report_fatal+1>:    mov    %rdx,%rcx
>>> 0x7fb71ccd98d4 <report_fatal+4>:    lea 0x9b4b34(%rip),%rdx
>>> 0x7fb71ccd98db <report_fatal+11>:    mov    %rsp,%rbp
>>> 0x7fb71ccd98de <report_fatal+14>:    callq 0x7fb71ccd98c0
>>> 0x7fb71ccd98e3:    data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
>>>
>>> So the report_fatal frame has ...98e3 as its return address, but that
>>> is actually outside the function and this causes dladdr() to return
>>> NULL in dli_saddr and dli_sname.
>>>
>>> The JVM then attempts to decode using Decoder::decode but I wasn't
>>> able to follow that code to understand why that fails.
>>>
>>> The same appears to happen for the caller of report_fatal
>>> (controlled_crash in my case), but in that case I can't explain why
>>> dladdr returns NULL values.
>>>
>>> After these two functions the rest of the stack trace appears to be
>>> correctly decoded.
>>>
>>> One approach could be to attempt to inject a "nop" at the end of
>>> functions which call a "noreturn" function. This would hopefully make
>>> the instruction after the call to the noreturn function part of the
>>> caller and would make symbol decoding work.
>>
>> I found this bug report:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=6522
>>
>> which blames the -fcrossjumping optimization.
>>
>> I recompiled hotspot with OPT_CFLAGS/debug.o=-fno-crossjumping, and now
>> I get correct stack traces with fastdebug on 64-bit Linux.
>
> I did a more thorough investigation into this on a slowdebug build.
> The symbols go missing because the JVM's ELF Decoder runs into an
> undecodable symbol: a return PC points to a nop in between two
> symbols (because the frame has just called a noreturn function).
> The Decoder then sets m_status to FileInvalid and refuses to decode
> any more symbols.
> If I comment out the code that sets the fail status I get a fairly
> normal hs_err stack trace:
>
> V  [libjvm.so+0xf184c8]  VMError::report(outputStream*)+0x133c
> V  [libjvm.so+0xf19865]  VMError::report_and_die()+0x411
> V  [libjvm.so+0x7876de]  report_vm_error(char const*, int, char const*, char const*)+0xba
> V  [libjvm.so+0x7877d7]  report_vm_error_noreturn(char const*, int, char const*, char const*)+0x3d
> V  [libjvm.so+0x78781b]  report_should_not_call(char const*, int)+0x0
> V  [libjvm.so+0x92bfeb]
> V  [libjvm.so+0x6e10ff]  GenCollectorPolicy::mem_allocate_work(unsigned long, bool, bool*)+0x283
> V  [libjvm.so+0x92c049]  GenCollectedHeap::mem_allocate(unsigned long, bool*)+0x5d
> V  [libjvm.so+0x45dbe5]  CollectedHeap::common_mem_allocate_noinit(KlassHandle, unsigned long, Thread*)+0x103
> V  [libjvm.so+0x45dda2]  CollectedHeap::common_mem_allocate_init(KlassHandle, unsigned long, Thread*)+0x4e
> V  [libjvm.so+0x45e034]  CollectedHeap::array_allocate(KlassHandle, int, int, Thread*)+0xac
> V  [libjvm.so+0xed2f04]  TypeArrayKlass::allocate_common(int, bool, Thread*)+0xf0
> V  [libjvm.so+0x44ae3e]  TypeArrayKlass::allocate(int, Thread*)+0x3e
> V  [libjvm.so+0xcef2d5]  oopFactory::new_typeArray(BasicType, int, Thread*)+0x55
> V  [libjvm.so+0x9c5aa9]  InterpreterRuntime::newarray(JavaThread*, BasicType, int)+0x147
> j  alloc.AllocArrays.main([Ljava/lang/String;)V+237
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x9df121]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b1
> V  [libjvm.so+0xd091d7]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x41
> V  [libjvm.so+0x9dea5a]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x86
> V  [libjvm.so+0xa42306]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x200
> V  [libjvm.so+0xa5964a]  jni_CallStaticVoidMethod+0x353
> C  [libjli.so+0x86ed]  JavaMain+0x93c
> C  [libpthread.so.0+0x80a5]  start_thread+0xc5
>
> One problem is the line
> V  [libjvm.so+0x78781b]  report_should_not_call(char const*, int)+0x0
> I actually added a call to fatal(), but since fatal calls a noreturn 
> function the return pc of that frame accidentally points to the first 
> instruction in the next function, which happens to be 
> report_should_not_call.
>
> I wonder if this could be fixed by forcing gcc to emit a nop after 
> the call to report_vm_error_noreturn in report_fatal and friends.
> __asm__ __volatile__ ("nop" : : :);
> appears to not be enough. GCC is very aggressive with noreturn, even 
> with -O0.

And the reason why m_status was set to FileInvalid seems to be a bug 
in ElfSymbolTable::lookup, which returns true instead of false when it 
fails to find a symbol:

bool ElfSymbolTable::lookup(address addr, int* stringtableIndex, int* 
posIndex, int* offset, ElfFuncDescTable* funcDescTable) {
...
   return true;
}

The caller will then think that the symbol was found and use the 
uninitialized output parameters.
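
For illustration only, here is a minimal sketch of the contract one would
expect from such a lookup: report "not found" with false and write the
output parameters only on success. This is not the HotSpot code; the Sym
struct and lookup_symbol function are hypothetical names made up for this
example, and the real ElfSymbolTable::lookup has a different signature.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified symbol record; not HotSpot's ELF structures.
struct Sym {
  std::uintptr_t start;       // first address covered by the symbol
  std::size_t    size;        // number of bytes covered by the symbol
  int            name_index;  // stand-in for a string table index
};

// Returns true and fills the output parameters only when some symbol
// covers addr. Returns false and leaves the outputs untouched otherwise,
// e.g. when addr is a return PC that points at padding between functions.
bool lookup_symbol(const Sym* syms, std::size_t count, std::uintptr_t addr,
                   int* name_index, int* offset) {
  for (std::size_t i = 0; i < count; i++) {
    const Sym& s = syms[i];
    if (addr >= s.start && addr - s.start < s.size) {
      *name_index = s.name_index;
      *offset = static_cast<int>(addr - s.start);
      return true;
    }
  }
  return false;  // not found: the caller must not read the outputs
}
```

With this shape, a caller that checks the return value can skip the one
frame it cannot decode and keep decoding the rest of the stack.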

StefanK

>
> /Mikael
>
>>
>> StefanK
>>>
>>> /Mikael
>>>
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>> StefanK
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> StefanK
>>>>>>>>
>>>>>>
>>>>
>>>
>>


