8065585: Change ShouldNotReachHere() to never return

Mikael Gerdin mikael.gerdin at oracle.com
Fri Apr 17 14:55:22 UTC 2015


On 2015-04-17 14:52, Stefan Karlsson wrote:
>
>
> On 2015-04-17 13:49, Mikael Gerdin wrote:
>> On 2015-04-16 15:32, Stefan Karlsson wrote:
>>> On 2015-04-16 14:33, David Holmes wrote:
>>>> Hi Stefan,
>>>>
>>>> trimming ...
>>>>
>>>> On 16/04/2015 10:07 PM, Stefan Karlsson wrote:
>>>>> On 2015-04-16 04:23, David Holmes wrote:
>>>>>> Second, more important question: have you examined how this attribute
>>>>>> affects the ability to walk the stack? We have already seen issues on
>>>>>> some platforms where library functions, like abort(), have the
>>>>>> noreturn attribute and as a result the call is optimized in a way
>>>>>> that
>>>>>> prevents the stack from being walked - see eg:
>>>>>>
>>>>>> https://git.matricom.net/Firmware/bionic/commit/5f32207a3db0bea3ca1c7f4b2b563c11b895f276
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> though this:
>>>>>>
>>>>>> https://www.raspberrypi.org/forums/viewtopic.php?t=60540&p=451729
>>>>>>
>>>>>> suggests that problem may have been addressed by the libc folk.
>>>>>> But it
>>>>>> still raises the question as to how our own noreturn functions
>>>>>> will be
>>>>>> handled and how they will affect stacktrace generation in hs_err logs
>>>>>> or via gdb.
>>>>>
>>>>> I added a call to fatal(...) in the GC code. I get correct stacktraces
>>>>> in gdb, but the stacktraces in the hs_err files are broken with
>>>>> fastdebug and product builds:
>>>>
>>>> Which platforms?
>>>
>>> On Linux x86 and x86_64.
>>>
>>>>
>>>>> Stack: [0x00007f12518d2000,0x00007f12519d3000], sp=0x00007f12519d0eb0,
>>>>> free space=1019k
>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>> C=native code)
>>>>> V  [libjvm.so+0x11db44a]  VMError::report_and_die()+0x1ba
>>>>> V  [libjvm.so+0x7efb80]  report_vm_error(char const*, int, char
>>>>> const*,
>>>>> char const*)+0x90
>>>>> V  [libjvm.so+0x7efc49]  report_vm_error_noreturn(char const*, int,
>>>>> char
>>>>> const*, char const*)+0x9
>>>>> V  [libjvm.so+0x7efc63]
>>>>> V  [libjvm.so+0xfd7937]
>>>>> V  [libjvm.so+0xfeec51]
>>>>> ...
>>>>
>>>> So what is the plan: try to get hs_err working again? Or file this
>>>> under "well it seemed like a good idea"? ;-)
>>>
>>> I'm leaning towards "seemed like a good idea", unless someone has an
>>> easy fix for these problems.
>>
>> I've been looking a bit at this. It's not the stack trace per se that
>> is broken, but the decoding of the function names is not working for
>> some of the callers of the noreturn functions.
>>
>> I tried this with report_fatal using -XX:ErrorHandlerTest=5 and got
>> the following:
>>
>> 0x7fb71ccd98d0 <report_fatal>:    push   %rbp
>> 0x7fb71ccd98d1 <report_fatal+1>:    mov    %rdx,%rcx
>> 0x7fb71ccd98d4 <report_fatal+4>:    lea 0x9b4b34(%rip),%rdx
>> 0x7fb71ccd98db <report_fatal+11>:    mov    %rsp,%rbp
>> 0x7fb71ccd98de <report_fatal+14>:    callq  0x7fb71ccd98c0
>> 0x7fb71ccd98e3:    data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
>>
>> So the report_fatal frame has ...98e3 as its return address, but that
>> is actually outside the function and this causes dladdr() to return
>> NULL in dli_saddr and dli_sname.
>>
>> The JVM then attempts to decode using Decoder::decode but I wasn't
>> able to follow that code to understand why that fails.
>>
>> The same appears to happen for the caller of report_fatal
>> (controlled_crash in my case) but there I can't explain why dladdr
>> returns NULL values there.
>>
>> After these two functions the rest of the stack trace appears to be
>> correctly decoded.
>>
>> One approach could be to attempt to inject a "nop" at the end of
>> functions which call a "noreturn" function. This would hopefully make
>> the instruction after the call to the noreturn function part of the
>> caller and would make symbol decoding work.
>
> I found this mail thread:
> https://sourceware.org/bugzilla/show_bug.cgi?id=6522
>
> which blames the -fcrossjumping optimization.
>
> I recompiled hotspot with OPT_CFLAGS/debug.o=-fno-crossjumping, and now
> I get correct stack traces with fastdebug on Linux 64 bits.

I did a more thorough investigation into this on a slowdebug build. The
symbols go missing because the JVM's ELF Decoder runs into an
undecodable address: a return PC points to a nop in between two symbols
(because the frame has just called a noreturn function). After that
failure the Decoder sets m_status to FileInvalid and refuses to decode
any more symbols.
If I comment out the code that sets the fail status I get a fairly
normal hs_err stacktrace:

V  [libjvm.so+0xf184c8]  VMError::report(outputStream*)+0x133c
V  [libjvm.so+0xf19865]  VMError::report_and_die()+0x411
V  [libjvm.so+0x7876de]  report_vm_error(char const*, int, char const*, char const*)+0xba
V  [libjvm.so+0x7877d7]  report_vm_error_noreturn(char const*, int, char const*, char const*)+0x3d
V  [libjvm.so+0x78781b]  report_should_not_call(char const*, int)+0x0
V  [libjvm.so+0x92bfeb]
V  [libjvm.so+0x6e10ff]  GenCollectorPolicy::mem_allocate_work(unsigned long, bool, bool*)+0x283
V  [libjvm.so+0x92c049]  GenCollectedHeap::mem_allocate(unsigned long, bool*)+0x5d
V  [libjvm.so+0x45dbe5]  CollectedHeap::common_mem_allocate_noinit(KlassHandle, unsigned long, Thread*)+0x103
V  [libjvm.so+0x45dda2]  CollectedHeap::common_mem_allocate_init(KlassHandle, unsigned long, Thread*)+0x4e
V  [libjvm.so+0x45e034]  CollectedHeap::array_allocate(KlassHandle, int, int, Thread*)+0xac
V  [libjvm.so+0xed2f04]  TypeArrayKlass::allocate_common(int, bool, Thread*)+0xf0
V  [libjvm.so+0x44ae3e]  TypeArrayKlass::allocate(int, Thread*)+0x3e
V  [libjvm.so+0xcef2d5]  oopFactory::new_typeArray(BasicType, int, Thread*)+0x55
V  [libjvm.so+0x9c5aa9]  InterpreterRuntime::newarray(JavaThread*, BasicType, int)+0x147
j  alloc.AllocArrays.main([Ljava/lang/String;)V+237
v  ~StubRoutines::call_stub
V  [libjvm.so+0x9df121]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x6b1
V  [libjvm.so+0xd091d7]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x41
V  [libjvm.so+0x9dea5a]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x86
V  [libjvm.so+0xa42306]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x200
V  [libjvm.so+0xa5964a]  jni_CallStaticVoidMethod+0x353
C  [libjli.so+0x86ed]  JavaMain+0x93c
C  [libpthread.so.0+0x80a5]  start_thread+0xc5
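
The sticky failure status described before the trace can be illustrated
with a minimal C++ sketch. SketchDecoder, its exact-address symbol map
and the sticky_failure switch are hypothetical stand-ins and not
HotSpot's actual Decoder code; the sketch only shows how recording a
failure in decoder-wide state makes one bad address hide every later
symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a symbol decoder (not HotSpot's real class).
// For brevity, symbols are keyed by exact address.
class SketchDecoder {
public:
    SketchDecoder(std::map<std::uintptr_t, std::string> symbols,
                  bool sticky_failure)
        : symbols_(std::move(symbols)), sticky_failure_(sticky_failure) {}

    // Returns true and fills `name` if `addr` can be decoded.
    bool decode(std::uintptr_t addr, std::string& name) {
        if (invalid_) {
            return false;  // an earlier failure poisons all later lookups
        }
        auto it = symbols_.find(addr);
        if (it == symbols_.end()) {
            if (sticky_failure_) {
                invalid_ = true;  // models "set status to invalid and give up"
            }
            return false;
        }
        name = it->second;
        return true;
    }

private:
    std::map<std::uintptr_t, std::string> symbols_;
    bool sticky_failure_;
    bool invalid_ = false;
};

// Walks three made-up frames; the middle one is undecodable.
void check_sketch_decoder() {
    std::map<std::uintptr_t, std::string> syms = {{0x10, "inner"},
                                                  {0x30, "outer"}};
    std::string name;

    SketchDecoder sticky(syms, /*sticky_failure=*/true);
    assert(sticky.decode(0x10, name) && name == "inner");
    assert(!sticky.decode(0x20, name));  // address between symbols
    assert(!sticky.decode(0x30, name));  // valid symbol, but now refused

    SketchDecoder lenient(syms, /*sticky_failure=*/false);
    assert(!lenient.decode(0x20, name));
    assert(lenient.decode(0x30, name) && name == "outer");
}
```

With the per-lookup variant, only the frame whose return PC sits between
symbols stays undecoded, which matches the single bare
[libjvm.so+0x92bfeb] line in the trace above.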

One problem is the line
V  [libjvm.so+0x78781b]  report_should_not_call(char const*, int)+0x0
I actually added a call to fatal(), but since fatal calls a noreturn 
function the return pc of that frame accidentally points to the first 
instruction in the next function, which happens to be 
report_should_not_call.
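
A common way for symbolizers to avoid this kind of misattribution is to
look up return_pc - 1 rather than return_pc for caller frames, because
that byte lies inside the call instruction itself. This is not what the
JVM does in the trace above; the Symbol table, addresses and function
names below are made up for illustration, so treat it as a sketch of the
idea only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol-table entry: [start, start + size) belongs to `name`.
struct Symbol {
    std::uintptr_t start;
    std::uintptr_t size;
    std::string name;
};

// Returns the name of the symbol containing `addr`, or "" if `addr` falls
// in padding between symbols. A linear scan keeps the sketch short.
std::string find_symbol(const std::vector<Symbol>& table,
                        std::uintptr_t addr) {
    for (const Symbol& s : table) {
        if (addr >= s.start && addr - s.start < s.size) {
            return s.name;
        }
    }
    return "";
}

// Symbolizes a return address by looking up the byte just before it,
// which is the last byte of the call instruction in the caller.
std::string symbolize_return_address(const std::vector<Symbol>& table,
                                     std::uintptr_t return_pc) {
    return return_pc == 0 ? "" : find_symbol(table, return_pc - 1);
}

// Two made-up layouts: a caller directly followed by another function,
// and a caller followed by alignment padding.
void check_return_address_lookup() {
    std::vector<Symbol> table = {
        {0x1000, 0x13, "fatal_caller"},   // ends with a call to a noreturn function
        {0x1013, 0x20, "next_function"},  // starts right at the return address
        {0x2000, 0x13, "padded_caller"},  // followed by padding at 0x2013
    };

    // The raw return address is attributed to the wrong function (+0x0) ...
    assert(find_symbol(table, 0x1013) == "next_function");
    // ... or to no function at all when it lands in padding.
    assert(find_symbol(table, 0x2013) == "");

    // Looking up return_pc - 1 recovers the actual caller in both cases.
    assert(symbolize_return_address(table, 0x1013) == "fatal_caller");
    assert(symbolize_return_address(table, 0x2013) == "padded_caller");
}
```

The trade-off is that the reported offset then refers to a byte inside
the call instruction rather than to the return address itself.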

I wonder if this could be fixed by forcing GCC to emit a nop after the 
call to report_vm_error_noreturn in report_fatal and friends.
__asm__ __volatile__ ("nop" : : :);
appears to not be enough. GCC is very aggressive with noreturn, even 
with -O0.

/Mikael

>
> StefanK
>>
>> /Mikael
>>
>>>
>>> Thanks,
>>> StefanK
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> StefanK
>>>>>>>
>>>>>
>>>
>>
>

