Linux crash native stacks only have 1 line

Volker Simonis volker.simonis at gmail.com
Tue Apr 7 17:35:48 UTC 2020


On Tue, Apr 7, 2020 at 6:48 PM Alexander Miloslavskiy
<alexandr.miloslavskiy at gmail.com> wrote:
>
> > one of the problems may be that the native code you're crashed in was
> > compiled with "-fomit-frame-pointer" which means that the frame pointer
> > register can't be used for unwinding.
>
> I found that gcc `-O` already implies `-fomit-frame-pointer` [1].
> gcc `-O2` includes all optimizations from `-O`.
>
> I have tested that on a small program and confirmed that
> 1) `-O2` omits frame pointers,
> 2) `-O2 -fno-omit-frame-pointer` preserves frame pointers.
>
> Since most "production" code is compiled with -O2, this effectively
> means that if a native crash occurs outside the JRE, the crash log will fail
> to unwind the stack :(
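Some background on why the missing frame pointer matters. On x86-64 with frame pointers kept, each frame stores the caller's saved frame pointer at [fp] and the return address at [fp + 8], so walking the stack amounts to following a linked list. The sketch below simulates that walk on an in-memory array. `SimStack` and `walk_fp_chain` are hypothetical names, not HotSpot code. The walker bails out on any pointer that is misaligned, out of range, or not moving toward older frames. With `-fomit-frame-pointer`, the frame pointer register is used as an ordinary register, so a walk like this would typically stop after the first frame. That would be consistent with the one-line native stacks described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated stack memory: 8-byte words starting at address `base`.
// (Hypothetical helper; a real unwinder reads the thread's actual stack.)
struct SimStack {
  uint64_t base;
  std::vector<uint64_t> words;

  // True if `addr` is an aligned address inside the simulated stack.
  bool contains(uint64_t addr) const {
    return addr >= base && (addr - base) % 8 == 0 &&
           (addr - base) / 8 < words.size();
  }
  uint64_t load(uint64_t addr) const { return words[(addr - base) / 8]; }
};

// Follows the frame-pointer chain: on x86-64 with frame pointers kept,
// [fp] holds the caller's saved fp and [fp + 8] the return address.
// Returns the pcs found, starting with `pc`; stops at the first suspicious
// value instead of dereferencing it.
std::vector<uint64_t> walk_fp_chain(const SimStack& stack, uint64_t pc,
                                    uint64_t fp, std::size_t max_frames) {
  std::vector<uint64_t> pcs{pc};
  while (pcs.size() < max_frames) {
    if (!stack.contains(fp) || !stack.contains(fp + 8)) break;  // bad fp
    uint64_t saved_fp = stack.load(fp);
    uint64_t ret_pc = stack.load(fp + 8);
    if (ret_pc == 0) break;  // no caller recorded
    pcs.push_back(ret_pc);
    // The stack grows down, so older frames must live at higher addresses;
    // anything else (including a saved fp of 0) ends the walk.
    if (saved_fp <= fp) break;
    fp = saved_fp;
  }
  return pcs;
}
```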
>
> In my case, most crashes occur in GTK or GLib, both omit frame pointers.
>
> > If you already managed to attach gdb and get a correct back trace at the
> > point where your program crashes, you can just as well debug the HotSpot
> > stack tracing routine "print_native_stack()" in the file "debug.cpp" to
> > see what's the actual problem :)
>
> I didn't debug it because I understand that the JDK can't unwind the stack
> without frame pointers.
>
> > Before doing that, I'd check your
> > reproducer with jdk14 or better the tip revision just to make sure
> > you're not hunting a problem which has already been fixed upstream.
>
> I have tested with JDK14+36 and alas, no changes.
>
> > In general it's always advisable to mention the exact jdk version and
> > distribution you've used.
>
> Sorry, I didn't mention it because I observed this on all JDKs from JDK8 to
> JDK13 (now also confirmed on JDK14).
>
> I think I could try to implement better stack walking. Would there be any
> interest in accepting such patches, provided that they pass code review?
>

Sure, why not - you can definitely give it a try. One thing you
should be aware of is that this code will usually be called from the
signal handler, when the VM is already crashing. This means that the
VM may already be in an unstable state and we should be very careful
in order to not risk another error. You can start by looking at
ElfDecoder/ElfFile in decoder_elf.hpp/elfFile.hpp which already parses
an ELF file to get symbol information for a certain address. It has
the following comment :)

// ElfFile is basically an elf file parser, which can lookup the symbol
// that is the nearest to the given address.
// Beware, this code is called from vm error reporting code, when vm is already
// in "error" state, so there are scenarios, lookup will fail. We want this
// part of code to be very defensive, and bait out if anything went wrong.
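Below is a minimal sketch of the kind of defensive "nearest symbol" lookup that comment describes. It assumes the symbols have already been read from the ELF symbol table and sorted by address. `Sym` and `lookup_nearest` are hypothetical names, not the actual ElfFile API. The lookup itself does no allocation, since malloc is not async-signal-safe. It returns false rather than guessing when the address lies below the first symbol or past the end of the nearest one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record (not HotSpot's types):
// roughly what an ELF symbol's st_value / st_size / name provide.
struct Sym {
  uint64_t addr;     // start address of the symbol
  uint64_t size;     // size in bytes, 0 if unknown
  const char* name;
};

// Finds the symbol with the largest address <= pc in `syms` (which must be
// sorted by addr) and reports pc's offset into it. Returns false, i.e.
// bails out, whenever the answer would be a guess. Performs no allocation.
bool lookup_nearest(const std::vector<Sym>& syms, uint64_t pc,
                    const char** name, uint64_t* offset) {
  if (name == nullptr || offset == nullptr || syms.empty()) return false;
  // First symbol that starts strictly after pc.
  auto it = std::upper_bound(
      syms.begin(), syms.end(), pc,
      [](uint64_t value, const Sym& s) { return value < s.addr; });
  if (it == syms.begin()) return false;  // pc lies below every symbol
  --it;                                  // nearest symbol at or below pc
  if (it->name == nullptr) return false;
  if (it->size != 0 && pc - it->addr >= it->size) return false;  // past its end
  *name = it->name;
  *offset = pc - it->addr;
  return true;
}
```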

I think what you'll have to do is find out the frame size of a
native function and use it for unwinding when there is no frame pointer (FP).
But this can be tricky because glibc sometimes uses frameless functions. I
think if you look at gdb's unwinding code you will see that it is
quite complex.
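Here is a rough sketch of frame-size-based unwinding. It assumes a table of per-address-range rules of the form "CFA = sp + N" has already been decoded. On x86-64, such rules would normally come from the DWARF call frame information in `.eh_frame`, and that decoding is omitted here. The caller's return address is read from CFA - 8, and the caller's sp becomes the CFA. A function may need several rows because its frame size changes inside the prologue and epilogue, which is part of why the real problem is harder than it looks. `UnwindRow`, `Memory`, `find_row` and `walk_with_frame_sizes` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical unwind rule: for pc in [start, end), the caller's frame
// address (CFA) is sp + cfa_offset. A real implementation would derive
// such rows from the DWARF CFI in .eh_frame; that decoding is omitted.
struct UnwindRow {
  uint64_t start;
  uint64_t end;
  uint64_t cfa_offset;
};

// Simulated stack: 8-byte words starting at address `base`.
struct Memory {
  uint64_t base;
  std::vector<uint64_t> words;

  // Reads one word, refusing unaligned or out-of-range addresses.
  bool load(uint64_t addr, uint64_t* out) const {
    if (addr < base || (addr - base) % 8 != 0 ||
        (addr - base) / 8 >= words.size())
      return false;
    *out = words[(addr - base) / 8];
    return true;
  }
};

const UnwindRow* find_row(const std::vector<UnwindRow>& rows, uint64_t pc) {
  for (const UnwindRow& r : rows)
    if (pc >= r.start && pc < r.end) return &r;
  return nullptr;
}

// Unwinds without a frame pointer (x86-64 style): the return address is
// stored at CFA - 8, and the caller's sp equals the CFA.
std::vector<uint64_t> walk_with_frame_sizes(const std::vector<UnwindRow>& rows,
                                            const Memory& mem, uint64_t pc,
                                            uint64_t sp, std::size_t max_frames) {
  std::vector<uint64_t> pcs{pc};
  while (pcs.size() < max_frames) {
    // For caller frames pc is a return address, which may point just past
    // the call (even past the function's end), so look up pc - 1 instead.
    uint64_t lookup_pc = (pcs.size() == 1) ? pc : pc - 1;
    const UnwindRow* row = find_row(rows, lookup_pc);
    if (row == nullptr) break;  // no unwind info for this code: bail out
    uint64_t cfa = sp + row->cfa_offset;
    uint64_t ret_pc = 0;
    if (!mem.load(cfa - 8, &ret_pc) || ret_pc == 0) break;
    pcs.push_back(ret_pc);
    pc = ret_pc;
    sp = cfa;
  }
  return pcs;
}
```

Code without any unwind information, such as some hand-written assembly, has no matching row at all. A walker like this can only stop there, and a real unwinder would need further heuristics at that point.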

Please also take into account that using a third party library like
libunwind is not an option for OpenJDK because of licensing issues.

Good luck and best regards,
Volker


> [1]
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

