Hotspot crash handler re-enters malloc

Mon Sep 30 13:21:47 UTC 2019

* David Holmes:

> We all know that there are many many things that you cannot safely do
> from a signal handling context, but if we stuck to what was safe it
> would not be possible to produce anything like a meaningful error
> report (though I've voiced concern that more and more and more keeps
> getting added to the error reporting!). But the error report is
> primarily intended to help diagnose problems within the JVM, or the
> JDK libraries, we're not expecting to hit faults in libc, or other
> system libraries - and so reentrant mallocs is a very rare occurrence.

It's not rare if something causes heap corruption, unfortunately.  On
the other hand, we have added more and more non-optional heap
consistency checks to glibc over time.  Those raise SIGABRT on failure
instead of crashing with SIGSEGV, and it appears that Hotspot doesn't
trigger the crash dump for that (which is a bit odd, but it does avoid
most of the malloc reentrancy issues).

>> The question is whether a verbose crash report is actually needed if the
>> PC value (as recorded in the siginfo* argument) is within libc.so.6.
>> Maybe in this case, a more limited crash dump is appropriate.
>
> I think though, that by the time we have enough information to figure
> that out it may already be too late - in this case the malloc is used
> by the Decoder which is being used to determine where the crash is
> IIUC. Though perhaps we need to look at the Decoder and see if we can
> use non-malloc'd memory.

If you read the PC from the signal handler's arguments (on Linux it is
in the ucontext_t that is passed along with the siginfo_t *), you don't
need the decoder.  Obviously, you need to pre-record the libc.so.6
boundaries somewhere so that you do not have to call into the dynamic
loader at the time of the crash.
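
To make that concrete, here is a rough sketch of the idea, not a
proposal for the actual Hotspot code.  All names (record_libc_bounds,
pc_in_libc, crash_handler, the two globals) are made up for
illustration.  It assumes Linux with glibc, finds libc by a path
heuristic in /proc/self/maps (dl_iterate_phdr at startup would be
another option), and only knows how to extract the PC on x86-64 and
AArch64.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes REG_RIP on x86-64 glibc */
#endif
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Hypothetical globals: bounds of libc's executable mappings.
   The defaults describe an empty range, so nothing matches. */
uintptr_t libc_text_start = UINTPTR_MAX;
uintptr_t libc_text_end = 0;

/* Call once at startup, long before any crash.  stdio is not
   async-signal-safe, so none of this may run in the handler. */
void record_libc_bounds(void)
{
    char line[512];
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char perms[5], path[256];
        /* Fields: start-end perms offset dev inode pathname */
        if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %255s",
                   &start, &end, perms, path) != 4)
            continue; /* anonymous mapping or unparsable line */
        if (perms[2] != 'x')
            continue; /* only executable mappings matter for a PC */
        /* Heuristic (assumption): libc.so.6, or libc-2.NN.so on
           older glibc where the symlink target shows up instead. */
        if (strstr(path, "/libc.so.") == NULL &&
            strstr(path, "/libc-") == NULL)
            continue;
        if (start < libc_text_start)
            libc_text_start = start;
        if (end > libc_text_end)
            libc_text_end = end;
    }
    fclose(maps);
}

/* Two comparisons against pre-recorded values: safe in a handler. */
int pc_in_libc(uintptr_t pc)
{
    return pc >= libc_text_start && pc < libc_text_end;
}

/* Architecture-specific; returns 0 where this sketch has no support. */
uintptr_t pc_from_ucontext(const void *ctx)
{
    const ucontext_t *uc = ctx;
#if defined(__x86_64__) && defined(REG_RIP)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

/* Install with sigaction() and SA_SIGINFO. */
void crash_handler(int sig, siginfo_t *info, void *ctx)
{
    static const char limited[] = "fault inside libc: limited report\n";
    static const char full[] = "fault outside libc: full report\n";
    (void)info;
    if (pc_in_libc(pc_from_ucontext(ctx)))
        (void)!write(STDERR_FILENO, limited, sizeof limited - 1);
    else
        (void)!write(STDERR_FILENO, full, sizeof full - 1);
    /* Restore the default action and re-raise so the process still
       terminates with the original signal. */
    signal(sig, SIG_DFL);
    raise(sig);
}
```

The intended use would be to call record_libc_bounds() once during
startup and register crash_handler for SIGSEGV via sigaction() with
SA_SIGINFO.  The handler itself then only does two comparisons and
write(), so it does not depend on malloc or the dynamic loader.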

But maybe that's not really helpful because too many SIGSEGV crashes
happen in libc.so.6 and the full dumps add value for diagnosing them.

Thanks,
Florian