
Mon Sep 30 13:12:41 UTC 2019

On 30/09/2019 10:55 pm, Florian Weimer wrote:
> * Donald Kwakkel:
> 
>> Is this a known issue / are there workarounds?
> 
> Yes, sort of, we've seen it many times with in-process crash handlers.

We all know that there are many things that you cannot safely do 
from a signal-handling context, but if we stuck to what was safe it 
would not be possible to produce anything like a meaningful error report 
(though I've voiced concern that more and more keeps getting added to 
the error reporting!). But the error report is primarily intended to 
help diagnose problems within the JVM or the JDK libraries; we're not 
expecting to hit faults in libc or other system libraries, and so a 
reentrant malloc is a very rare occurrence.

>> If during malloc a crash signal occurs, the jvm will hang because it will
>> call again malloc, which is not reentrant (for explanation see
>> https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock).
>>
>>
>> This is for production environments a big problem because self-healing (in
>> this case an automatic restart through jsvc) will not work. Resulting in
>> manual actions by cloud administrators!
> 
> The recommended practice is to use a crash handler external to the
> process.  Of course, this is not going to fix the underlying issue which
> causes the crash.
> 
>> #11 0x00007f123a9fbbae in VMError::report_and_die(Thread*, unsigned
>> int, unsigned char*, void*, void*) () from
>> /usr/java/latest/lib/server/libjvm.so
>> #12 0x00007f123a7dea10 in JVM_handle_linux_signal () from
>> /usr/java/latest/lib/server/libjvm.so
>> #13 0x00007f123a7d2c08 in signalHandler(int, siginfo*, void*) () from
>> /usr/java/latest/lib/server/libjvm.so
>> #14 <signal handler called>
>> #15 0x0000003ce2c7856c in _int_malloc () from /lib64/libc.so.6
> 
> The question is whether a verbose crash report is actually needed if the
> PC value (as recorded in the siginfo* argument) is within libc.so.6.
> Maybe in this case, a more limited crash dump is appropriate.

I think, though, that by the time we have enough information to figure 
that out it may already be too late: in this case the malloc is called 
by the Decoder, which is being used to determine where the crash is, 
IIUC. Perhaps we need to look at the Decoder and see if it can use 
non-malloc'd memory.
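
To illustrate what "non-malloc'd memory" could look like, here is a 
minimal sketch in C of a bump allocator over a static buffer, which a 
crash handler could use in place of malloc. Every name here 
(crash_arena, crash_alloc, crash_write, CRASH_ARENA_SIZE) is 
hypothetical; this is not HotSpot or Decoder code. It assumes only one 
thread is reporting the crash at a time and that a fixed 64 KiB is 
enough for the report.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical size: assumed large enough for one crash report. */
#define CRASH_ARENA_SIZE (64 * 1024)

/* Static storage reserved at load time, so nothing is requested from
 * the heap while the signal handler runs. */
static _Alignas(16) unsigned char crash_arena[CRASH_ARENA_SIZE];
static size_t crash_arena_used = 0;

/* Bump allocator: takes no locks and never calls malloc, so it cannot
 * block on a heap lock held by the code that was interrupted.
 * Returns NULL when the arena is exhausted; memory is never freed.
 * Assumption: only one thread runs the crash handler at a time. */
static void *crash_alloc(size_t size)
{
    /* Round up to a multiple of 16 to keep allocations aligned. */
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > CRASH_ARENA_SIZE - crash_arena_used)
        return NULL;
    void *p = crash_arena + crash_arena_used;
    crash_arena_used += aligned;
    return p;
}

/* Emit a NUL-terminated message using only write(2), which POSIX
 * lists as async-signal-safe (unlike printf and friends). */
static void crash_write(int fd, const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0')
        len++;
    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n <= 0)
            return; /* give up on error; we are already crashing */
        msg += n;
        len -= (size_t)n;
    }
}
```

A handler would call crash_alloc() wherever it currently needs scratch 
space and fall back to a shorter report if it returns NULL. The cost is 
a fixed chunk of memory that sits unused in the normal case.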

Cheers,
David
-----

> Thanks,
> Florian
>