RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)
Thomas Stuefe
stuefe at openjdk.java.net
Thu Oct 22 20:33:19 UTC 2020
On Thu, 22 Oct 2020 20:12:55 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> hi all,
>>
>> Please review this fix for POSIX platforms, which addresses a time out that occurs while handling a crash with UseOSErrorReporting ON
>>
>> The timeout was caused by the crash handling code, looping infinitively, because it incorrectly assumed that the signal handlers were reset to their defaults, while what really was happening was that the code was resetting the signal handlers to our default signal handler.
>>
>> To avoid similar confusion in the future I did the following:
>>
>> - renamed the VMError::reset_signal_handlers() to VMError:: rearm_signal_handlers()
>> - introduced a new API VMError::clear_signal_handlers() which is implemented in PosixSignals
>>
>> PosixSignals::clear_signal_handlers() is where the actual fix is done and it simply resets all signal handlers to their system defaults.
>>
>> A similar problem occurs on Windows, with the only difference being that before a process times out (takes 2 minutes) it runs out of stack space in about 250 loops, so that's the only reason it doesn't linger for that long. Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782
>
> src/hotspot/os/posix/signals_posix.cpp line 1425:
>
>> 1423: for (int i = 0; i < NSIG + 1; i++) {
>> 1424: sigaction(i, &defaulthandler, NULL);
>> 1425: }
>
> Some issues:
> - 0 is not a valid signal number
> - NSIG is obscure; may be ridiculously large (e.g. AIX I believe 1024) and still may not contain all signals for some platforms (e.g. real time signals on BSD (?) are not included)
> - You now flatten here all signal handlers across the board. What if some third party app had installed their own. I would only reset those which had been originally installed for hotspot usage (beware ReduceSignalUsage).
>
> Which brings me to the point that if we fail to re-raise the original error signal, now we run with signal handlers completely disabled :( ... see my original concerns.
Thinking further, we probably want to just the default handler for the error signal that happened. And we also want to do this only for synchronous error signals (segv, sigill, sigbus, sigfpe), and possibly only if they are "real" (had not been sent by kill or pthread_kill).
-------------
PR: https://git.openjdk.java.net/jdk/pull/813
More information about the hotspot-dev
mailing list