RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)
Gerard Ziemski
gziemski at openjdk.java.net
Fri Oct 23 16:19:40 UTC 2020
On Thu, 22 Oct 2020 20:30:54 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> src/hotspot/os/posix/signals_posix.cpp line 1425:
>>
>>> 1423: for (int i = 0; i < NSIG + 1; i++) {
>>> 1424: sigaction(i, &defaulthandler, NULL);
>>> 1425: }
>>
>> Some issues:
>> - 0 is not a valid signal number
>> - NSIG is obscure; may be ridiculously large (e.g. AIX I believe 1024) and still may not contain all signals for some platforms (e.g. real time signals on BSD (?) are not included)
>> - You now flatten here all signal handlers across the board. What if some third party app had installed their own. I would only reset those which had been originally installed for hotspot usage (beware ReduceSignalUsage).
>>
>> Which brings me to the point that if we fail to re-raise the original error signal, now we run with signal handlers completely disabled :( ... see my original concerns.
>
> Thinking further, we probably want to just the default handler for the error signal that happened. And we also want to do this only for synchronous error signals (segv, sigill, sigbus, sigfpe), and possibly only if they are "real" (had not been sent by kill or pthread_kill).
> Some issues:
>
> * 0 is not a valid signal number
Right, I used to start at 1, but then copied the "for loop" code from another part of existing code for "NSIG" and forgot to fix the start value" I will fix that.
> * NSIG is obscure; may be ridiculously large (e.g. AIX I believe 1024) and still may not contain all signals for some platforms (e.g. real time signals on BSD (?) are not included)
1. The fix will work for any NSIG value, why would its value be relevant here? That's the value we use anywhere else, so if it's wrong here, then it's wring everywhere else and we have a bigger issue to deal here than just this fix.
2. We are not accounting for real time signals on BSD anywhere else, so why do we need to do it here?
> * You now flatten here all signal handlers across the board. What if some third party app had installed their own. I would only reset those which had been originally installed for hotspot usage (beware ReduceSignalUsage).
The purpose of the "UseOSErrorReporting" flag is to let the process die, so that it can be handled by the OS. If we let third party app signal handlers jump in, we can't guarantee that "UseOSErrorReporting" does its job.
> Which brings me to the point that if we fail to re-raise the original error signal, now we run with signal handlers completely disabled :( ... see my original concerns.
I will loosely quote what David Holmes likes to say about signal handlers and hope I get it right: "this is a best effort, nothing is guaranteed in signal handlers". By that I mean yes, you are right that we can't guarantee correct behavior, but at this point of process lifecycle we caught the crash and produced hs_err log, anything else is just great if it works and no big loss if things go wrong now (how bad exactly can things go at this point?). From my experience, however, it works.
-------------
PR: https://git.openjdk.java.net/jdk/pull/813
More information about the hotspot-dev
mailing list