RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)

Gerard Ziemski gziemski at openjdk.java.net
Fri Oct 23 16:23:38 UTC 2020


On Fri, 23 Oct 2020 16:17:21 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

>> Thinking further, we probably want to just the default handler for the error signal that happened. And we also want to do this only for synchronous error signals (segv, sigill, sigbus, sigfpe), and possibly only if they are "real" (had not been sent by kill or pthread_kill).
>
>> Some issues:
>> 
>> * 0 is not a valid signal number
> 
> Right, I used to start at 1, but then copied the "for loop" code from another part of existing code for "NSIG" and forgot to fix the start value" I will fix that.
> 
>> * NSIG is obscure; may be ridiculously large (e.g. AIX I believe 1024) and still may not contain all signals for some platforms (e.g. real time signals on BSD (?) are not included)
> 
> 1. The fix will work for any NSIG value, why would its value be relevant here? That's the value we use anywhere else, so if it's wrong here, then it's wrong everywhere else and we have a bigger issue to deal here than just this fix.
> 2. We are not accounting for real time signals on BSD anywhere else, so why do we need to do it here?
> 
>> * You now flatten here all signal handlers across the board. What if some third party app had installed their own. I would only reset those which had been originally installed for hotspot usage (beware ReduceSignalUsage).
> 
> The purpose of the "UseOSErrorReporting" flag is to let the process die, so that it can be handled by the OS. If we let third party app signal handlers jump in, we can't guarantee that "UseOSErrorReporting" does its job.
> 
>> Which brings me to the point that if we fail to re-raise the original error signal, now we run with signal handlers completely disabled :( ... see my original concerns.
> 
> I will loosely quote what David Holmes likes to say about signal handlers and hope I get it right: "this is a best effort, nothing is guaranteed in signal handlers". By that I mean yes, you are right that we can't guarantee correct behavior, but at this point of process lifecycle we caught the crash and produced hs_err log, anything else is just great if it works and no big loss if things go wrong now (how bad exactly can things go at this point?). From my experience, however, it works.

> Thinking further, we probably want to just the default handler for the error signal that happened. And we also want to do this only for synchronous error signals (segv, sigill, sigbus, sigfpe), and possibly only if they are "real" (had not been sent by kill or pthread_kill).

Handling "UseOSErrorReporting" is the last thing we ask the process to do before letting it crash and get caught by the OS. I'm not sure we need to finesse here with which signals we want to flatten. This keeps the code simple, though I will add a comment to "PosixSignals::clear_signal_handlers()" describing the assumptions and expectations.

-------------

PR: https://git.openjdk.java.net/jdk/pull/813


More information about the hotspot-dev mailing list