RFR: JDK-8252533: Signal handlers should run with synchronous error signals unblocked

Wed Oct 28 06:39:19 UTC 2020

On Tue, 27 Oct 2020 20:39:17 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

> hi Thomas,
> 
> Looks good as far as I can tell.
> 
Thank you!

> I'm still learning the POSIX signal code, so excuse the question, but if our signal handlers are capable of handling the synchronous signals during fatal error handling, then why don't we unblock all signals, so that we can handle any kind of situation inside fatal error handling?

It's unnecessary, and keeping them blocked is safer.

When signals which can be deferred (let's say, SIGPIPE or SIGCHLD) arrive while they are blocked in the receiving thread's signal mask, nothing really happens: the kernel simply keeps them pending until the receiving thread unblocks them, e.g. by returning from signal handling.
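
A minimal sketch of that deferral, assuming POSIX C in a single-threaded process (so sigprocmask() is appropriate; a multi-threaded VM would use the per-thread pthread_sigmask()). The names demo_deferred_sigpipe and on_sigpipe are hypothetical. A blocked SIGPIPE is only marked pending, and its handler runs once the signal is unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t so writing it there is safe. */
static volatile sig_atomic_t sigpipe_delivered = 0;

static void on_sigpipe(int sig) {
    (void)sig;
    sigpipe_delivered = 1;
}

/* Hypothetical helper: blocks SIGPIPE, raises it, then unblocks it.
 * Reports whether the signal was pending while blocked, whether the
 * handler ran while blocked, and whether it ran after unblocking.
 * Returns 0 on success, -1 if a signal call fails. */
int demo_deferred_sigpipe(int *pending_while_blocked,
                          int *handled_while_blocked,
                          int *handled_after_unblock) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, NULL) != 0) return -1;

    sigset_t pipe_only, pending;
    sigemptyset(&pipe_only);
    sigaddset(&pipe_only, SIGPIPE);
    if (sigprocmask(SIG_BLOCK, &pipe_only, NULL) != 0) return -1;

    sigpipe_delivered = 0;
    raise(SIGPIPE);                      /* blocked: kept pending, no handler */

    if (sigpending(&pending) != 0) return -1;
    *pending_while_blocked = sigismember(&pending, SIGPIPE);
    *handled_while_blocked = sigpipe_delivered;

    /* POSIX: at least one pending signal that is now unblocked is
     * delivered before sigprocmask() returns. */
    if (sigprocmask(SIG_UNBLOCK, &pipe_only, NULL) != 0) return -1;
    *handled_after_unblock = sigpipe_delivered;
    return 0;
}
```

Calling demo_deferred_sigpipe() should report the signal as pending (1), not handled while blocked (0), and handled after unblocking (1).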

If such signals were unblocked instead and happened to be delivered to the thread during fatal error handling (that is, while the reporting thread is somewhere inside VMError::report(), working on one of the error reporting steps), that reporting step would be interrupted and we would execute, in a new call frame:

crash_handler(SIGPIPE) ->
VMError::report_and_die(SIGPIPE) -> _first_error_tid == our tid ->
we print "[Error occurred during error reporting" into the hs-err file ->
we re-enter VMError::report().

Now the call stack looks like this:

<OS signal handler 1>
javaSignalHandler & JVM_handle_xx_signal
VMError::report_and_die
VMError::report
<OS signal handler 2>
crash_handler
VMError::report_and_die
VMError::report

When re-entering VMError::report(), we skip the error reporting step during which we got the second signal, because we assume it was an error signal. We just continue with the next step.

The whole thing is quite ingenious in that it allows us to do all kinds of unsafe things for the sake of error reporting. If one step fails, it does not endanger the next step. Also, we never unwind the stack, so when we dump core after error handling, we have the full stack (plus zero or more incarnations of the frames above). Unwinding could also be dangerous, since the stack may be corrupted.

The only caveats are that we artificially limit the number of recursive error signals to 30 (not sure why), and that we may of course run out of stack after too many recursive errors.
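
To make the step-skipping idea concrete, here is a heavily simplified sketch in POSIX C. It is not HotSpot's code: report(), crash_handler(), run_crash_report(), the step functions and the output strings are all hypothetical. raise(SIGSEGV) simulates a fault, and everything runs in a forked child so the caller can read the "hs-err" text through a pipe. It shows the properties described above. The step index is advanced before a step runs, so a re-entry skips the faulting step. The nested handler terminates with _exit() instead of unwinding. A depth counter caps recursion at 30.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_STEPS 4
#define MAX_RECURSION 30                      /* limit mentioned in the review */

static int report_fd = -1;                    /* stands in for the hs-err file */
static volatile sig_atomic_t next_step = 0;   /* next reporting step to run */
static volatile sig_atomic_t error_depth = 0; /* nested handler entries */

static void emit(const char *s) {
    ssize_t r = write(report_fd, s, strlen(s));
    (void)r;                                  /* best effort output */
}

static void step_header(void)    { emit("step 0: header\n"); }
static void step_faulty(void) {
    emit("step 1: starting\n");
    raise(SIGSEGV);                 /* simulated fault inside a reporting step */
    emit("step 1: done\n");         /* never printed: the nested handler exits */
}
static void step_registers(void) { emit("step 2: registers\n"); }
static void step_stack(void)     { emit("step 3: stack\n"); }

static void (*const steps[NUM_STEPS])(void) = {
    step_header, step_faulty, step_registers, step_stack
};

/* Advances the index before running a step, so a re-entry caused by a
 * fault in step s resumes at step s + 1. */
static void report(void) {
    while (next_step < NUM_STEPS) {
        int s = next_step;
        next_step = s + 1;
        steps[s]();
    }
}

static void crash_handler(int sig) {
    (void)sig;
    /* sa_mask blocked everything on entry; re-allow only the synchronous
     * error signals so a fault during reporting re-enters this handler. */
    sigset_t err;
    sigemptyset(&err);
    sigaddset(&err, SIGSEGV);
    sigaddset(&err, SIGBUS);
    sigaddset(&err, SIGFPE);
    sigaddset(&err, SIGILL);
    sigprocmask(SIG_UNBLOCK, &err, NULL);

    if (++error_depth > MAX_RECURSION) { emit("too many errors\n"); _exit(2); }
    if (error_depth > 1) emit("[error during error reporting, skipping step]\n");
    report();
    emit("report finished\n");
    /* Terminate without unwinding into the faulting frames; a real VM would
     * typically abort() here so a core dump keeps the whole stack. */
    _exit(1);
}

/* Runs the simulated crash in a child, copies its report into buf and
 * returns the child's exit status (-1 on failure). */
int run_crash_report(char *buf, size_t len) {
    int fds[2];
    if (len == 0 || pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = crash_handler;
        sigfillset(&sa.sa_mask);      /* handler starts with everything blocked */
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV);               /* the first, "real" error */
        _exit(3);                     /* not reached */
    }
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used + 1 < len && (n = read(fds[0], buf + used, len - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

run_crash_report() should return 1 and capture steps 0, 2 and 3 plus the error marker, but never "step 1: done", since the faulting step is abandoned rather than unwound.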

Bottom line: that whole mechanism is written with synchronous error signals in mind, and we would have to rethink it if we allowed all signals (e.g. there would be no reason to skip an error reporting step just because we got a SIGPIPE). Since blocking works fine for those signals, there is no need to do this.
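
The split between the two kinds of signals can be sketched as follows, assuming POSIX C in a single-threaded process. A handler installed with a full sa_mask unblocks only the synchronous error signals, so a SIGPIPE raised inside it stays pending until the handler returns. The error-signal set (SIGSEGV, SIGBUS, SIGFPE, SIGILL) is an assumption and may differ from the patch; SIGUSR1 stands in for the original crash signal, and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t pipe_seen = 0;
static volatile sig_atomic_t segv_blocked_inside = -1;
static volatile sig_atomic_t pipe_blocked_inside = -1;
static volatile sig_atomic_t pipe_seen_inside = -1;

static void note_sigpipe(int sig) { (void)sig; pipe_seen = 1; }

/* Unblocks only the (assumed) synchronous error signals. */
static void unblock_error_signals(void) {
    sigset_t err;
    sigemptyset(&err);
    sigaddset(&err, SIGSEGV);
    sigaddset(&err, SIGBUS);
    sigaddset(&err, SIGFPE);
    sigaddset(&err, SIGILL);
    sigprocmask(SIG_UNBLOCK, &err, NULL);
}

/* Stands in for the crash handler; installed with a full sa_mask. */
static void outer_handler(int sig) {
    (void)sig;
    unblock_error_signals();

    sigset_t now;
    sigprocmask(SIG_BLOCK, NULL, &now);   /* set == NULL: only query the mask */
    segv_blocked_inside = sigismember(&now, SIGSEGV);
    pipe_blocked_inside = sigismember(&now, SIGPIPE);

    raise(SIGPIPE);                        /* deferrable: stays pending here */
    pipe_seen_inside = pipe_seen;
    /* On return the pre-handler mask is restored and SIGPIPE is delivered. */
}

int demo_selective_unblock(int *segv_blocked, int *pipe_blocked,
                           int *pipe_in_handler, int *pipe_after) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, NULL) != 0) return -1;

    sa.sa_handler = outer_handler;
    sigfillset(&sa.sa_mask);               /* block everything during the handler */
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    pipe_seen = 0;
    raise(SIGUSR1);                         /* handler has run when raise returns */
    *segv_blocked = segv_blocked_inside;
    *pipe_blocked = pipe_blocked_inside;
    *pipe_in_handler = pipe_seen_inside;
    *pipe_after = pipe_seen;
    return 0;
}
```

Inside the handler SIGSEGV should be unblocked and SIGPIPE still blocked; the SIGPIPE handler runs only after the outer handler returns, which is exactly the "blocking works fine" case.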

> 
> Also, since we're touching "VMError::reset_signal_handlers()", any chance we can improve on the name here, as you yourself suggested in the other review?

Sure, but I'd like to do this in a separate patch. I was actually counting on you for that other patch :) but I do have some vague cleanup in mind at some point, unless you get there first.

Cheers, Thomas

> 
> cheers

-------------

PR: https://git.openjdk.java.net/jdk/pull/839