RFR: JDK-8252533: Signal handlers should run with synchronous error signals unblocked [v2]

Thomas Stuefe stuefe at openjdk.java.net
Thu Oct 29 08:55:57 UTC 2020


> Hi,
> 
> may I have reviews please for the following patch which takes care to unblock synchronous error signals at the entrance of our main signal handler (including some minor code cleanups).
> 
> When a signal happens which cannot be deferred (SIGFPE, SIGILL, SIGSEGV, SIGBUS) but whose delivery is blocked, bad things happen. This is undefined territory, and we have observed the following cases:
> 
> - on Linux, the default handler is invoked instead of the user handler, which in case of error signals causes the process to core immediately. This is actually one of the better outcomes, we still have a core.
> - on AIX and PASE both, the process just becomes unresponsive and hangs.
> - on HPUX - one of our internal platform - the process just vanishes without a trace.
> 
> I did not test other platforms but would guess similar things happen there. Posix documentation [4] states:
> _"If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated while they are blocked, the result is undefined, unless the signal was generated by the kill() function, the sigqueue() function, or the raise() function."_
> 
> At the moment, undeferrable error signals are unblocked outside the signal handler (see hotspot sigmask) and, since JDK-8065895, inside the error handler (see crash_handler setup). This leaves us with a window where the hotspot signal handler runs but before he has decided to invoke fatal error handling. Inside that window, for any platform but AIX error signals are still blocked. So any crash inside them tears down the VM immediately without giving us a useful hs-err file.
> 
> On AIX they are not blocked because we added an AIX-only patch a while ago which unblocks them at the entrance of the AIX signal handler. This was before we contributed the port to OpenJDK, so no history in the official repos. But that behavior makes sense for all posix platforms.
> 
> For more details see discussion from Nov 2014 [2][3].
> 
> (Side note, these effects only show for truly synchronous error signals. You cannot artificially create such a scenario e.g. by raising SIGSEGV with kill.)
> 
> About sa_sigaction.sa_mask vs pthread_sigmask: At the first glance, it would be more elegant to just unblock error signals in the signal mask specified when installing the signal handler. However, the way that mask works is that it is ANDed together with the process signal mask in effect when the signal was raised. So if one or more of the error signals were blocked at that point, they would continue to be blocked in the handler, sa_mask notwithstanding.
> It seems easier and clearer to just manually unblock those signals at the entrance of our handlers.
> 
> Changes in detail:
> 
> - At the entrance of javaSignalHandler() we now unblock error signals for all platforms, not just for AIX.
> - Nothing changes at the entrance of the secondary error handler (crash_handler()): just better code reuse.
> - In os::signal(), for AIX only, we used to set the sa_mask for the handler-to-be-installed to unblock error signals. I decided to just remove this section. The reason is: I want to get rid of this AIX specific section, and was torn between leaving it in for all platforms or removing it. os::signal is used to install other signal handlers too, and I do not want to change their behavior with this patch. Therefore I just removed this section.
> 
> Tests:
> 
> I tested the change manually by triggering a segfault inside javaSignalHandler. That reliably tore down the process immediately. With this patch, once error signals are unblocked, we continue to run - we get the secondary crash, which then leads to VMError::report_and_die() called since its a real error (I prevented recursion for this test. See my comments about recursion in the JBS issue.)
> 
> We will put this through our SAP internal tests before pushing.
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8065895
> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2014-November/013346.html
> [3] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2015-January/013718.html
> [4] https://pubs.opengroup.org/onlinepubs/009695399/functions/sigprocmask.html

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Feedback David

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/839/files
  - new: https://git.openjdk.java.net/jdk/pull/839/files/eabe3e38..8d86e25a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=839&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=839&range=00-01

  Stats: 74 lines in 2 files changed: 54 ins; 18 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/839.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/839/head:pull/839

PR: https://git.openjdk.java.net/jdk/pull/839


More information about the hotspot-runtime-dev mailing list