RFR: JDK-8255711: Fix and unify hotspot signal handlers [v4]
Gerard Ziemski
gziemski at openjdk.java.net
Fri Nov 6 15:23:01 UTC 2020
On Fri, 6 Nov 2020 12:56:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Hi all,
>>
>> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers.
>>
>> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues.
>>
>> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code.
>>
>> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324.
>>
>> ----
>>
>> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit.
>>
>> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history.
>>
>> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>>
>> ---
>>
>> The fixed issues:
>>
>> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>>
>> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
>> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>>
>> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>>
>> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>>
>> 4) Every platform handler has this section:
>>
>> JavaThread* thread = NULL;
>> VMThread* vmthread = NULL;
>> if (PosixSignals::are_signal_handlers_installed()) {
>> if (t != NULL ){
>> if(t->is_Java_thread()) {
>> thread = t->as_Java_thread();
>> }
>> else if(t->is_VM_thread()){
>> vmthread = (VMThread *)t;
>> }
>> }
>> }
>>
>> `vmthread` is unused on all platforms and can be removed.
>>
>> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>>
>> if (sig == SIGPIPE || sig == SIGXFSZ) {
>> // allow chained handler to go first
>> if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>> return true;
>> } else {
>> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>> return true;
>> }
>> }
>>
>> - On s390 and ppc, we miss SIGXFSZ handling
>> _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_.
>> - both paths return true - section can be shortened
>>
>> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>>
>> 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal:
>>
>>> // unmask current signal
>>> sigset_t newset;
>>> sigemptyset(&newset);
>>> sigaddset(&newset, sig);
>>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>>
>>
>> - Use of `sigprocmask()` is UB in a multithreaded program.
>> - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler.
>>
>> 7) the JFR crash protection is not consistently checked in all platform handlers.
>>
>> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>>
>> 9) on Linux ppc64 and AIX, we have this section:
>>
>>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>> goto report_and_die;
>>> }
>>
>> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>>
>> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>>
>> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>>
>> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>>
>> ----
>>
>> The changes in this patch:
>>
>> a) hotspot signal handling is now done by the following functions:
>>
>> <OS> <foreign code, probably signal handler too>
>> | |
>> v v
>> javaSignalHandler JVM_handle_linux_signal()
>> | /
>> v v
>> javaSignalHandler_inner
>> |
>> v
>> PosixSignals::pd_hotspot_signal_handler()
>>
>>
>> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>>
>> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
>> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>>
>>
>> b) I commonized prologue- and epilogue coding.
>> - I simplified (4) to a single line in the shared handler
>> - I moved the JFR thread crash protection (7) up to the shared handler
>> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
>> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
>> - I simplified (5) and commonized it, and removed (9) completely
>> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>>
>> Thanks for reviewing.
>>
>> Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times.
>>
>> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>>
>> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too.
>>
>> ----
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
>> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
>> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
>> [5] https://bugs.openjdk.java.net/browse/JDK-8253742
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Review feedback
> - Merge
> - Feedback David
> - Remove unnecessary call-once guard from PosixSignals::install_signal_handlers()
> - Initial patch
Marked as reviewed by gziemski (Committer).
-------------
PR: https://git.openjdk.java.net/jdk/pull/1034
More information about the hotspot-dev
mailing list