RFR: JDK-8255711: Fix and unify hotspot signal handlers
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Nov 4 06:34:38 UTC 2020
Hi David,
On Wed, Nov 4, 2020 at 6:56 AM David Holmes <david.holmes at oracle.com> wrote:
> Hi Thomas,
>
> Some initial comments as this is quite big - but thanks for all the
> detailed explanations.
>
>
>
<snip>
> > See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for
> untangling this bit of history.
> >
> > Unfortunately there is no official documentation from Sun or Oracle, and
> zero regression tests. So I try to preserve this interface as best as I
> can. I plan to add a proper regression test with a later change, but for
> now I don't have the time for that.
>
> On the documentation front we could at least explain what the flag
> really does. Rather than saying:
>
> product(bool, AllowUserSignalHandlers, false,
> \
> "Do not complain if the application installs signal handlers
> "
>
> we could say something like:
>
> product(bool, AllowUserSignalHandlers, false,
> \
> "Allow the application to install the primary signal handlers
> instead of the JVM." \
>
> and we (I?) could update the java manpage.
>
>
I can update the text description, but I would like to defer any additional
work for this switch to some other RFE.
> >
> > ---
> >
> > The fixed issues:
> >
> > 1) `PosixSignals::_are_signal_handlers_installed` is used inside the
> platform handlers to guard a part of the platform handlers against
> execution in case the signal handlers are not yet installed.
> >
> > Initially this confused me since when this handler is called it would of
> course be installed. So that boolean would always be true. The only
> explanation I found was that since these handlers can be invoked directly
> from outside, this is some (ineffective) form of guard against calling this
> handler too early.
> > But that guard can be left out and that boolean removed. Our signal
> handlers are safe to call before VM initialization is completed.
>
> The code that is guarded by this check is implicitly safe as it is
> trivial and early in VM init the current thread will be null anyway and
> so we won't execute the guarded code.
>
Yes, that's what I thought.
>
> > 2) The return code of JVM_handle_xxx_signal() was inconsistently set
> (some as bool, some as int) as well as unused in normal code paths
> (excluding outside calls).
>
> Ok.
>
> > 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there
> is a day-zero bug which caused it to not be exported.
> >
> > 4) Every platform handler has this section:
> >
> > JavaThread* thread = NULL;
> > VMThread* vmthread = NULL;
> > if (PosixSignals::are_signal_handlers_installed()) {
> > if (t != NULL ){
> > if(t->is_Java_thread()) {
> > thread = t->as_Java_thread();
> > }
> > else if(t->is_VM_thread()){
> > vmthread = (VMThread *)t;
> > }
> > }
> > }
> >
> > `vmthread` is unused on all platforms and can be removed.
>
> Ok. Once upon a time we probably did something different if in the
> vmThread versus some other non-JavaThread.
>
Solaris did use it (and AIX), but only comparing with NULL (I guess as a
shorthand for "t->is_VM_thread()").
>
> > 5) Every platform handler has some variant of this section, to ignore
> SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
> >
> > if (sig == SIGPIPE || sig == SIGXFSZ) {
> > // allow chained handler to go first
> > if (PosixSignals::chained_handler(sig, info, ucVoid)) {
> > return true;
> > } else {
> > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
> > return true;
> > }
> > }
> >
> > - On s390 and ppc, we miss SIGXFSZ handling
> > _Update: Fixed separately for easier backport, see [
> https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_.
> > - both paths return true - section can be shortened
> >
> > Side note: having handlers for those signals may be unnecessary. We
> could just set the signal handler to `SIG_IGN`. We would have to tiptoe
> around any third party handlers for those signals, but it still may be
> simpler.
> >
> > 6) At the end of every platform header, before calling into fatal error
> handling, we unblock the signal:
> >
> >> // unmask current signal
> >> sigset_t newset;
> >> sigemptyset(&newset);
> >> sigaddset(&newset, sig);
> >> sigprocmask(SIG_UNBLOCK, &newset, NULL);
> >>
> >
> > - Use of `sigprocmask()` is UB in a multithreaded program.
> > - but then, this section is unnecessary anyway, since [
> https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask
> error signals at the start of the signal handler.
>
> But is this guaranteed to be one of the error signals? What if the
> application calls our handler for some other signal? I guess that is
> their problem.
>
>
Good point, but:
- we only need to unblock error signals. There is no reasonable need to
unblock other signals in fatal error handling (to the contrary, the rest
should be kept blocked to not interfere with hs-err printing)
- we only install handlers for: SIGSEGV, SIGPIPE, SIGBUS, SIGILL, SIGFPE,
SIGTRAP, SIGXFSZ. Of those, we don't unblock SIGPIPE and SIGXFSZ. But those
are handled at the entrance of javaSignalHandler_inner, so we should never
enter fatal error handling because of those.
But you raise an interesting point, I am not sure whether SIGXFSZ can be
deferred. What happens if we write a big core file and that hits the file
limit, but SIGXSFZ is blocked from delivery? Will we just get terminated?
Well, maybe we should get terminated. But this is a question for another
RFE.
> > 7) the JFR crash protection is not consistently checked in all platform
> handlers.
> >
> > 8) On Zero, when entering fatal error handling, we do so via fatal()
> instead of VMError::report_and_die(), thereby discarding the real crash
> context and obfuscating the register content in the hs-err file (we still
> see registers, but those stem from the assertion-poison-page mechanism).
> >
> > 9) on Linux ppc64 and AIX, we have this section:
> >
> >> if (sig == SIGILL && (pc < (address) 0x200)) {
> >> goto report_and_die;
> >> }
> >
> > which is related to the fact that the zero page on AIX is readable,
> filled with 0, and reading instructions from it will yield us a SIGILL, not
> a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
> >
> > This coding is irrelevant on Linux. On AIX, it can also be removed,
> since this SIGILL would be unrecognized by the hotspot and later count as
> fatal error anyway.
> >
> > 10) When invoking the fatal error handler, we extract the pc from the
> context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is
> not totally correct. According to POSIX [3], for those signals the address
> of the faulting instruction is handed over in `si_info.si_addr`.
> >
> > On most platforms this does not matter, they are the same. But on some
> architectures the pc in the signal context actually points somewhere else,
> e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the
> better choice.
>
> The Posix spec also states "For some implementations, the value of
> si_addr may be inaccurate." - so I'm not at all sure which "pc" we
> should be trusting here? I thought the ucontext was the detailed
> platform specific "context" object that we should extract information
> from. Which architectures give different values in the two and is there
> some documentation stating what happens for any given os/cpu?
>
>
I overlooked this. Well, this is a helpful standard :)
I saw this happen on s390 and on pa-risc. Before this patch, I did correct
this in the platform handler. Out of caution I could #ifdef this section to
s390.
> > ----
> >
> > The changes in this patch:
> >
> > a) hotspot signal handling is now done by the following functions:
> >
> > <OS> <foreign code, probably signal handler too>
> > | |
> > v v
> > javaSignalHandler JVM_handle_linux_signal()
> > | /
> > v v
> > javaSignalHandler_inner
>
> Not clear why we need the _inner version. Why can't we just have
> javaSignalHandler which is installed as the handler and which is called
> by JVM_handle_XXX_signal?
>
Because JVM_handle_XXX_signal has one more argument than the standard
signal handler (abort_if_unrecognized).
>
> > |
> > v
> > PosixSignals::pd_hotspot_signal_handler()
> >
> >
> > The right branch only exists to support the `AllowUserSignalHandlers`
> case, see [4].
> >
> > `javaSignalHandler` is registered as handler, as it was before;
> JVM_handle_linux_signal() is exported as before.
> > `javaSignalHandler_inner` contains the shared portion of the signal
> handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining
> platform dependent portions.
> >
> >
> > b) I commonized prologue- and epilogue coding.
> > - I simplified (4) to a single line in the shared handler
> > - I moved the JFR thread crash protection (7) up to the shared handler
> > - I moved the complete epilogue up to the shared handler. That
> includes calling the chained handlers, should they exist, as well as
> invoking the fatal error handler. That fixes (8) and (6)
> > - Zero has this tradition of showing a robot telling a cat about the
> error signal, which I like, and kept.
> > - I simplified (5) and commonized it, and removed (9) completely
> > - In PosixSignals::install_signal_handlers(), I removed the
> `signal_handlers_are_installed` guard and replaced it with an assert.
> Unfortunately this causes lots of indentation changes. @gerard-ziemski: if
> this clashes too much with your patch for JDK-8253742, I'll leave that part
> out.
> >
> > Thanks for reviewing.
> >
> > Testing: this patch ran through our nightlies, in an earlier form. They
> will be re-ran some more times.
> >
> > I'd be happy if aarch64 porters could take a look at the aarch64 portion
> of this change.
> >
> > Please note that I had to draw a line somewhere - this is an open ended
> issue and a lot more could be cleaned. See also Gerard's work on [5], which
> is under review too.
>
> :) I've actually gone through this in far more detail as I've been
> composing this email and overall it is looking very good. I've made a
> couple of comments directly in the PR, in addition to the above.
>
>
Thank you! I'll wait some more, then prepare an update.
Cheers, Thomas
> Thanks,
> David
> -----
>
> >
> > ----
> >
> > [1]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
> > [2]
> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
> > [3]
> https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
> > [4]
> https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
> > [5] https://bugs.openjdk.java.net/browse/JDK-8253742
> >
> > -------------
> >
> > Commit messages:
> > - Initial patch
> >
> > Changes: https://git.openjdk.java.net/jdk/pull/1034/files
> > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8255711
> > Stats: 916 lines in 13 files changed: 116 ins; 686 del; 114 mod
> > Patch: https://git.openjdk.java.net/jdk/pull/1034.diff
> > Fetch: git fetch https://git.openjdk.java.net/jdk
> pull/1034/head:pull/1034
> >
> > PR: https://git.openjdk.java.net/jdk/pull/1034
> >
>
More information about the hotspot-dev
mailing list