RFR: 8283337: Posix signal handler modification warning triggering incorrectly
Thomas Stuefe
stuefe at openjdk.java.net
Fri Mar 18 07:01:28 UTC 2022
On Thu, 17 Mar 2022 19:23:10 GMT, Kevin Walls <kevinw at openjdk.org> wrote:
> PosixSignals::print_signal_handler() is incorrectly detecting a changed signal handler. The signal handler for SIGBREAK/SIGQUIT is now set in src/hotspot/os/posix/signals_posix.cpp with the bool parameter do_check set to false.
>
> set_signal_handler should only store a handler in vm_handlers when do_check is true.
>
> However I don't see a simple way of getting a valid warning for the SIGQUIT handler, if it is added later when os::initialize_jdk_signal_support() calls os::signal().
>
> If only signals added directly by src/hotspot/os/posix/signals_posix.cpp have the warning for a handler changing, then we never had a warning for SIGQUIT, so just this simple change is needed to remove the bogus warning.
Trying to recall the details of Xin's [JDK-8279124](https://bugs.openjdk.java.net/browse/JDK-8279124) to wrap my head around this one...
The original problem was that we install the SIGQUIT handler when initializing the JDK as part of `os::initialize_jdk_signal_support()`. In contrast to other signals, that happens at the end of VM initialization. VM initialization can take time (eg large heap and +AlwaysPreTouch). So it leaves a window where a SIGQUIT would run into the default handler, which tears down the VM process. Unfortunately tools like jcmd use SIGQUIT as signaling mechanism, so using jcmd on a seemingly unresponsive VM - that just takes its time initializing - kills the VM.
The solution was to install a first stop-gap signal handler for SIGQUIT at the very start of VM initialization, together with all other signals. That one just ignored SIGQUIT. Then we replace later that handler with the real one.
The problem then was that signal change recognition was established for the first handler, and triggers warnings after the first handler is replaced by the second one. But both handlers are legit, so the warning was bogus. Xin solved this by disabling handler change recognition for SIGQUIT. He did not include `print_signal_handler()` though, so we see that warning in hs-err files and in jcmd VM.info always.
Have I gotten this right, or have I forgotten something?
--
For a workaround the proposed change is fine. It gets rid of the always present warning in hs-err files, at the cost of hiding it also when it would be useful. But that is okay, a warning that is always present gets quickly ignored.
One code improvement would be to remove `do_check_signal_periodically` altogether, since it is now redundant with `vm_handlers.get() == NULL`. So, leaving NULL or resetting to NULL in SavedSignalHandlers could mean "dont test".
A better solution may be to save the second SIGQUIT handler too and reestablish the checks for SIGQUIT. Since we want to know if a third-party app changes SIGQUIT, and Xcheck:jni to tell us if that happens. A possible scenario, Customer complains about "vanishing VMs", but some third party lib just reset SIGQUIT and VM dies as soon as a Thread dump is triggered. Running with -Xcheck:jni would be the first thing I try, but right now it would not help.
Cheers, Thomas
-------------
PR: https://git.openjdk.java.net/jdk/pull/7858
More information about the hotspot-runtime-dev
mailing list