RFR: JDK-8288556: VM crashes if it gets sent SIGUSR2 from outside

Mon Jun 20 05:29:53 UTC 2022

On Thu, 16 Jun 2022 07:47:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> The VM uses SIGUSR2 (can be overridden via _JAVA_SR_SIGNUM) to implement suspend/resume on Java threads. It sends SIGUSR2 to targeted threads via pthread_kill to interrupt them. It knows the target thread, and the target thread is always a VM-attached thread.
> 
> However, if SIGUSR2 gets sent from outside, any thread may receive the signal, and if the target thread is not attached to the VM (e.g. primordial), it is unable to handle it. The result is an assert (debug VM) or a crash (release VM). On my box, this can be reliably reproduced by sending SIGUSR2 to any VM.
> 
> This has been discussed here: https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091450.html
> 
> The proposed solutions range from "works as designed" (on the grounds that sending arbitrary signals to the JVM is an error in itself, and we should rather crash hard and fast) to "let's catch and ignore the signal".
> 
> ----
> 
> In this patch I opt for:
> 
> - Debug: keep asserting, but make the message more helpful by including signal info for the stray SR signal. Includes sender pid and signal number (in case SR signal had been overridden).
> 
> 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/shared/projects/openjdk/jdk-jdk/source/src/hotspot/os/posix/signals_posix.cpp:1611), pid=139712, tid=139712
> #  assert(thread != __null) failed: Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 6681, si_uid: 1027)..
> 
> - Release: write a message to tty about this signal, including sender pid and signal name. Otherwise, ignore the signal and don't crash. Repeated signals will generate repeated output:
> 
> 
> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-release$ ./images/jdk/bin/java -cp $REPROS_JAR de.stuefe.repros.Simple
> <press key>
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239773, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239774, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239775, si_uid: 1027).
> 
> 
> Notes:
> - In release builds, we could also quit the VM instead of continuing. I prefer gracefully ignoring the signal, because in our experience quitting, regardless of how good the diagnostic message is, often just leads to frustrated users complaining about VMs mysteriously vanishing. The same goes for crashes; they just feed into the general "java is unstable" notion. I'm open to discussing this.
> - I use tty for the diagnostic message, which goes to stdout. I really dislike that; error output should go to stderr. But since the rest of the VM handles diagnostic output the same way, I use tty here too.
> 
> Thanks, Thomas

Hi David,

> As per the discussion this is a much broader problem as it could apply to a range of signals and can be wrong even if the thread is not attached. Even if you want to restrict this to SR_SIGNUM the current proposal only handles one case.

I disagree about this being a large issue. Quoting https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091577.html

*I'd say we limit this to*
*1) signals for which we registered handlers: the handlers should at least not crash or vanish the VM without a trace*
*2) signals which may conceivably be sent to the VM in the normal course of events: if the default action is to terminate the VM, we should handle them*

The (1) set is rather small and contains:

- SEGV, ILL, FPE, BUS: These should crash, so crashing the VM if the signal is sent manually is reasonable (in fact, I use this sometimes).
- TRAP (power only): Used by the compiler; does not rely on Thread::current, just on the pc from the context. Sending it from outside should be benign.
- PIPE, XFSZ: As I wrote in the mail thread, we already gracefully ignore these signals when receiving them, for both debug and release builds.
- QUIT (BREAK_SIGNAL): We assert !ReduceSignalUsage in debug. In Release, we gracefully ignore this signal.
- HUP, SIGTERM (SHUTDOWN1_SIGNAL, SHUTDOWN3_SIGNAL): End the VM immediately as expected.
- QUIT (SHUTDOWN2_SIGNAL): Prints thread dump. Does not shut down the VM.

All of these signals are already handled correctly. Note that with several of them (PIPE, XFSZ) we already established the pattern of ignoring signals instead of vanishing the VM.

The set (2) is atm unknown to me. Are there any more? SIGCHLD is ignored by default. SIGUSR1 exists and exits the VM; this may be another case, but atm we don't handle it and I would not add a handler for it since user apps may use this signal. Any others?
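
To make the question for set (2) concrete: whether a signal would currently take the process down depends on its installed disposition and, if that is still the default, on the POSIX default action. The following is a minimal sketch, not HotSpot code. The names `Disposition`, `query_disposition`, `default_action_terminates` and `would_terminate_process` are made up for illustration, and the table covers only signals mentioned in this thread.

```cpp
#include <signal.h>
#include <cstring>

// Illustrative names only; none of this exists in HotSpot.
enum class Disposition { Default, Ignored, Handled, Unknown };

// Ask the kernel what is currently installed for signum, without changing it.
Disposition query_disposition(int signum) {
  struct sigaction current;
  std::memset(&current, 0, sizeof current);
  if (sigaction(signum, nullptr, &current) != 0) {
    return Disposition::Unknown;  // e.g. invalid signal number
  }
  if ((current.sa_flags & SA_SIGINFO) == 0) {
    if (current.sa_handler == SIG_DFL) return Disposition::Default;
    if (current.sa_handler == SIG_IGN) return Disposition::Ignored;
  }
  return Disposition::Handled;
}

// POSIX default actions, restricted to signals discussed in this thread.
bool default_action_terminates(int signum) {
  switch (signum) {
    case SIGCHLD:
      return false;  // default action is to ignore
    case SIGHUP:
    case SIGINT:
    case SIGTERM:
    case SIGPIPE:
    case SIGQUIT:
    case SIGUSR1:
    case SIGUSR2:
      return true;   // default action terminates the process
    default:
      return false;  // not covered by this sketch
  }
}

// True if delivering signum right now would end the process.
bool would_terminate_process(int signum) {
  return query_disposition(signum) == Disposition::Default &&
         default_action_terminates(signum);
}
```

With nothing installed, `would_terminate_process(SIGUSR1)` should be true and `would_terminate_process(SIGCHLD)` false, which matches the two cases named above.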

----

The way I see it, my patch would introduce the same handling for SIGUSR2 we already have established for SIGPIPE, SIGXFSZ, and arguably for SIGINT.
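
For readers who want to see the shape of that handling outside the VM, here is a rough standalone sketch, not the actual patch. It assumes a `thread_local` flag `t_attached` as a stand-in for "this thread is attached to the VM"; `sr_handler`, `install_sr_handler` and the two counters are hypothetical names. Unlike the patch, which prints via tty, the sketch writes to stderr with `write(2)`, because that call is async-signal-safe.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for "is this thread attached to the VM?".
static thread_local bool t_attached = false;

// Written by the handler so callers can observe what happened.
static volatile sig_atomic_t g_stray_count = 0;
static volatile sig_atomic_t g_last_sender_pid = 0;

// Append a C string to buf without stdio or malloc (async-signal-safe).
static size_t append_str(char* buf, size_t pos, size_t cap, const char* s) {
  while (*s != '\0' && pos + 1 < cap) buf[pos++] = *s++;
  buf[pos] = '\0';
  return pos;
}

// Append a decimal integer to buf, again without stdio.
static size_t append_int(char* buf, size_t pos, size_t cap, long v) {
  char tmp[24];
  size_t n = 0;
  unsigned long u = v < 0 ? 0UL - static_cast<unsigned long>(v)
                          : static_cast<unsigned long>(v);
  do { tmp[n++] = static_cast<char>('0' + u % 10); u /= 10; } while (u != 0);
  if (v < 0) tmp[n++] = '-';
  while (n > 0 && pos + 1 < cap) buf[pos++] = tmp[--n];
  buf[pos] = '\0';
  return pos;
}

// Handler: attached threads would do suspend/resume work; others report and ignore.
static void sr_handler(int /*sig*/, siginfo_t* info, void* /*ucontext*/) {
  if (t_attached) {
    return;  // the real suspend/resume logic would run here
  }
  g_stray_count = g_stray_count + 1;
  g_last_sender_pid = info->si_pid;  // meaningful for kill()-style senders
  char msg[160];
  size_t n = 0;
  n = append_str(msg, n, sizeof msg, "Non-attached thread received stray SR signal (si_signo: ");
  n = append_int(msg, n, sizeof msg, info->si_signo);
  n = append_str(msg, n, sizeof msg, ", si_code: ");
  n = append_int(msg, n, sizeof msg, info->si_code);
  n = append_str(msg, n, sizeof msg, ", si_pid: ");
  n = append_int(msg, n, sizeof msg, static_cast<long>(info->si_pid));
  n = append_str(msg, n, sizeof msg, ", si_uid: ");
  n = append_int(msg, n, sizeof msg, static_cast<long>(info->si_uid));
  n = append_str(msg, n, sizeof msg, ").\n");
  ssize_t rc = write(STDERR_FILENO, msg, n);
  (void)rc;  // nothing sensible to do on failure inside a handler
}

// Install the handler with SA_SIGINFO so the sender's pid and uid are available.
bool install_sr_handler(int signum) {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = sr_handler;
  sa.sa_flags = SA_SIGINFO | SA_RESTART;
  sigemptyset(&sa.sa_mask);
  return sigaction(signum, &sa, nullptr) == 0;
}
```

A process that calls `install_sr_handler(SIGUSR2)` and then receives `kill -USR2 <pid>` from another shell should print one line per signal and keep running, which is the release-build behavior described above.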

Cheers, Thomas

-------------

PR: https://git.openjdk.org/jdk/pull/9181