RFR: JDK-8288556: VM crashes if it gets sent SIGUSR2 from outside

David Holmes dholmes at openjdk.org
Mon Jun 20 12:52:59 UTC 2022


On Thu, 16 Jun 2022 07:47:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> The VM uses SIGUSR2 (can be overridden via _JAVA_SR_SIGNUM) to implement suspend/resume on Java threads. It sends SIGUSR2, via pthread_kill, to targeted threads to interrupt them. It knows the target thread, and the target thread is always a VM-attached thread.
> 
> However, if SIGUSR2 gets sent from outside, any thread may receive the signal, and if the target thread is not attached to the VM (e.g. primordial), it is unable to handle it. The result is an assert (debug VM) or a crash (release VM). On my box, this can be reliably reproduced by sending SIGUSR2 to any VM.
> 
> This has been discussed here: https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091450.html
> 
> The proposed solutions range from "works as designed" (on the grounds that sending arbitrary signals to the JVM is an error in itself, and we should rather crash hard and fast) to "let's catch and ignore the signal".
> 
> ----
> 
> In this patch I opt for:
> 
> - Debug: keep asserting, but make the message more helpful by including signal info for the stray SR signal. This includes the sender pid and the signal number (in case the SR signal had been overridden).
> 
> 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/shared/projects/openjdk/jdk-jdk/source/src/hotspot/os/posix/signals_posix.cpp:1611), pid=139712, tid=139712
> #  assert(thread != __null) failed: Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 6681, si_uid: 1027)..
> 
> - Release: write a message to tty about this signal, including sender pid and signal name. Otherwise, ignore the signal, don't crash. Repeated signals will generate repeated output:
> 
> 
> thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-release$ ./images/jdk/bin/java -cp $REPROS_JAR de.stuefe.repros.Simple
> <press key>
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239773, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239774, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239775, si_uid: 1027).
> 
> 
> Notes:
> - In release builds, we could also quit the VM instead of continuing. I prefer gracefully ignoring the signal, because in our experience quitting - regardless of how good the diagnostic message is - often just leads to frustrated users complaining about VMs mysteriously vanishing. The same goes for crashes; it just feeds into the general "java is unstable" notion. I'm open to discussing this.
> - I use tty for the diagnostic message, which goes to stdout. I really dislike that; error output should go to stderr. But since the rest of the VM handles diagnostic output the same way, I use tty here too.
> 
> Thanks, Thomas
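
For readers following along, the release-build behaviour proposed above (report the stray signal, then carry on) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: thread_attached, mark_thread_attached, install_sr_handler and the two counters are hypothetical names, the thread-local flag is a stand-in for HotSpot's real "is this thread attached?" check, and the message goes to stderr via write(2) rather than to tty.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for "this thread is attached to the VM". */
static _Thread_local volatile sig_atomic_t thread_attached = 0;

/* Counters so a caller can observe which path the handler took. */
static volatile sig_atomic_t stray_count = 0;
static volatile sig_atomic_t handled_count = 0;

static void sr_handler(int sig, siginfo_t *info, void *ucontext) {
    (void)sig; (void)info; (void)ucontext;
    if (!thread_attached) {
        /* Stray signal on a non-attached thread: report and ignore.
           Only async-signal-safe calls are used here (write). */
        static const char msg[] =
            "Non-attached thread received stray SR signal\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)n;
        stray_count++;
        return;
    }
    /* An attached thread would do the real suspend/resume work here. */
    handled_count++;
}

/* Marks the calling thread as attached (1) or not attached (0). */
void mark_thread_attached(int attached) {
    thread_attached = attached ? 1 : 0;
}

/* Installs the handler for `sig`; returns 0 on success, -1 on error. */
int install_sr_handler(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sr_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

With the handler installed for SIGUSR2, a signal that arrives while the receiving thread is not marked as attached only bumps stray_count and prints the message; the process keeps running, which is the "ignore, don't crash" option described above.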

As the saying goes, "it's complicated". Whether Linux signal delivery has the same properties today I have no idea. I'm somewhat bemused that SIGUSR1 and SIGUSR2 are not adjacent signals - weird design to say the least. I also don't understand how you can possibly get two signals pending like that at the same time when one is synchronous and the other asynchronous. 4355769 is an interesting read but I'm not sure I can really agree with the analysis (and note that one comment seems to contradict an earlier one, so exactly what happened is unclear). So yeah, setting an alternative SR_SIGNUM is problematic.

On the si_pid part ... I have no prior knowledge of this (didn't even know it existed), so have no idea whether it is reliable or not.
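
For reference, si_pid is a field of the siginfo_t that a handler receives when it is installed with SA_SIGINFO; for a signal sent with kill(2), si_code is SI_USER and si_pid carries the sender's pid. A minimal sketch of reading it follows. The names record_sender, install_recorder and the last_* variables are made up for illustration and are not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Last observed signal details, written by the handler.
   The pid is stored as sig_atomic_t to keep the handler simple;
   that is an assumption that holds where pid_t fits in an int. */
static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_code = 0;
static volatile sig_atomic_t last_sender = 0;

static void record_sender(int sig, siginfo_t *info, void *ucontext) {
    (void)ucontext;
    last_signo = sig;
    last_code = info->si_code;               /* SI_USER for kill(2) */
    last_sender = (sig_atomic_t)info->si_pid; /* pid of the sender */
}

/* Installs the recording handler for `sig`; returns 0 on success. */
int install_recorder(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = record_sender;
    sa.sa_flags = SA_SIGINFO;  /* required to receive siginfo_t */
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

After install_recorder(SIGUSR2), a process that sends itself the signal with kill(getpid(), SIGUSR2) should see last_code equal to SI_USER and last_sender equal to its own pid. Whether that information is dependable in every delivery scenario is exactly the open question here; the sketch only shows where the value comes from.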

Seems to me we have far greater risk of breaking something unexpectedly with changing this code than we potentially benefit from making the change.

So I'd vote for doing nothing.

-------------

PR: https://git.openjdk.org/jdk/pull/9181


More information about the hotspot-runtime-dev mailing list