RFR: JDK-8288556: VM crashes if it gets sent SIGUSR2 from outside

David Holmes dholmes at openjdk.org
Mon Jun 20 02:09:53 UTC 2022


On Thu, 16 Jun 2022 07:47:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> The VM uses SIGUSR2 (can be overridden via _JAVA_SR_SIGNUM) to implement suspend/resume on Java threads. It sends SIGUSR2, via pthread_kill, to targeted threads to interrupt them. It knows the target thread, and the target thread is always a VM-attached thread.
> 
> However, if SIGUSR2 gets sent from outside, any thread may receive the signal, and if the target thread is not attached to the VM (e.g. primordial), it is unable to handle it. The result is an assert (debug VM) or a crash (release VM). On my box, this can be reliably reproduced by sending SIGUSR2 to any VM.
> 
> This has been discussed here: https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091450.html
> 
> The proposed solutions range from "works as designed" (on the grounds that sending arbitrary signals to the JVM is an error in itself, and we should rather crash hard and fast) to "let's catch and ignore the signal".
> 
> ----
> 
> In this patch I opt for:
> 
> - Debug: keep asserting, but make the message more helpful by including signal info for the stray SR signal. Includes sender pid and signal number (in case SR signal had been overridden).
> 
> 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/shared/projects/openjdk/jdk-jdk/source/src/hotspot/os/posix/signals_posix.cpp:1611), pid=139712, tid=139712
> #  assert(thread != __null) failed: Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 6681, si_uid: 1027)..
> 
> - Release: write a message to tty about this signal, including sender pid and signal name. Otherwise, ignore the signal, don't crash. Repeated signals will generate repeated output:
> 
> 
> thomas@starfish:/shared/projects/openjdk/jdk-jdk/output-release$ ./images/jdk/bin/java -cp $REPROS_JAR de.stuefe.repros.Simple
> <press key>
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239773, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239774, si_uid: 1027).
> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239775, si_uid: 1027).
> 
> 
> Notes:
> - In release builds, we could also quit the VM instead of continuing. I prefer gracefully ignoring the signal, because in our experience quitting - regardless of how good the diagnostic message is - often just leads to frustrated users complaining about VMs mysteriously vanishing. The same goes for crashes; it just pools into the general "java is unstable" notion. I'm open to discussing this.
> - I use tty for the diagnostic message, which goes to stdout. I really dislike that; error output should go to stderr. But since the rest of the VM handles diagnostic output the same way, I use tty here too.
> 
> Thanks, Thomas

As per the discussion, this is a much broader problem: it could apply to a range of signals, and can be wrong even if the thread is not attached. Even if you want to restrict this to SR_SIGNUM, the current proposal only handles one case.

src/hotspot/os/posix/signals_posix.cpp line 1613:

> 1611:     ss.print_raw(").");
> 1612:     assert(thread != NULL, "%s.", ss.base());
> 1613:     tty->print_cr("%s", ss.base());

Surely this should be a regular VM warning rather than a raw write to tty, although neither of those is signal-safe.
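
To illustrate the signal-safety concern, here is a minimal standalone sketch of a handler that reports a stray signal using only write(2), which POSIX lists as async-signal-safe. This is not HotSpot code and not what the patch does: the names append_str, append_long, format_stray_message, report_stray_signal, install_stray_signal_reporter and g_stray_count are all hypothetical, and the message format is only an approximation of the one quoted above.

```cpp
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

// Hypothetical counter of stray signals seen so far.
static volatile sig_atomic_t g_stray_count = 0;

// Append a C string to buf at pos, never writing past cap. No libc
// formatting is used, so this is safe to call from a signal handler.
static size_t append_str(char* buf, size_t pos, size_t cap, const char* s) {
  while (*s != '\0' && pos < cap) {
    buf[pos++] = *s++;
  }
  return pos;
}

// Append a signed decimal number without printf-family calls, which are
// not async-signal-safe.
static size_t append_long(char* buf, size_t pos, size_t cap, long value) {
  unsigned long mag = value < 0 ? 0UL - static_cast<unsigned long>(value)
                                : static_cast<unsigned long>(value);
  char digits[24];
  size_t n = 0;
  do {
    digits[n++] = static_cast<char>('0' + mag % 10);
    mag /= 10;
  } while (mag != 0);
  if (value < 0 && pos < cap) {
    buf[pos++] = '-';
  }
  while (n > 0 && pos < cap) {
    buf[pos++] = digits[--n];
  }
  return pos;
}

// Build the diagnostic line; returns its length (no NUL terminator added).
static size_t format_stray_message(int signo, int code, long pid,
                                   char* buf, size_t cap) {
  size_t pos = append_str(buf, 0, cap, "Stray signal (si_signo: ");
  pos = append_long(buf, pos, cap, signo);
  pos = append_str(buf, pos, cap, ", si_code: ");
  pos = append_long(buf, pos, cap, code);
  pos = append_str(buf, pos, cap, ", si_pid: ");
  pos = append_long(buf, pos, cap, pid);
  pos = append_str(buf, pos, cap, ").\n");
  return pos;
}

// Handler: formats into a stack buffer and emits it with a single write(2)
// to stderr. si_pid is only meaningful for user-sent signals.
static void report_stray_signal(int signo, siginfo_t* info, void* /*uc*/) {
  int saved_errno = errno;  // write() may clobber errno
  char buf[128];
  size_t n = format_stray_message(signo, info->si_code,
                                  static_cast<long>(info->si_pid),
                                  buf, sizeof(buf));
  ssize_t ignored = write(STDERR_FILENO, buf, n);
  (void)ignored;
  g_stray_count = g_stray_count + 1;
  errno = saved_errno;
}

// Install the reporter for signo; runs in normal (non-signal) context.
static bool install_stray_signal_reporter(int signo) {
  assert(signo > 0);
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = report_stray_signal;
  sa.sa_flags = SA_SIGINFO | SA_RESTART;
  sigemptyset(&sa.sa_mask);
  return sigaction(signo, &sa, nullptr) == 0;
}
```

A process would call install_stray_signal_reporter(SIGUSR2) once at startup; after that, a kill -USR2 from another shell should produce one line on stderr per signal rather than terminating the process.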

-------------

PR: https://git.openjdk.org/jdk/pull/9181

