RFR: JDK-8288556: VM crashes if it gets sent SIGUSR2 from outside [v2]

David Holmes dholmes at openjdk.org
Tue Jun 21 05:29:52 UTC 2022


On Tue, 21 Jun 2022 05:20:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> The VM uses SIGUSR2 (can be overridden via _JAVA_SR_SIGNUM) to implement suspend/resume on Java threads. It sends SIGUSR2 to targeted threads via pthread_kill to interrupt them. It knows the target thread, and the target thread is always a VM-attached thread.
>> 
>> However, if SIGUSR2 gets sent from outside, any thread may receive the signal, and if the receiving thread is not attached to the VM (e.g. the primordial thread), it is unable to handle it. The result is an assert (debug VM) or a crash (release VM). On my box, this can be reliably reproduced by sending SIGUSR2 to any VM.
>> 
>> This has been discussed here: https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091450.html
>> 
>> The proposed solutions range from "works as designed" (on the grounds that sending arbitrary signals to the JVM is an error in itself, and we should rather crash hard and fast) to "let's catch and ignore the signal".
>> 
>> ----
>> 
>> In this patch I opt for:
>> 
>> - Debug: keep asserting, but make the message more helpful by including signal info for the stray SR signal. The message includes the sender pid and signal number (in case the SR signal has been overridden).
>> 
>> 
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (/shared/projects/openjdk/jdk-jdk/source/src/hotspot/os/posix/signals_posix.cpp:1611), pid=139712, tid=139712
>> #  assert(thread != __null) failed: Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 6681, si_uid: 1027)..
>> 
>> - Release: write a message to tty about this signal, including sender pid and signal name. Otherwise, ignore the signal; don't crash. Repeated signals will generate repeated output:
>> 
>> 
>> thomas@starfish:/shared/projects/openjdk/jdk-jdk/output-release$ ./images/jdk/bin/java -cp $REPROS_JAR de.stuefe.repros.Simple
>> <press key>
>> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239773, si_uid: 1027).
>> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239774, si_uid: 1027).
>> Non-attached thread received stray SR signal (siginfo: si_signo: 12 (SIGUSR2), si_code: 0 (SI_USER), si_pid: 239775, si_uid: 1027).
>> 
>> 
>> Notes:
>> - In release builds, we could also quit the VM instead of continuing. I prefer gracefully ignoring the signal, because in our experience quitting - regardless of how good the diagnostic message is - often just leads to frustrated users complaining about VMs mysteriously vanishing. The same goes for crashes; it just feeds into the general "java is unstable" notion. I'm open to discussing this.
>> - I use tty for the diagnostic message, which goes to stdout. I really dislike that; error output should go to stderr. But since the rest of the VM handles diagnostic output the same way, I use tty here too.
>> 
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use log_warning(os) instead of tty printing

Thanks Thomas!

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9181


More information about the hotspot-runtime-dev mailing list