RFR: 8294003: Don't handle si_addr == 0 && si_code == SI_KERNEL SIGSEGVs
Coleen Phillimore
coleenp at openjdk.org
Wed Sep 21 16:48:33 UTC 2022
On Wed, 21 Sep 2022 15:24:59 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> I think x86_32 can/should do the same, because faulting on a bona fide incorrect address currently produces a misleading error, see below. From my reading of JDK-8015837, JDK-8004124 and related issues, it looks like this code was added for x86_32 to better handle a kernel bug with exec-shield emulation on hardware without the NX bit. But even then, "better handle" seems to mean only crashing with a more precise message.
>>
>> I think only ancient hardware runs without NX, and most kernels where this bug otherwise appears are long dead. So I think we should favor faulting with a proper error over reporting (potentially misleading) things about "unstable signal handling".
>>
>>
>> $ lscpu
>> Model name: Intel(R) Atom(TM) CPU Z530 @ 1.60GHz
>>
>> $ cat /etc/debian_version
>> 11.5
>>
>> $ jdk/bin/java -version
>> openjdk version "20-testing" 2023-03-21
>> OpenJDK Runtime Environment (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919)
>> OpenJDK Server VM (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919, mixed mode, sharing)
>>
>> $ cat Crash.java
>> import java.lang.reflect.*;
>> import sun.misc.Unsafe;
>>
>> public class Crash {
>>     public static void main(String... args) throws Exception {
>>         Field f = Unsafe.class.getDeclaredField("theUnsafe");
>>         f.setAccessible(true);
>>         Unsafe u = (Unsafe) f.get(null);
>>         u.getInt(-1L); // 0xF....F
>>     }
>> }
>>
>> $ jdk/bin/java Crash.java
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (os_linux_x86.cpp:227), pid=1033, tid=1034
>> # fatal error: An irrecoverable SI_KERNEL SIGSEGV has occurred due to unstable signal handling in this distribution.
>> #
>
>> I think x86_32 can/should do the same, because faulting on a bona fide incorrect address currently produces a misleading error, see below.
>
> So I think we can just drop the entire `#ifndef` block:
>
>
> diff --git a/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp b/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> index 31afbe696a2..9cd0b9a8b58 100644
> --- a/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> +++ b/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> @@ -220,17 +220,9 @@ bool PosixSignals::pd_hotspot_signal_handler(int sig, siginfo_t* info,
>      pc = (address) os::Posix::ucontext_get_pc(uc);
>
>      if (sig == SIGSEGV && info->si_addr == 0 && info->si_code == SI_KERNEL) {
> -#ifndef AMD64
> -      // Halt if SI_KERNEL before more crashes get misdiagnosed as Java bugs
> -      // This can happen in any running code (currently more frequently in
> -      // interpreter code but has been seen in compiled code)
> -      fatal("An irrecoverable SI_KERNEL SIGSEGV has occurred due "
> -            "to unstable signal handling in this distribution.");
> -#else
>        // An irrecoverable SI_KERNEL SIGSEGV has occurred.
>        // It's likely caused by dereferencing an address larger than TASK_SIZE.
>        return false;
> -#endif
>      }
>
>      // Handle ALL stack overflow variations here
>
>
> On the test above, x86_32 failure before:
>
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (os_linux_x86.cpp:227), pid=1007, tid=1008
> # fatal error: An irrecoverable SI_KERNEL SIGSEGV has occurred due to unstable signal handling in this distribution.
> #
> # JRE version: OpenJDK Runtime Environment (20.0) (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919)
> # Java VM: OpenJDK Server VM (20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919, mixed mode, sharing, tiered, serial gc, linux-x86)
> # Problematic frame:
> # V [libjvm.so+0xa095be] PosixSignals::pd_hotspot_signal_handler(int, siginfo_t*, ucontext_t*, JavaThread*)+0x40e
> ...
> --------------- T H R E A D ---------------
>
> Current thread (0xb6a162d0): JavaThread "main" [_thread_in_vm, id=1008, stack(0xb6bda000,0xb6c2b000)]
>
> Stack: [0xb6bda000,0xb6c2b000], sp=0xb6c29810, free space=318k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0xa095be] PosixSignals::pd_hotspot_signal_handler(int, siginfo_t*, ucontext_t*, JavaThread*)+0x40e (os_linux_x86.cpp:227)
> V [libjvm.so+0xb477fa] JVM_handle_linux_signal+0x15a (signals_posix.cpp:655)
> V [libjvm.so+0xb47a23] javaSignalHandler(int, siginfo_t*, void*)+0x23 (signals_posix.cpp:683)
> C [linux-gate.so.1+0x570] __kernel_rt_sigreturn+0x0
> J 860 jdk.internal.misc.Unsafe.getInt(Ljava/lang/Object;J)I java.base@20-testing (0 bytes) @ 0xaf3706e3 [0xaf370620+0x000000c3]
> j jdk.internal.misc.Unsafe.getInt(J)I+3 java.base@20-testing
> j sun.misc.Unsafe.getInt(J)I+4 jdk.unsupported@20-testing
> j Crash.main([Ljava/lang/String;)V+26
>
>
> x86_32 failure after:
>
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0xb78f5f53, pid=710, tid=711
> #
> # JRE version: OpenJDK Runtime Environment (20.0) (build 20-internal-adhoc.buildbot.openjdk-jdk)
> # Java VM: OpenJDK Server VM (20-internal-adhoc.buildbot.openjdk-jdk, mixed mode, sharing, tiered, serial gc, linux-x86)
> # Problematic frame:
> # V [libjvm.so+0xc35f53] Unsafe_GetInt+0xa3
>
> Current thread (0xb6a162c0): JavaThread "main" [_thread_in_vm, id=711, stack(0xb6b6b000,0xb6bbc000)]
>
> Stack: [0xb6b6b000,0xb6bbc000], sp=0xb6bbacf0, free space=319k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0xc35f53] Unsafe_GetInt+0xa3 (unsafe.cpp:223)
> J 884 jdk.internal.misc.Unsafe.getInt(Ljava/lang/Object;J)I java.base@20-internal (0 bytes) @ 0xaf372063 [0xaf371fa0+0x000000c3]
> j jdk.internal.misc.Unsafe.getInt(J)I+3 java.base@20-internal
> j sun.misc.Unsafe.getInt(J)I+4 jdk.unsupported@20-internal
> j Crash.main([Ljava/lang/String;)V+26
>
>
> The current hs_err has no siginfo printout, while the hs_err with the patch prints it properly:
>
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x00000000
Thanks @shipilev for the comment about the origin of this change. We used to see this error a LOT, at random, and it was always painful to diagnose, but I agree that this hardware/config is likely long gone and we can remove this special message.
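
For anyone triaging old reports, here is a rough sketch of how one might flag hs_err files that show this pattern (SIGSEGV, si_code SI_KERNEL, si_addr == 0) from the siginfo line. The class and method names are made up for illustration, and the regex assumes the siginfo format of the single line quoted above; other JDK versions may print it differently.

```java
import java.math.BigInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of this PR.
public class SiginfoCheck {
    // Assumed format, taken from the one hs_err siginfo line quoted in this thread.
    private static final Pattern SIGINFO = Pattern.compile(
        "si_signo: (\\d+) \\((\\w+)\\), si_code: (\\d+) \\((\\w+)\\), si_addr: 0x([0-9a-fA-F]+)");

    // Returns true when the line reports SIGSEGV with SI_KERNEL and a zero si_addr.
    static boolean isKernelSegvWithZeroAddr(String line) {
        Matcher m = SIGINFO.matcher(line);
        if (!m.find()) {
            return false; // not a siginfo line in the assumed format
        }
        boolean segv = m.group(2).equals("SIGSEGV");
        boolean kernel = m.group(4).equals("SI_KERNEL");
        boolean zeroAddr = new BigInteger(m.group(5), 16).signum() == 0;
        return segv && kernel && zeroAddr;
    }

    public static void main(String[] args) {
        String line = "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x00000000";
        System.out.println(isKernelSegvWithZeroAddr(line));
    }
}
```

This only works on hs_err files that actually contain a siginfo line, so it would not match the old x86_32 reports produced by the fatal() path.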
-------------
PR: https://git.openjdk.org/jdk/pull/10340