RFR: 8294003: Don't handle si_addr == 0 && si_code == SI_KERNEL SIGSEGVs

Coleen Phillimore coleenp at openjdk.org
Wed Sep 21 16:48:33 UTC 2022


On Wed, 21 Sep 2022 15:24:59 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> I think x86_32 can/should do the same, because faulting on a bona fide incorrect address currently produces a misleading error, see below. From reading JDK-8015837, JDK-8004124 and related issues, it looks like this code was added for x86_32 to better handle a kernel bug with exec-shield emulation on hardware without the NX bit. But even then "better handle" seems to be only about crashing with a more precise message.
>> 
>> I think only ancient hardware runs without NX, and most kernels where this bug otherwise appears are long dead. So, I think we should favor faulting with a proper error instead of telling (potentially misleading) things about "unstable signal handling".
>> 
>> 
>> $ lscpu
>> Model name:                      Intel(R) Atom(TM) CPU Z530   @ 1.60GHz
>> 
>> $ cat /etc/debian_version 
>> 11.5
>> 
>> $ jdk/bin/java -version
>> openjdk version "20-testing" 2023-03-21
>> OpenJDK Runtime Environment (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919)
>> OpenJDK Server VM (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919, mixed mode, sharing)
>> 
>> $ cat Crash.java 
>> import java.lang.reflect.*;
>> import sun.misc.Unsafe;
>> 
>> public class Crash {
>>   public static void main(String... args) throws Exception {
>>     Field f = Unsafe.class.getDeclaredField("theUnsafe");
>>     f.setAccessible(true);
>>     Unsafe u = (Unsafe) f.get(null);
>>     u.getInt(-1L); // 0xF....F
>>   }
>> }
>> 
>> $ jdk/bin/java Crash.java
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (os_linux_x86.cpp:227), pid=1033, tid=1034
>> #  fatal error: An irrecoverable SI_KERNEL SIGSEGV has occurred due to unstable signal handling in this distribution.
>> #
>
>> I think x86_32 can/should do the same, because faulting on a bona fide incorrect address currently produces a misleading error, see below.
> 
> So I think we can just drop the entire `#ifndef` block:
> 
> 
> diff --git a/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp b/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> index 31afbe696a2..9cd0b9a8b58 100644
> --- a/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> +++ b/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp
> @@ -220,17 +220,9 @@ bool PosixSignals::pd_hotspot_signal_handler(int sig, siginfo_t* info,
>      pc = (address) os::Posix::ucontext_get_pc(uc);
>  
>      if (sig == SIGSEGV && info->si_addr == 0 && info->si_code == SI_KERNEL) {
> -#ifndef AMD64
> -    // Halt if SI_KERNEL before more crashes get misdiagnosed as Java bugs
> -    // This can happen in any running code (currently more frequently in
> -    // interpreter code but has been seen in compiled code)
> -      fatal("An irrecoverable SI_KERNEL SIGSEGV has occurred due "
> -            "to unstable signal handling in this distribution.");
> -#else
>        // An irrecoverable SI_KERNEL SIGSEGV has occurred.
>        // It's likely caused by dereferencing an address larger than TASK_SIZE.
>        return false;
> -#endif
>      }
>  
>      // Handle ALL stack overflow variations here
> 
> 
> On the test above, x86_32 failure before:
> 
> 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (os_linux_x86.cpp:227), pid=1007, tid=1008
> #  fatal error: An irrecoverable SI_KERNEL SIGSEGV has occurred due to unstable signal handling in this distribution.
> #
> # JRE version: OpenJDK Runtime Environment (20.0) (build 20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919)
> # Java VM: OpenJDK Server VM (20-testing-builds.shipilev.net-openjdk-jdk-b210-20220919, mixed mode, sharing, tiered, serial gc, linux-x86)
> # Problematic frame:
> # V  [libjvm.so+0xa095be]  PosixSignals::pd_hotspot_signal_handler(int, siginfo_t*, ucontext_t*, JavaThread*)+0x40e
> ...
> ---------------  T H R E A D  ---------------
> 
> Current thread (0xb6a162d0):  JavaThread "main" [_thread_in_vm, id=1008, stack(0xb6bda000,0xb6c2b000)]
> 
> Stack: [0xb6bda000,0xb6c2b000],  sp=0xb6c29810,  free space=318k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xa095be]  PosixSignals::pd_hotspot_signal_handler(int, siginfo_t*, ucontext_t*, JavaThread*)+0x40e  (os_linux_x86.cpp:227)
> V  [libjvm.so+0xb477fa]  JVM_handle_linux_signal+0x15a  (signals_posix.cpp:655)
> V  [libjvm.so+0xb47a23]  javaSignalHandler(int, siginfo_t*, void*)+0x23  (signals_posix.cpp:683)
> C  [linux-gate.so.1+0x570]  __kernel_rt_sigreturn+0x0
> J 860  jdk.internal.misc.Unsafe.getInt(Ljava/lang/Object;J)I java.base@20-testing (0 bytes) @ 0xaf3706e3 [0xaf370620+0x000000c3]
> j  jdk.internal.misc.Unsafe.getInt(J)I+3 java.base@20-testing
> j  sun.misc.Unsafe.getInt(J)I+4 jdk.unsupported@20-testing
> j  Crash.main([Ljava/lang/String;)V+26
> 
> 
> x86_32 failure after:
> 
> 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0xb78f5f53, pid=710, tid=711
> #
> # JRE version: OpenJDK Runtime Environment (20.0) (build 20-internal-adhoc.buildbot.openjdk-jdk)
> # Java VM: OpenJDK Server VM (20-internal-adhoc.buildbot.openjdk-jdk, mixed mode, sharing, tiered, serial gc, linux-x86)
> # Problematic frame:
> # V  [libjvm.so+0xc35f53]  Unsafe_GetInt+0xa3
> 
> Current thread (0xb6a162c0):  JavaThread "main" [_thread_in_vm, id=711, stack(0xb6b6b000,0xb6bbc000)]
> 
> Stack: [0xb6b6b000,0xb6bbc000],  sp=0xb6bbacf0,  free space=319k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xc35f53]  Unsafe_GetInt+0xa3  (unsafe.cpp:223)
> J 884  jdk.internal.misc.Unsafe.getInt(Ljava/lang/Object;J)I java.base@20-internal (0 bytes) @ 0xaf372063 [0xaf371fa0+0x000000c3]
> j  jdk.internal.misc.Unsafe.getInt(J)I+3 java.base@20-internal
> j  sun.misc.Unsafe.getInt(J)I+4 jdk.unsupported@20-internal
> j  Crash.main([Ljava/lang/String;)V+26
> 
> 
> The current hs_err does not have a siginfo printout, while the hs_err with the patch prints it properly:
> 
> 
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x00000000

Thanks @shipilev for the comment about the origin of this change.  We used to see this error a LOT randomly and it was always painful to diagnose, but I agree that this hardware/config is likely long gone and we can remove this special message.

-------------

PR: https://git.openjdk.org/jdk/pull/10340

