RFR: 8294211: Zero: Decode arch-specific error context if possible

Thomas Stuefe stuefe at openjdk.org
Thu Sep 22 18:44:37 UTC 2022


On Thu, 22 Sep 2022 18:20:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> After the POSIX signal refactorings, Zero error handling has "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it reports the failure as SEGV at pc=0x0. We can do better by implementing context decoding where possible.
> 
> Unfortunately, this introduces some arch-specific code into Zero. The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. Unimplemented arches would still report the same confusing `hs_err` files. We can emulate (and thus test) the generic behavior using the new diagnostic VM option, `-XX:-DecodeErrorContext`.
> 
> This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392).
> 
> Sample test:
> 
> 
> import java.lang.reflect.*;
> import sun.misc.Unsafe;
> 
> public class Crash {
>   public static void main(String... args) throws Exception {
>     Field f = Unsafe.class.getDeclaredField("theUnsafe");
>     f.setAccessible(true);
>     Unsafe u = (Unsafe) f.get(null);
>     u.getInt(42); // accessing via a broken pointer
>   }
> }
> 
> 
> Linux x86_64 Zero fastdebug crash currently:
> 
> 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794
> #
> ...
> # (no native frame info)
> ...
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
> 
> 
> Linux x86_64 Zero fastdebug crash with this patch:
> 
> 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120
> #
> ...
> # Problematic frame:
> # V  [libjvm.so+0xcbe584]  Unsafe_GetInt+0xe4
> ....
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
> 
> 
> Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`:
> 
> 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269
> #
> ...
> # Problematic frame:
> # C  0x0000000000000000
> ...
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
> 
> 
> Additional testing:
>  - [x] Linux x86_64 Zero fastdebug eyeballing crash logs
>  - [ ] Linux x86_64 Zero fastdebug, `tier1`
>  - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds

Good! But why make this conditional with a switch? Who would not want to have better error information?
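
For anyone reading along in the archive: "decoding the context" here comes down to pulling the faulting pc out of the `ucontext_t` that the signal handler receives. Below is a minimal standalone sketch, not the patch itself. The names `decode_pc`, `on_signal`, and `pc_at_self_signal` are hypothetical, only Linux x86_64 and aarch64 are handled, and it uses a harmless self-raised `SIGUSR1` instead of a real crash.

```cpp
#include <signal.h>  // sigaction, siginfo_t, ucontext_t (and REG_RIP on Linux x86_64)

// Hypothetical helper (not the HotSpot function): extract the program counter
// from a signal context, or return nullptr where this sketch has no decoder.
static void* decode_pc(const void* uc_void) {
  const ucontext_t* uc = static_cast<const ucontext_t*>(uc_void);
  if (uc == nullptr) {
    return nullptr;
  }
#if defined(__linux__) && defined(__x86_64__)
  return reinterpret_cast<void*>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
  return reinterpret_cast<void*>(uc->uc_mcontext.pc);
#else
  return nullptr;  // undecoded arch: the caller ends up with pc=0x0
#endif
}

static void* volatile g_seen_pc = nullptr;
static volatile sig_atomic_t g_handled = 0;

// SA_SIGINFO-style handler: the third argument is the ucontext to decode.
static void on_signal(int, siginfo_t*, void* uc_void) {
  g_seen_pc = decode_pc(uc_void);
  g_handled = 1;
}

// Install the handler, raise SIGUSR1 at ourselves, restore the old handler,
// and return whatever pc the handler managed to decode.
static void* pc_at_self_signal() {
  struct sigaction sa = {};
  struct sigaction old = {};
  sa.sa_sigaction = on_signal;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, &old) != 0) {
    return nullptr;
  }
  raise(SIGUSR1);
  sigaction(SIGUSR1, &old, nullptr);
  return g_handled ? g_seen_pc : nullptr;
}
```

On an arch without a decoder the sketch returns a null pc, which mirrors the `pc=0x0000000000000000` reports quoted above.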

src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 183:

> 181:     epc = os::Posix::ucontext_get_pc(uc);
> 182:     if (ret_sp) *ret_sp = (intptr_t *)os::Linux::ucontext_get_sp(uc);
> 183:     if (ret_fp) *ret_fp = (intptr_t *)os::Linux::ucontext_get_fp(uc);

Style nits: add curly brackets? Remove the space in `intptr_t *`?
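
Roughly what I have in mind, as a standalone sketch. `FakeContext`, `get_sp`, `get_fp`, and `fetch_frame` are hypothetical stand-ins so that the snippet compiles on its own; they are not the `os::Linux` functions from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the ucontext accessors used in the patch.
struct FakeContext {
  unsigned char* sp;
  unsigned char* fp;
};
static unsigned char* get_sp(const FakeContext* uc) { return uc->sp; }
static unsigned char* get_fp(const FakeContext* uc) { return uc->fp; }

// Same logic as the quoted lines, with braces around each conditional body
// and the cast written as `intptr_t*` without the space.
void fetch_frame(const FakeContext* uc, intptr_t** ret_sp, intptr_t** ret_fp) {
  if (ret_sp) {
    *ret_sp = (intptr_t*)get_sp(uc);
  }
  if (ret_fp) {
    *ret_fp = (intptr_t*)get_fp(uc);
  }
}
```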

src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 390:

> 388: 
> 389: void os::print_context(outputStream* st, const void* ucVoid) {
> 390:   st->print_cr("No context information.");

Regrettable; maybe something for a future RFE?

src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 404:

> 402:   // this at the end, and hope for the best.
> 403:   address pc = os::Posix::ucontext_get_pc(uc);
> 404:   print_instructions(st, pc, sizeof(char));

Doesn't `print_instructions` use SafeFetch, like `os::print_hex_dump` does? If it does, the comment can be removed. If it doesn't, should it?

-------------

PR: https://git.openjdk.org/jdk/pull/10397

