RFR: 8294211: Zero: Decode arch-specific error context if possible
Thomas Stuefe
stuefe at openjdk.org
Thu Sep 22 18:44:37 UTC 2022
On Thu, 22 Sep 2022 18:20:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> After POSIX signal refactorings, Zero error handling had "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it fails with SEGV at pc=0x0. We can do better by implementing context decoding where possible.
>
> Unfortunately, this introduces some arch-specific code into Zero. The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. The unimplemented arches would still report the same confusing `hs_err` files. We can emulate (and thus test) the generic behavior using a new diagnostic VM option.
>
> This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392).
>
> Sample test:
>
>
> import java.lang.reflect.*;
> import sun.misc.Unsafe;
>
> public class Crash {
>     public static void main(String... args) throws Exception {
>         Field f = Unsafe.class.getDeclaredField("theUnsafe");
>         f.setAccessible(true);
>         Unsafe u = (Unsafe) f.get(null);
>         u.getInt(42); // accessing via broken ptr
>     }
> }
>
>
> Linux x86_64 Zero fastdebug crash currently:
>
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794
> #
> ...
> # (no native frame info)
> ...
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
>
>
> Linux x86_64 Zero fastdebug crash with this patch:
>
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120
> #
> ...
> # Problematic frame:
> # V [libjvm.so+0xcbe584] Unsafe_GetInt+0xe4
> ....
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
>
>
> Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`:
>
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269
> #
> ...
> # Problematic frame:
> # C 0x0000000000000000
> ...
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a
>
>
> Additional testing:
> - [x] Linux x86_64 Zero fastdebug eyeballing crash logs
> - [ ] Linux x86_64 Zero fastdebug, `tier1`
> - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds
Good! But why make this conditional with a switch? Who would not want to have better error information?
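For readers who have not looked at the os_cpu code: a minimal, self-contained sketch of what "decoding the context" amounts to on Linux. This is not the code from the patch. `pc_from_ucontext`, `on_segv` and `provoke_and_decode` are hypothetical names, only three arches are covered, and the unimplemented case deliberately falls back to a zero pc, as described in the quoted text above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for REG_RIP / REG_EIP on glibc
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

// Sketch only, single-threaded: the handler reports through these globals.
static sigjmp_buf g_recover;
static volatile uintptr_t g_fault_pc = 0;
static void* volatile g_fault_addr = nullptr;

// The arch-specific part: pull the program counter out of the saved context.
static uintptr_t pc_from_ucontext(const ucontext_t* uc) {
#if defined(__x86_64__)
  return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
  return (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__aarch64__)
  return (uintptr_t)uc->uc_mcontext.pc;
#else
  (void)uc;
  return 0;  // unimplemented arch: the report would still show pc=0x0
#endif
}

static void on_segv(int, siginfo_t* info, void* ucVoid) {
  g_fault_pc = pc_from_ucontext((const ucontext_t*)ucVoid);
  g_fault_addr = info->si_addr;
  siglongjmp(g_recover, 1);  // leave the handler instead of re-faulting
}

// Provokes a read through a bad pointer (address 42, as in the Java sample)
// and returns true if the fault was caught and decoded.
static bool provoke_and_decode() {
  struct sigaction sa = {}, old_sa = {};
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_sa);

  bool caught;
  if (sigsetjmp(g_recover, 1) == 0) {
    volatile int* bad = (volatile int*)(uintptr_t)42;
    int v = *bad;  // faults; control resumes in the else branch
    (void)v;
    caught = false;
  } else {
    caught = true;
  }
  sigaction(SIGSEGV, &old_sa, nullptr);  // restore the previous handler
  return caught;
}
```

After `provoke_and_decode()` returns true, `g_fault_addr` holds the faulting address and `g_fault_pc` holds the decoded pc on the arches the sketch covers.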
src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 183:
> 181: epc = os::Posix::ucontext_get_pc(uc);
> 182: if (ret_sp) *ret_sp = (intptr_t *)os::Linux::ucontext_get_sp(uc);
> 183: if (ret_fp) *ret_fp = (intptr_t *)os::Linux::ucontext_get_fp(uc);
Style nits: add curly brackets? Remove the space in `intptr_t *`?
src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 390:
> 388:
> 389: void os::print_context(outputStream* st, const void* ucVoid) {
> 390: st->print_cr("No context information.");
Regrettable, maybe something for a future RFE?
src/hotspot/os_cpu/linux_zero/os_linux_zero.cpp line 404:
> 402: // this at the end, and hope for the best.
> 403: address pc = os::Posix::ucontext_get_pc(uc);
> 404: print_instructions(st, pc, sizeof(char));
Does `print_instructions` not use SafeFetch like `os::print_hex_dump` does? If it does, the "hope for the best" comment can be removed. If it does not, should it?
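To illustrate what I mean by a guarded read, here is a minimal sketch of a byte dump that probes each address before printing it. It is neither HotSpot's SafeFetch nor `print_instructions`. It uses a plain `sigsetjmp`/`siglongjmp` guard, assumes a single thread, and `guarded_read_byte` and `dump_bytes` are hypothetical names.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string>

// Sketch only: one global jump buffer, so not thread-safe.
static sigjmp_buf g_probe_jmp;

static void on_fault(int) { siglongjmp(g_probe_jmp, 1); }

// Tries to load one byte from addr. Returns false instead of crashing
// if the load raises SIGSEGV or SIGBUS.
static bool guarded_read_byte(const void* addr, unsigned char* out) {
  struct sigaction sa = {}, old_segv = {}, old_bus = {};
  sa.sa_handler = on_fault;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_segv);
  sigaction(SIGBUS, &sa, &old_bus);

  bool ok;
  if (sigsetjmp(g_probe_jmp, 1) == 0) {
    *out = *static_cast<const volatile unsigned char*>(addr);
    ok = true;
  } else {
    ok = false;  // reached via siglongjmp from on_fault
  }

  sigaction(SIGSEGV, &old_segv, nullptr);  // restore previous handlers
  sigaction(SIGBUS, &old_bus, nullptr);
  return ok;
}

// Formats n bytes starting at start as hex, printing "??" for unreadable ones.
static std::string dump_bytes(const void* start, size_t n) {
  std::string out;
  const uintptr_t base = reinterpret_cast<uintptr_t>(start);
  for (size_t i = 0; i < n; i++) {
    char cell[3] = "??";
    unsigned char b = 0;
    if (guarded_read_byte(reinterpret_cast<const void*>(base + i), &b)) {
      snprintf(cell, sizeof(cell), "%02x", b);
    }
    if (i > 0) out += ' ';
    out += cell;
  }
  return out;
}
```

With a guard like this in place, a dump around a bad pc degrades to `??` cells instead of a secondary crash inside the error handler, which is why the comment would become unnecessary.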
-------------
PR: https://git.openjdk.org/jdk/pull/10397