RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v4]
Alexandr Miloslavskiy
duke at openjdk.org
Sun Jan 22 11:41:05 UTC 2023
On Tue, 20 Dec 2022 14:59:55 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
>> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>>
>> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>>
>> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>>
>> Enables the following
>> ```C++
>> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>> os::print_register_info_header(st, _context);
>>
>> REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>> // decode register contents if possible
>> ResourceMark rm(_thread);
>> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>> REENTRANT_LOOP_END
>>
>> st->cr();
>>
>>
>> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)
>
> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
> - Restructure os::print_register_info interface
> - Code syle and line length
> - Merge Fix
> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
> - Fix whitespace
> - Add reentrant reentry limits
> - Add os::print_register_info_header
> - VMError::report reentrant for register and stack print
I'm studying JVM crashes for our product, which involves heavy JNI usage.
In 26% of all JVM crash reports we received, string `error occurred during error reporting` is present. 6% if I exclude all crashes where some kind of corrupted state is present (GC crashes, compiler crashes, low memory).
I investigated a few of these, and the most common reason for `error occurred during error reporting` is when a register contains `0x0000'0000'ffff'ffff` on Linux. The more generalized problem is when register/stack value points into `---p` memory region.
Here are some examples, trimmed down and formatted for clarity:
R9 =0x0000'0000'ffff'ffff
R9 = [error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007ff26665f970]
Dynamic libraries:
0'9000'0000-1'0000'0000 ---p 00000000 00:00 0
R10=0x0000'0000'd30f'12af
R10= [error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007fe0891df970]
Dynamic libraries:
0'c6f0'0000-1'0000'0000 ---p 00000000 00:00 0
R8 =0x0000'0000'ffff'ffff
R8 = [error occurred during error reporting (printing register info), id 0xc0000005, EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007fffcf583046]
My conclusions are:
1. The problem occurs often
2. Many, but not all, of these are rather trivial to fix by making value printing more robust.
For example, when pointer value corresponds to `---p` memory region, there's definitely nothing else to print.
It sounds reasonable to me to follow both paths at once - make code more robust where easily possible, and don't let one failed print prevent the other prints.
-------------
PR: https://git.openjdk.org/jdk/pull/11017
More information about the hotspot-dev
mailing list