RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v4]

Alexandr Miloslavskiy duke at openjdk.org
Sun Jan 22 11:41:05 UTC 2023


On Tue, 20 Dec 2022 14:59:55 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>> 
>> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>> 
>> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>> 
>> Enables the following
>> ```C++
>> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>>   os::print_register_info_header(st, _context);
>> 
>>   REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>>     // decode register contents if possible
>>     ResourceMark rm(_thread);
>>     os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>>   REENTRANT_LOOP_END
>> 
>>   st->cr();
>> 
>> 
>> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)
>
> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Restructure os::print_register_info interface
>  - Code syle and line length
>  - Merge Fix
>  - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
>  - Fix whitespace
>  - Add reentrant reentry limits
>  - Add os::print_register_info_header
>  - VMError::report reentrant for register and stack print

I'm studying JVM crashes for our product, which involves heavy JNI usage.

In 26% of all JVM crash reports we received, string `error occurred during error reporting` is present. 6% if I exclude all crashes where some kind of corrupted state is present (GC crashes, compiler crashes, low memory).

I investigated a few of these, and the most common reason for `error occurred during error reporting` is when a register contains `0x0000'0000'ffff'ffff` on Linux. The more generalized problem is when register/stack value points into `---p` memory region.

Here are some examples, trimmed down and formatted for clarity:

R9 =0x0000'0000'ffff'ffff
R9 = [error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007ff26665f970]

Dynamic libraries:
0'9000'0000-1'0000'0000 ---p 00000000 00:00 0



R10=0x0000'0000'd30f'12af
R10= [error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007fe0891df970]

Dynamic libraries:
0'c6f0'0000-1'0000'0000 ---p 00000000 00:00 0



R8 =0x0000'0000'ffff'ffff
R8 = [error occurred during error reporting (printing register info), id 0xc0000005, EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007fffcf583046]


My conclusions are:
1. The problem occurs often
2. Many, but not all, of these are rather trivial to fix by making value printing more robust.
For example, when pointer value corresponds to `---p` memory region, there's definitely nothing else to print.

It sounds reasonable to me to follow both paths at once - make code more robust where easily possible, and don't let one failed print prevent the other prints.

-------------

PR: https://git.openjdk.org/jdk/pull/11017


More information about the hotspot-dev mailing list