RFR: 8284273: Early crashes in os::print_context on AArch64
Thomas Stuefe
stuefe at openjdk.java.net
Thu May 19 13:57:49 UTC 2022
On Mon, 16 May 2022 21:58:56 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:
> Our error reporting mechanism can itself crash while reporting the original crash, because it may access a bogus memory location, for example while printing the memory that a CPU register points to. In that case, we skip the sections already reported, including the one in which the secondary crash occurred.
>
> This is possible because we divide the error reporting code into sections, called "steps", and keep state that records the last attempted step.
>
> Here, we are concerned with two particular steps: "printing register info" and "printing registers, top of stack, instructions near pc". In those two steps we report 4 distinct chunks of info:
>
> 1. registers' raw values (safe)
> 2. registers' decoded content (risky)
> 3. memory around the "sp" (risky)
> 4. memory around the "pc" (risky)
>
> The issue is that on some platforms (Linux aarch64) a single "step" contains 2, 3 and 4, all of them "risky" sections, so if we crash early in that step, we skip the rest of it and never report the later sections, e.g.:
>
>
> STEP("printing register info")
> print_register_info
> 1. registers' raw values (safe)
>
> STEP("printing registers, top of stack, instructions near pc")
> print_context
> 2. registers' decoded content (risky)
> 3. memory around the "sp" (risky)
> 4. memory around the "pc" (risky)
>
>
> Other platforms (Linux x64) have one "step" containing 1, 3 and 4, and another containing just 2, so a crash in section 2 does not prevent us from reporting section 3, e.g.:
>
>
> STEP("printing register info")
> print_register_info
> 2. registers' decoded content (risky)
>
> STEP("printing registers, top of stack, instructions near pc")
> print_context
> 1. registers' raw values (safe)
> 3. memory around the "sp" (risky)
> 4. memory around the "pc" (risky)
>
>
> This fix proposes to rearrange the STEPs so that the less risky sections come first, and to split the STEP that holds 3 sections into 2 STEPs, like so:
>
>
> STEP("printing registers")
> print_context
> 1. registers' raw values (safe)
>
> STEP("printing register info")
> print_register_info
> 2. registers' decoded content (risky)
>
> STEP("printing top of stack, instructions near pc")
> print_tos_pc
> 3. memory around the "sp" (risky)
> 4. memory around the "pc" (risky)
Hi Gerard,
I like this, thank you for choosing this route for the patch.
My only nit is that `os::print_tos_pc` isn't terribly clear; I would prefer a longer but more descriptive name.
As Andrew wrote, a follow-up could be to harden print_location and similar places by using SafeFetch.
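To make the "probe before you read" idea concrete, here is a rough POSIX-only sketch. It does not use or reproduce HotSpot's SafeFetch; `probe_readable` and `safe_read_word` are hypothetical helpers. They rely on `write(2)` reporting an error (typically `EFAULT`) instead of faulting when the source buffer is not readable. The check is racy, since the mapping can change between the probe and the read, and creating a pipe per probe is slow, so take it as an illustration only.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: returns true if one byte at addr can be read.
// The kernel copies the byte into a pipe; an unreadable address makes
// write() fail rather than raising SIGSEGV in this process.
bool probe_readable(const void* addr) {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;  // cannot probe; treat the address as unreadable
  }
  ssize_t n;
  do {
    n = write(fds[1], addr, 1);
  } while (n == -1 && errno == EINTR);
  close(fds[0]);
  close(fds[1]);
  return n == 1;
}

// Hypothetical helper: reads one machine word only if its first and last
// bytes probe as readable (the word may straddle a page boundary).
bool safe_read_word(const void* addr, uintptr_t* out) {
  const char* p = static_cast<const char*>(addr);
  if (!probe_readable(p) || !probe_readable(p + sizeof(uintptr_t) - 1)) {
    return false;
  }
  std::memcpy(out, p, sizeof(uintptr_t));
  return true;
}
```

A printing routine written this way would emit a placeholder when `safe_read_word` returns false, rather than dereferencing the register value directly.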
Cheers, Thomas
src/hotspot/share/utilities/vmError.cpp line 896:
> 894: ResourceMark rm(_thread);
> 895: os::print_register_info(st, _context);
> 896: st->cr();
Unrelated to your patch, but it would be nice if we could get rid of the RA (ResourceArea) usage here, since the RA may call malloc, and malloc is not async-signal-safe.
-------------
Marked as reviewed by stuefe (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/8736