RFR: 8284273: Early crashes in os::print_context on AArch64

Thomas Stuefe stuefe at openjdk.java.net
Thu May 19 13:57:49 UTC 2022


On Mon, 16 May 2022 21:58:56 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

> Our error reporting mechanism can crash again while reporting the original crash, because we may try to access a bogus memory location, for example while printing the memory that a CPU register points to. In such a case, we skip over the already reported sections (including the one where the secondary crash occurred).
> 
> This is possible because we divide the error reporting code into sections, called "steps", with a state that records the last attempted step.
> 
> Here, we are concerned with two particular steps: "printing register info" and "printing registers, top of stack, instructions near pc". In those two steps we report 4 distinct chunks of info:
> 
> 1. registers' raw values (safe)
> 2. registers' decoded content (risky)
> 3. memory around the "sp" (risky)
> 4. memory around the "pc" (risky)
> 
> The issue here is that on some platforms (Linux aarch64) a single "step" contains 2, 3 and 4, all "risky" sections, so if we crash early in the step, we skip the rest of it and never report the later sections, e.g.:
> 
> 
> STEP("printing register info")
>   print_register_info
>     1. registers' raw values (safe)
> 
> STEP("printing registers, top of stack, instructions near pc")
>   print_context
>     2. registers' decoded content (risky)
>     3. memory around the "sp" (risky)
>     4. memory around the "pc" (risky)
> 
> 
> Other platforms (Linux x64) have a single "step" containing 1, 3 and 4, and another one containing just 2, so that we always get to report section 3, e.g.:
> 
> 
> STEP("printing register info")
>   print_register_info
>     2. registers' decoded content (risky)
> 
> STEP("printing registers, top of stack, instructions near pc")
>   print_context
>     1. registers' raw values (safe)
>     3. memory around the "sp" (risky)
>     4. memory around the "pc" (risky)
> 
> 
> This fix proposes to rearrange the STEPs so that the less risky sections come first, and to split the STEP with 3 sections into 2 STEPs, like so:
> 
> 
> STEP("printing registers")
>   print_context
>     1. registers' raw values (safe)
> 
> STEP("printing register info")
>   print_register_info
>     2. registers' decoded content (risky)
> 
> STEP("printing top of stack, instructions near pc")
>   print_tos_pc
>     3. memory around the "sp" (risky)
>     4. memory around the "pc" (risky)

Hi Gerard,

I like this, thank you for choosing this route for the patch.
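
To illustrate why the split helps, here is a toy model of the step mechanism. It is only a sketch, not the code in vmError.cpp: a C++ exception stands in for the secondary crash, section 2 is assumed to always fail, and every name in it (`Step`, `run_report`, `layout_before`, `layout_proposed`, `contains`) is made up for the example.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Out = std::vector<std::string>;

// One reporting step: a name plus the sections it prints, in order.
struct Step {
  std::string name;
  std::function<void(Out&)> body;
};

// The four chunks of info from the PR description.
void raw_registers(Out& o)     { o.push_back("raw registers"); }        // 1 (safe)
void decoded_registers(Out&)   { throw std::runtime_error("bogus"); }   // 2 (assumed to crash)
void memory_around_sp(Out& o)  { o.push_back("memory around sp"); }     // 3 (risky)
void memory_around_pc(Out& o)  { o.push_back("memory around pc"); }     // 4 (risky)

// Runs all steps. A failure abandons the rest of the current step and
// reporting resumes with the next step, mimicking the resume-after-crash logic.
Out run_report(const std::vector<Step>& steps) {
  Out out;
  for (const Step& s : steps) {
    try {
      s.body(out);
    } catch (const std::runtime_error&) {
      out.push_back("[error occurred during step: " + s.name + "]");
    }
  }
  return out;
}

// Layout described for Linux aarch64 before the patch: 2, 3 and 4 share a step.
std::vector<Step> layout_before() {
  return std::vector<Step>{
    Step{"printing register info", raw_registers},
    Step{"printing registers, top of stack, instructions near pc",
         [](Out& o) { decoded_registers(o); memory_around_sp(o); memory_around_pc(o); }}};
}

// Proposed layout: safe section first, risky sections split across two steps.
std::vector<Step> layout_proposed() {
  return std::vector<Step>{
    Step{"printing registers", raw_registers},
    Step{"printing register info", decoded_registers},
    Step{"printing top of stack, instructions near pc",
         [](Out& o) { memory_around_sp(o); memory_around_pc(o); }}};
}

// True if the report contains the given line.
bool contains(const Out& out, const std::string& line) {
  for (const std::string& s : out) {
    if (s == line) return true;
  }
  return false;
}
```

With the old layout, a failure in section 2 loses sections 3 and 4; with the proposed layout, only section 2 itself is lost.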

My only nit is that `os::print_tos_pc` isn't terribly clear; I would prefer a longer but more descriptive name.

As Andrew wrote, a follow-up could be to harden print_location and similar places by using SafeFetch.
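
For readers outside HotSpot: SafeFetch reads a word and hands back a caller-supplied error value instead of faulting. The sketch below imitates that contract in a standalone program, but with a different technique than SafeFetch itself uses. It assumes Linux, where `write()` to a pipe fails with EFAULT when the source buffer is unreadable, so the kernel does the probing for us. `read_word_or` and `kSampleWord` are hypothetical names for this example only.

```cpp
#include <stdint.h>
#include <unistd.h>

// A known-readable word, used only to demonstrate the success path.
const uintptr_t kSampleWord = 0x1234;

// Returns the word stored at 'address', or 'fallback' if it cannot be read.
// Assumption: Linux semantics, where the kernel copies the buffer into the
// pipe and reports EFAULT (or a short write) for unreadable memory.
uintptr_t read_word_or(uintptr_t address, uintptr_t fallback) {
  int fds[2];
  if (pipe(fds) != 0) {
    return fallback;  // could not set up the probe
  }
  uintptr_t value = fallback;
  const void* src = reinterpret_cast<const void*>(address);
  // The kernel, not this process, touches 'src', so a bad address yields
  // an error return instead of a signal.
  if (write(fds[1], src, sizeof(value)) == static_cast<ssize_t>(sizeof(value))) {
    uintptr_t tmp = 0;
    if (read(fds[0], &tmp, sizeof(tmp)) == static_cast<ssize_t>(sizeof(tmp))) {
      value = tmp;
    }
  }
  close(fds[0]);
  close(fds[1]);
  return value;
}
```

A caller printing a register's target would check the result against the fallback (or probe twice with two different fallbacks) before decoding further, rather than dereferencing the register value directly.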

Cheers, Thomas

src/hotspot/share/utilities/vmError.cpp line 896:

> 894:        ResourceMark rm(_thread);
> 895:        os::print_register_info(st, _context);
> 896:        st->cr();

Unrelated to your patch, but it would be nice if we could get rid of the ResourceArea (RA) usage here, since the RA may malloc and malloc is not signal-safe.

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8736

