[lworld] RFR: 8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map

Tue Aug 3 08:02:08 UTC 2021

On Tue, 13 Jul 2021 11:08:47 GMT, Nick Gasson <ngasson at openjdk.org> wrote:

> This happens reliably in TestLWorld::test9() scenario 0 on AArch64:
> 
> 
>   # A fatal error has been detected by the Java Runtime Environment:
>   #
>   # Internal Error (/mnt/nicgas01-pc/valhalla/src/hotspot/share/opto/buildOopMap.cpp:360), pid=8866, tid=8882
>   # assert(false) failed: there should be a oop in OopMap instead of a live raw oop at safepoint
>   #
> 
> 
> The crash can also be reproduced on x86 by running with -XX:+OptoScheduling (this is the default on AArch64).
> 
> The problem seems to be caused by a CheckCastPP node whose input is a raw pointer being scheduled after a SafePoint node such that the raw pointer is live in a register over the safepoint.
> 
> Before scheduling we have a basic block like:
> 
> 
>   R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>   ...
>   R0      84  checkCastPP  ===  11  73  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>   ...
>           6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)
> 
> 
> But after scheduling this is transformed into:
> 
> 
>   R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>   ...
>           6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  | 164  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)
>   ...
>   R0      84  checkCastPP  ===  11  73  | 67  68  69  70  71  72  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
> 
> 
> Where R0 is holding the live raw pointer over the safepoint, which triggers the assertion failure.
> 
> The fix here is to add a precedence edge from any CheckCastPP with a raw pointer input to the following safepoint, which prevents them being rearranged. I'm not very familiar with this code so I can't be sure this is the correct solution, but the same logic exists in GCM's PhaseCFG::schedule_late().

I finally got a chance to debug this. Here's what I think is going on that makes this specific to Valhalla / inline types (based on `TestLWorld::test9`):
1. We buffer the inline type returned by `MyValue1.setX` in the loop right before the safepoint (for example, because `-XX:+AlwaysIncrementalInline -XX:-InlineTypeReturnedAsFields` are set). The corresponding `CheckCastPP` is connected to the safepoint.
2. On return, we re-use the `CheckCastPP` from that allocation instead of allocating again.
3. Scalarization replaces the safepoint's use of the `CheckCastPP` with the scalarized field values, so nothing ties the `CheckCastPP` to the safepoint anymore and scheduling is free to move it below the safepoint.

Therefore, I think your fix is correct. Please consider adding a comment that explains how this situation can arise.
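
To illustrate the effect of the precedence edge, here is a toy model in Python. It is not HotSpot code: the `Node` class, the `priority` field (a stand-in for whatever heuristic makes the scheduler prefer the safepoint over the cast), and all function names are made up for this sketch. It only shows how an extra ordering edge from the safepoint to a `CheckCastPP` with a raw input keeps a simple list scheduler from placing the cast after the safepoint.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                                    # "phi", "cast", "safepoint" or "ret"
    inputs: list = field(default_factory=list)   # names of data inputs
    prec: list = field(default_factory=list)     # names of precedence (ordering-only) inputs
    raw: bool = False                            # True if the node produces a raw pointer
    priority: int = 0                            # hypothetical heuristic: lower is emitted first


def build_block():
    # Mirrors the shape of the block in the report: raw Phi -> CheckCastPP,
    # with a safepoint that has no data dependency on the cast.
    return [
        Node("phi", "phi", raw=True, priority=0),
        Node("cast", "cast", inputs=["phi"], priority=2),
        Node("safepoint", "safepoint", priority=1),
        Node("ret", "ret", inputs=["safepoint", "cast"], priority=3),
    ]


def add_raw_cast_prec_edges(nodes):
    # Model of the fix: every safepoint gets a precedence input on each
    # cast whose input is a raw pointer, so the cast must be emitted first.
    by_name = {n.name: n for n in nodes}
    raw_casts = [n for n in nodes
                 if n.kind == "cast" and any(by_name[i].raw for i in n.inputs)]
    for sp in (n for n in nodes if n.kind == "safepoint"):
        sp.prec.extend(c.name for c in raw_casts)


def schedule(nodes):
    # Greedy list scheduling: emit the ready node with the lowest priority value.
    emitted, remaining = [], list(nodes)
    while remaining:
        ready = [n for n in remaining
                 if all(d in emitted for d in n.inputs + n.prec)]
        nxt = min(ready, key=lambda n: n.priority)
        emitted.append(nxt.name)
        remaining.remove(nxt)
    return emitted


def raw_live_over_safepoint(order, nodes):
    # A raw value is live over a safepoint if it is defined before the
    # safepoint and one of its users is emitted after the safepoint.
    pos = {name: i for i, name in enumerate(order)}
    by_name = {n.name: n for n in nodes}
    for sp in (n for n in nodes if n.kind == "safepoint"):
        for user in nodes:
            for i in user.inputs:
                if by_name[i].raw and pos[i] < pos[sp.name] < pos[user.name]:
                    return True
    return False
```

In this model, scheduling `build_block()` as-is yields the order phi, safepoint, cast, ret, so the raw pointer is live over the safepoint. After calling `add_raw_cast_prec_edges` on a fresh block, the order becomes phi, cast, safepoint, ret and the raw pointer is dead before the safepoint is reached.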

Of course, it's unfortunate that the return keeps the allocation(s) in the loop alive when it would be sufficient to allocate only on return. However, I don't think we can easily fix this and it's hopefully an edge case.

-------------

PR: https://git.openjdk.java.net/valhalla/pull/479