[lworld] RFR: 8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map
Nick Gasson
ngasson at openjdk.java.net
Tue Aug 3 10:34:22 UTC 2021
On Tue, 3 Aug 2021 07:57:59 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> This happens reliably in TestLWorld::test9() scenario 0 on AArch64:
>>
>>
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (/mnt/nicgas01-pc/valhalla/src/hotspot/share/opto/buildOopMap.cpp:360), pid=8866, tid=8882
>> # assert(false) failed: there should be a oop in OopMap instead of a live raw oop at safepoint
>> #
>>
>>
>> The crash can also be reproduced on x86 by running with -XX:+OptoScheduling (this is the default on AArch64).
>>
>> The problem seems to be caused by a CheckCastPP node whose input is a raw pointer being scheduled after a SafePoint node such that the raw pointer is live in a register over the safepoint.
>>
>> Before scheduling we have a basic block like:
>>
>>
>> R0 73 Phi === 15 74 30 [[ 72 71 70 69 68 67 84 ]] #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>> ...
>> R0 84 checkCastPP === 11 73 [[ 2 ]] SchedCrash$MyValue1:NotNull:exact * Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>> ...
>> 6 safePoint === 9 0 33 0 0 7 0 78 40 0 140 135 136 139 138 141 [[ 8 4 ]] !jvms: SchedCrash::test9 @ bci:33 (line 37)
>>
>>
>> But after scheduling this is transformed into:
>>
>>
>> R0 73 Phi === 15 74 30 [[ 72 71 70 69 68 67 84 ]] #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>> ...
>> 6 safePoint === 9 0 33 0 0 7 0 78 40 0 140 135 136 139 138 141 | 164 [[ 8 4 ]] !jvms: SchedCrash::test9 @ bci:33 (line 37)
>> ...
>> R0 84 checkCastPP === 11 73 | 67 68 69 70 71 72 [[ 2 ]] SchedCrash$MyValue1:NotNull:exact * Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>>
>>
>> Here R0 holds the raw pointer, which is live across the safepoint, and that triggers the assertion failure.
>>
>> The fix here is to add a precedence edge from any CheckCastPP with a raw pointer input to the following safepoint, which prevents the scheduler from moving the cast below it. I'm not very familiar with this code, so I can't be sure this is the correct solution, but the same logic exists in GCM's PhaseCFG::schedule_late().
>
> I finally got a chance to debug this. Here's what I think is going on that makes this specific to Valhalla / inline types (based on `TestLWorld::test9`):
> 1. We buffer the inline type returned by `MyValue1.setX` in the loop right before the safepoint (for example, because `-XX:+AlwaysIncrementalInline -XX:-InlineTypeReturnedAsFields` are set). The corresponding `CheckCastPP` is connected to the safepoint.
> 2. On return, we re-use the `CheckCastPP` from that allocation instead of allocating again.
> 3. Scalarization replaces the `CheckCastPP` safepoint usage, allowing it to flow below the safepoint during scheduling.
>
> Therefore, I think your fix is correct. Maybe add a comment explaining the details of how this can happen.
>
> Of course, it's unfortunate that the return keeps the allocation(s) in the loop alive when it would be sufficient to allocate only on return. However, I don't think we can easily fix this and it's hopefully an edge case.
Thanks for looking into this @TobiHartmann. I've added some more explanation to the comment.
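
For anyone following along who hasn't worked with precedence edges, the effect of the fix can be sketched with a small toy model. This is not HotSpot code: `ToyNode`, `schedule`, `raw_live_across_safepoint`, `pin_raw_casts_before_safepoints` and `example_block` are hypothetical names made up for illustration, and the priority-based greedy scheduler is only an assumed stand-in for the real local scheduling heuristics. The three nodes loosely mirror 73 Phi, 84 checkCastPP and 6 safePoint from the dumps above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not HotSpot's Node or Block classes.
struct ToyNode {
  std::string name;
  bool produces_raw;                // result is a raw (non-oop) pointer
  bool is_safepoint;
  bool is_cast;                     // stands in for CheckCastPP
  int priority;                     // assumed heuristic: higher is picked first
  std::vector<std::size_t> inputs;  // data inputs, must be scheduled earlier
  std::vector<std::size_t> prec;    // precedence edges, ordering only
};

// Greedy list scheduler for one basic block: repeatedly emit the ready node
// with the highest priority (ties go to the lower index).
std::vector<std::size_t> schedule(const std::vector<ToyNode>& nodes) {
  std::vector<bool> done(nodes.size(), false);
  std::vector<std::size_t> order;
  while (order.size() < nodes.size()) {
    std::size_t best = nodes.size();
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      if (done[i]) continue;
      bool ready = true;
      for (std::size_t in : nodes[i].inputs) ready = ready && done[in];
      for (std::size_t p : nodes[i].prec) ready = ready && done[p];
      if (ready && (best == nodes.size() ||
                    nodes[i].priority > nodes[best].priority)) {
        best = i;
      }
    }
    assert(best != nodes.size() && "dependency cycle in toy graph");
    done[best] = true;
    order.push_back(best);
  }
  return order;
}

// True if some raw pointer is defined before a safepoint and still has a use
// scheduled after it, i.e. the situation the assert in buildOopMap.cpp rejects.
bool raw_live_across_safepoint(const std::vector<ToyNode>& nodes,
                               const std::vector<std::size_t>& order) {
  std::vector<std::size_t> pos(nodes.size());
  for (std::size_t i = 0; i < order.size(); ++i) pos[order[i]] = i;
  for (std::size_t sp = 0; sp < nodes.size(); ++sp) {
    if (!nodes[sp].is_safepoint) continue;
    for (std::size_t use = 0; use < nodes.size(); ++use) {
      for (std::size_t def : nodes[use].inputs) {
        if (nodes[def].produces_raw && pos[def] < pos[sp] && pos[sp] < pos[use]) {
          return true;
        }
      }
    }
  }
  return false;
}

// The idea of the fix: every safepoint gets a precedence edge to each cast
// whose input is a raw pointer, so the cast has to be emitted first.
void pin_raw_casts_before_safepoints(std::vector<ToyNode>& nodes) {
  for (std::size_t sp = 0; sp < nodes.size(); ++sp) {
    if (!nodes[sp].is_safepoint) continue;
    for (std::size_t c = 0; c < nodes.size(); ++c) {
      if (!nodes[c].is_cast) continue;
      for (std::size_t in : nodes[c].inputs) {
        if (nodes[in].produces_raw) { nodes[sp].prec.push_back(c); break; }
      }
    }
  }
}

// Index 0: raw Phi, index 1: cast of the Phi, index 2: safepoint.
// The priorities are chosen so that the unconstrained scheduler sinks the cast.
std::vector<ToyNode> example_block() {
  return {
    {"Phi",         true,  false, false, 3, {},  {}},
    {"CheckCastPP", false, false, true,  1, {0}, {}},
    {"SafePoint",   false, true,  false, 2, {},  {}},
  };
}
```

In this model, scheduling `example_block()` as-is places the cast after the safepoint, so the raw Phi value is live across it. After `pin_raw_casts_before_safepoints` the safepoint only becomes ready once the cast has been emitted, so the raw pointer no longer spans the safepoint. The toy pins casts against every safepoint in the block and ignores the possibility of a cast depending on the safepoint, which the real code would have to handle.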
-------------
PR: https://git.openjdk.java.net/valhalla/pull/479