[lworld] RFR: 8264340: [lworld] [AArch64] TestLWorld.java assertion failure in OopFlow::build_oop_map

Tue Aug 3 10:34:22 UTC 2021

On Tue, 3 Aug 2021 07:57:59 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

>> This happens reliably in TestLWorld::test9() scenario 0 on AArch64:
>> 
>> 
>>   # A fatal error has been detected by the Java Runtime Environment:
>>   #
>>   # Internal Error (/mnt/nicgas01-pc/valhalla/src/hotspot/share/opto/buildOopMap.cpp:360), pid=8866, tid=8882
>>   # assert(false) failed: there should be a oop in OopMap instead of a live raw oop at safepoint
>>   #
>> 
>> 
>> The crash can also be reproduced on x86 by running with -XX:+OptoScheduling (this is the default on AArch64).
>> 
>> The problem seems to be that a CheckCastPP node with a raw pointer input is scheduled after a SafePoint node, so the raw pointer is live in a register across the safepoint.
>> 
>> Before scheduling we have a basic block like:
>> 
>> 
>>   R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>>   ...
>>   R0      84  checkCastPP  ===  11  73  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>>   ...
>>           6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)
>> 
>> 
>> But after scheduling this is transformed into:
>> 
>> 
>>   R0      73  Phi  ===  15  74  30  [[ 72  71  70  69  68  67  84 ]]  #rawptr:BotPTR !jvms: SchedCrash$MyValue1::setX @ bci:-1 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>>   ...
>>           6  safePoint  ===  9  0  33  0  0  7  0  78  40  0  140  135  136  139  138  141  | 164  [[ 8  4 ]]  !jvms: SchedCrash::test9 @ bci:33 (line 37)
>>   ...
>>   R0      84  checkCastPP  ===  11  73  | 67  68  69  70  71  72  [[ 2 ]] SchedCrash$MyValue1:NotNull:exact *  Oop:SchedCrash$MyValue1:NotNull:exact * !jvms: SchedCrash$MyValue1::<init> @ bci:65 (line 24) SchedCrash$MyValue1::setX @ bci:21 (line 27) SchedCrash::test9 @ bci:25 (line 39)
>> 
>> 
>> Here R0 holds the live raw pointer across the safepoint, which triggers the assertion failure.
>> 
>> The fix here is to add a precedence edge from any CheckCastPP with a raw pointer input to the following safepoint, which prevents the two from being reordered. I'm not very familiar with this code so I can't be sure this is the correct solution, but the same logic exists in GCM's PhaseCFG::schedule_late().
>
> I finally got a chance to debug this. Here's what I think is going on that makes this specific to Valhalla / inline types (based on `TestLWorld::test9`):
> 1. We buffer the inline type returned by `MyValue1.setX` in the loop right before the safepoint (for example, because `-XX:+AlwaysIncrementalInline -XX:-InlineTypeReturnedAsFields` are set). The corresponding `CheckCastPP` is connected to the safepoint.
> 2. On return, we re-use the `CheckCastPP` from that allocation instead of allocating again.
> 3. Scalarization replaces the `CheckCastPP` safepoint usage, allowing it to flow below the safepoint during scheduling.
> 
> Therefore, I think your fix is correct. Maybe add a comment explaining the details of how this can happen.
> 
> Of course, it's unfortunate that the return keeps the allocation(s) in the loop alive when it would be sufficient to allocate only on return. However, I don't think we can easily fix this and it's hopefully an edge case.

Thanks for looking into this @TobiHartmann. I've added some more explanation to the comment.
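
To illustrate the ordering problem discussed above, here is a minimal toy model of a list scheduler for one basic block. It is only a sketch and is not HotSpot code: the `Node` struct, the `priority` field, and the functions `schedule`, `make_block` and `position` are all hypothetical names invented for this example. It assumes a scheduler that emits the ready node with the lowest priority value first, and it models three situations: the safepoint still uses the cast as an input (before scalarization), the use has been removed, and a precedence edge has been added.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy node: indices refer to positions in the block's node vector.
struct Node {
    std::string name;
    std::vector<int> inputs;  // data inputs that must be scheduled first
    std::vector<int> prec;    // precedence-only edges (ordering, no data)
    int priority;             // hypothetical heuristic: lower is emitted earlier
};

enum { PHI = 0, CAST = 1, SAFEPOINT = 2 };

// Build a three-node block: raw-pointer Phi, CheckCastPP, SafePoint.
// The cast gets a "late" priority so the toy scheduler prefers to sink it.
std::vector<Node> make_block(bool safepoint_uses_cast, bool add_prec_edge) {
    std::vector<Node> nodes;
    nodes.push_back(Node{"Phi(rawptr)", {}, {}, 0});
    nodes.push_back(Node{"CheckCastPP", {PHI}, {}, 2});
    nodes.push_back(Node{"SafePoint", {}, {}, 1});
    if (safepoint_uses_cast) {
        nodes[SAFEPOINT].inputs.push_back(CAST);  // use present before scalarization
    }
    if (add_prec_edge) {
        nodes[SAFEPOINT].prec.push_back(CAST);    // ordering-only dependency
    }
    return nodes;
}

// Greedy list scheduling: repeatedly emit the ready node with the lowest priority.
std::vector<int> schedule(const std::vector<Node>& nodes) {
    std::vector<bool> done(nodes.size(), false);
    std::vector<int> order;
    while (order.size() < nodes.size()) {
        int best = -1;
        for (int i = 0; i < static_cast<int>(nodes.size()); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int in : nodes[i].inputs) ready = ready && done[in];
            for (int p : nodes[i].prec) ready = ready && done[p];
            if (ready && (best < 0 || nodes[i].priority < nodes[best].priority)) {
                best = i;
            }
        }
        if (best < 0) break;  // dependency cycle; cannot happen in this example
        done[best] = true;
        order.push_back(best);
    }
    return order;
}

// Position of a node in the emitted order.
int position(const std::vector<int>& order, int node) {
    return static_cast<int>(std::find(order.begin(), order.end(), node) - order.begin());
}
```

In this model, while the safepoint uses the cast the scheduler has to emit the cast first. Once that use is gone, nothing stops the cast from sinking below the safepoint, which leaves only the raw-pointer Phi live across it. Adding the precedence edge restores the original order without reintroducing a data use.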

-------------

PR: https://git.openjdk.java.net/valhalla/pull/479