RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2]
Quan Anh Mai
qamai at openjdk.org
Thu Dec 19 15:19:37 UTC 2024
On Thu, 19 Dec 2024 11:34:12 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> @chhagedorn I think that the wrong control input of the `DivINode` is a potential severe issue that may have more adverse impacts beyond this case. This fix also effectively disables loop predication to run on divisions and their outputs, which may introduce regressions. I believe the case here is peculiar because the zero check is not predicated out of the loop due to failed speculation. As a result, if this issue is not very likely to be hit, I strongly prefer the proper fix of correctly wiring the `230 Div` to `175 IfTrue`. Otherwise, if you want to fix this issue ASAP, then I agree with this patch. Thanks a lot.
>
>> I think that the wrong control input of the DivINode is a potential severe issue that may have more adverse impacts beyond this case.
>
> This is a general problem which I've been thinking about many times: Should we make sure that a division, whose divisor could be zero, is always pinned at a zero check? I'm not entirely sure about all places where a division node could become control dependent on other control nodes.
>
> Back there, when I worked on [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822), the problem was that we split a zero check through a region and the div node ended up at a region. Eventually, the division was wrongly hoisted before the actual zero check.
>
> We were not really sure about all the places where a zero check could lose the connection to its zero check. As soon as this happens, I think we always have a problem when rewiring a division, pinned at some control node that's unrelated, to an earlier dominating check: We could skip over the original check.
>
> Therefore, for JDK-8257822, we've decided to block the rewiring of division nodes instead of relying on spotting all the places where the disconnect could happen to fix it there. There were follow-up fixes to block other places and it seems that we have now found yet another place where we rewire a division that's not control dependent on the original check. Maybe it should have been handled with `depends_only_on_test()` as now done with array accesses. However, there was some indication that this has a negative performance impact (see [RFR for JDK-8242108](https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2020-April/037935.html]) where we tried this with a Cast node, maybe directly do it on Div nodes is better and could be investigated).
>
> I'm not sure how big the impact of this patch will be on performance. But my guess is that it's probably on the lower side. If we look again at the graph posted by Theo above:
> 
>
> If Loop Predication did not hoist `293 RangeCheck`, then we cannot hoist `240 RangeCheck`, either. If we hoist `293 RangeCheck` and decided to only try to hoist `240 RangeCheck` in the next round of loop opts, we would not be able to hoist it because `230 Div` would then be pinned at `190 IfTrue` inside the loop.
>
> So, from my understanding, the only cases we now forbid and that could have a performance impact are those where we hoist multiple zero checks in one round of lo...
@chhagedorn Thanks for your really detailed explanation! I really appreciate it.
I am worried that this change may prohibit hoisting division out of loops in general. For example,
loop {
int x = invar1 / invar2;
}
Then the division is anchored in the loop because the zero check of `invar2` is in the loop. Currently, this will also move the division out of the loop, but with this change, the division will stay in the loop and be pinned to the loop head. Please correct me if I am misunderstanding the situation here.
One find suggestion, though, what do you think about overriding `DivINode::depends_only_on_test` and checking if the control node is a pattern that does exclude 0 from the value range of the divisor. This will do the job as the control of `230 DivI` is an unrelated `RangeCheck`, making `depends_only_on_test` return `false` and the division stays in the loop while for the aforementioned case the control is the zero check and the division can be hoisted out of the loop. I think this can be applied to `CastNode`, too. As I believe we may have the same issue with them when a `CastNode` that `depends_only_on_test` is disconnected from its true test.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2554562427
More information about the hotspot-compiler-dev
mailing list