RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2]

Thu Dec 19 11:40:41 UTC 2024

On Wed, 18 Dec 2024 16:08:40 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

> I think that the wrong control input of the DivINode is a potential severe issue that may have more adverse impacts beyond this case.

This is a general problem which I've been thinking about many times: Should we make sure that a division, whose divisor could be zero, is always pinned at a zero check? I'm not entirely sure about all places where a division node could become control dependent on other control nodes. 

Back there, when I worked on [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822), the problem was that we split a zero check through a region and the div node ended up at a region. Eventually, the division was wrongly hoisted before the actual zero check.

We were not really sure about all the places where a zero check could lose the connection to its zero check. As soon as this happens, I think we always have a problem when rewiring a division, pinned at some control node that's unrelated, to an earlier dominating check: We could skip over the original check.

Therefore, for JDK-8257822, we've decided to block the rewiring of division nodes instead of relying on spotting all the places where the disconnect could happen to fix it there. There were follow-up fixes to block other places and it seems that we have now found yet another place where we rewire a division that's not control dependent on the original check. Maybe it should have been handled with `depends_only_on_test()` as now done with array accesses. However, there was some indication that this has a negative performance impact (see [RFR for JDK-8242108](https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2020-April/037935.html]) where we tried this with a Cast node, maybe directly do it on Div nodes is better and could be investigated).

I'm not sure how big the impact of this patch will be on performance. But my guess is that it's probably on the lower side. If we look again at the graph posted by Theo above:
![image](https://github.com/user-attachments/assets/acf68ab1-22af-440b-b080-99ba408e9b93)

If Loop Predication did not hoist `293 RangeCheck`, then we cannot hoist `240 RangeCheck`, either. If we hoist `293 RangeCheck` and decided to only try to hoist `240 RangeCheck` in the next round of loop opts, we would not be able to hoist it because  `230 Div` would then be pinned at `190 IfTrue` inside the loop.

So, from my understanding, the only cases we now forbid and that could have a performance impact are those where we hoist multiple zero checks in one round of loop opts. But as shown above, it's unsafe to perform this hoisting because the current code expects that division nodes are not rewired and stay pinned inside the loop.

Long story short, I think there are several ways to handle this general problem of divisions floating above the actual zero check:
- Allow divisions to lose the direct pin to its zero check and disable the rewiring of such divisions:
  1. Done today with `no_dependent_zero_check()` but incomplete and requires more locations to check this.
  2. Implement a pinning with `depends_only_on_test()` as done with array accesses. There were regressions when done with additional cast nodes but it might be more efficient to directly implement it on the division nodes. Would require some performance testing. It's safer than 1) but we would possibly block more cases.
- Disallow that zero checks and divisions are separated:

  3. Need to find all the places where this happens and disallow it or find another way to make this strong connection. Then we can safely rewire divisions to dominating checks, which are always zero excluding checks.

Option 1 (i) seems to be the simplest one for now. Option 2 would be more safe/correct but possibly has a bigger performance impact and might not be justified without a failing case. Option 3 would be interesting but seems to have a big impact and might not be so easy to do - could be worth to file an RFE.

Even when going with Option 1 or 2, we could still allow rewiring of divisions to a dominating check if we can prove that this is a zero check that covers the division to be rewired (e.g. same divisor and comparison excludes zero etc.) - and maybe that's all we need (discussed this idea with @eme64). Could be investigated with/against Option 3.

This became longer than I've intended but I guess it helped to reflect on the situation and the ways to go from here. Let me know what you think about it and the suggestion to go with option 1 for now. And also if there are some more or other things to consider.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2553484699