RFR: 8366815: C2: Delay Mod/Div by constant transformation

Thu Oct 23 05:28:01 UTC 2025

On Sun, 19 Oct 2025 15:46:06 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis.
> 
> Please let me know what you think.

We had a bit of an offline discussion in the office yesterday. Here is a summary of my thoughts.

Ordering optimizations/phases in compilers is a difficult problem. It is not at all unique to this case or even to C2; all compilers have this problem.

Doing what @SirYwell does here, delaying to IGVN, is a relatively simple fix, and it at least addresses all cases where the `divisor` and the `comparison` are already parse-time constants. I would consider that a win already. But the solution is a bit hacky.
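To make "parse-time constants" concrete, here is a minimal sketch of the kind of shape I have in mind. It is my own illustration, not one of the tests in the PR, and the class and method names are made up. Whether C2 actually folds each comparison depends on how precise the respective `Value()` is.

```java
// Illustration only: both the divisor and the compared-against value are
// constants that are already visible at parse time.
final class ParseTimeConstantSketch {
    // x % 10 always lies in [-9, 9], so this comparison can never be true.
    static boolean remainderOutOfRange(int x) {
        return x % 10 >= 10;
    }

    // x / 4 always lies in [Integer.MIN_VALUE / 4, Integer.MAX_VALUE / 4],
    // so this comparison can never be true either.
    static boolean quotientTooLarge(int x) {
        return x / 4 > Integer.MAX_VALUE / 4;
    }

    public static void main(String[] args) {
        System.out.println(remainderOutOfRange(12345));
        System.out.println(quotientTooLarge(Integer.MAX_VALUE));
    }
}
```

If the `Div`/`Mod` node is replaced by mul/shift/add before its type is computed, the range information that would make these comparisons foldable is much harder to recover.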

The alternative that was suggested is to delay it to post-loop-opts. But that is equally hacky, really: it would have the same kind of delay logic as is proposed now, just with a different "destination" (IGVN vs post-loop-opts). And it has the downside of preventing auto-vectorization (SuperWord does not know how to deal with `Div/Mod`; no hardware I know of implements vectorized integer division, only floating-point division is supported). But delaying to post-loop-opts allows cases like the one @mhaessig showed, where control flow collapses during IGVN. We could also construct a similar example where control flow collapses only during loop-opts, in some cases even only after SuperWord (though that would be very rare).

It is really difficult to handle all cases, and I don't know if we really need to. But it is hard to know which cases we should focus on.

Here is a super intense solution, the most powerful I can think of right now:
- Delay `transform_int_divide` to post-loop-opts, so we can wait for constants to appear during IGVN and loop-opts.
- That would mean we have to accept regressions for the currently vectorizing cases, or we have to do some `transform_int_divide` inside SuperWord: add a `VTransform::optimize` pass somehow. This would take a "medium" amount of engineering, and it would be more C++ code to maintain and test.
- Yet another possibility: during loop-opts, try to do `transform_int_divide` not just with a constant divisor, but also with a loop-invariant divisor. We would have to find a way to express the logic of `transform_int_divide` that finds the magic constants in C2 IR instead of C++ code (there seem to be some "failure" cases in the computation, not sure if we can resolve those). If the loop has sufficient iterations, it can be profitable to do the magic constant calculation before the loop, and do only mul/shift/add inside the loop. This seems like an optional add-on, though a really powerful one. And it would make the `VTransform::optimize` (SuperWord) step unnecessary.
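The loop-invariant idea can be sketched at the Java level as follows. This is only an illustration under assumptions: it uses a textbook multiply-and-shift scheme (in the style of Granlund and Montgomery) with 64-bit arithmetic, restricted to divisors `d >= 2`, and it is not the logic of `transform_int_divide`. All names (`InvariantDivisorSketch`, `Magic`, `magicFor`, `divide`, `divideAll`) are hypothetical.

```java
final class InvariantDivisorSketch {
    /** Hypothetical holder for the precomputed multiplier and shift. */
    static final class Magic {
        final long multiplier;
        final int shift;

        Magic(long multiplier, int shift) {
            this.multiplier = multiplier;
            this.shift = shift;
        }
    }

    // Computed once, before the loop. This sketch only handles d >= 2.
    static Magic magicFor(int d) {
        if (d < 2) {
            throw new IllegalArgumentException("sketch only handles d >= 2");
        }
        int l = 32 - Integer.numberOfLeadingZeros(d - 1); // ceil(log2(d))
        int shift = 31 + l;                               // at most 62
        long multiplier = (1L << shift) / d + 1;          // at most 2^32
        return new Magic(multiplier, shift);
    }

    // Same result as the truncating n / d, using only mul/shift/add.
    static int divide(int n, Magic magic) {
        long q = (magic.multiplier * n) >> magic.shift;   // floor(m * n / 2^shift)
        return (int) q + (n >>> 31);                      // add 1 for negative n
    }

    // The division itself no longer appears inside the loop body.
    static void divideAll(int[] in, int d, int[] out) {
        Magic magic = magicFor(d);
        for (int i = 0; i < in.length; i++) {
            out[i] = divide(in[i], magic);
        }
    }

    public static void main(String[] args) {
        int[] in = { -7, -1, 0, 1, 7, 100 };
        int[] out = new int[in.length];
        divideAll(in, 3, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

The `magicFor` step still contains one real division, so this only pays off if the loop runs for enough iterations. The restriction to `d >= 2` is one example of the kind of "failure" case that would need separate handling in IR.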

So my current thinking is:
We have to do some kind of delay anyway, either to IGVN or post-loop-opts, or elsewhere. For now, IGVN is a step in the right direction. The "delay mechanism" is a bit hacky, but we use it in multiple places already (grep for `record_for_igvn`). It is not @SirYwell's fault that our delay mechanism is so hacky.

So I would vote for going with the delay to IGVN for now, to at least support the parse-time constants. Then file an RFE that tracks the other ideas, and see if someone wants to pick that up (figure out a loop-opts pass that works for loop-invariant divisors, and otherwise delay to post-loop-opts).

src/hotspot/share/opto/divnode.cpp line 545:

> 543: 
> 544:   // Keep this node as-is for now; we want Value() and
> 545:   // other optimizations checking for this node type to work

Do we only need `Value` done first on the `Div` node, or also on uses of it?
It might be worth explaining it in a bit more detail here.

If it was just about calling `Value` on the `Div` first, we could probably check what `Value` returns here. But I fear that is not enough, right? Because it is the `Value` here that returns some range, and then some use sees that this range has specific characteristics, and can constant fold a comparison, for example. Did I get this right?
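To spell out what I mean, here is a toy model of that chain. It is not C2 code; the class, the `Fold` enum, and the method names are hypothetical and only model the idea that the `Div` produces a range and a use then folds on it.

```java
final class RangeFoldSketch {
    enum Fold { ALWAYS_TRUE, ALWAYS_FALSE, UNKNOWN }

    // Models the Div's Value(): the range of x / d for x in [lo, hi] and a
    // constant d > 0. Truncating division is monotonic in x for positive d.
    static int[] divRange(int lo, int hi, int d) {
        if (d <= 0 || lo > hi) {
            throw new IllegalArgumentException("sketch needs d > 0 and lo <= hi");
        }
        return new int[] { lo / d, hi / d };
    }

    // Models the use: fold "q < c" when q is known to lie in [qLo, qHi].
    static Fold foldLessThan(int qLo, int qHi, int c) {
        if (qHi < c) {
            return Fold.ALWAYS_TRUE;
        }
        if (qLo >= c) {
            return Fold.ALWAYS_FALSE;
        }
        return Fold.UNKNOWN;
    }

    public static void main(String[] args) {
        int[] r = divRange(Integer.MIN_VALUE, Integer.MAX_VALUE, 4);
        System.out.println(foldLessThan(r[0], r[1], 1 << 29));
    }
}
```

In this model the fold happens in the use, not in the `Div` itself, which is why checking only what `Value` returns on the `Div` would not be enough.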

-------------

PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3368244192
PR Review Comment: https://git.openjdk.org/jdk/pull/27886#discussion_r2453923234