RFR: 8366815: C2: Delay Mod/Div by constant transformation

Thu Oct 23 07:39:02 UTC 2025

On Thu, 23 Oct 2025 05:25:32 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis.
>> 
>> Please let me know what you think.
>
> We had a bit of an offline discussion in the office yesterday. Here is a summary of my thoughts.
> 
> Ordering optimizations/phases in compilers is a difficult problem. It is not at all unique to this case or even to C2; all compilers have this problem.
> 
> What @SirYwell does here, delaying the transformation to IGVN, is a relatively simple fix, and it at least addresses all cases where the `divisor` and the `comparison` are already parse-time constants. I would consider that a win already. But the solution is a bit hacky.
> 
> The alternative that was suggested is to delay it to post-loop-opts. But that is really equally hacky: it would have the same kind of delay logic as is proposed now, just with a different "destination" (IGVN vs post-loop-opts). And it has the downside of preventing auto-vectorization (SuperWord does not know how to deal with `Div/Mod`; no hardware I know of implements vectorized integer division, only floating-point division is supported). But delaying to post-loop-opts handles cases like the one @mhaessig showed, where control flow collapses during IGVN. We could also make a similar example where control flow collapses only during loop-opts, in some cases even only after SuperWord (though that would be very rare).
> 
> It is really difficult to handle all cases, and I don't know if we really need to. But it is hard to know which cases we should focus on.
> 
> Here is a super intense solution, the most powerful I can think of right now:
> - Delay `transform_int_divide` to post-loop-opts, so we can wait for constants to appear during IGVN and loop-opts.
> - That would mean we have to accept regressions for the currently vectorizing cases, or we have to do some `transform_int_divide` inside SuperWord: add a `VTransform::optimize` pass somehow. This would take a "medium" amount of engineering, and it would be more C++ code to maintain and test.
> - Yet another possibility: during loop-opts, try to do `transform_int_divide` not just with a constant divisor, but also with a loop-invariant divisor. We would have to find a way to do the logic of `transform_int_divide` that finds the magic constants in C2 IR instead of C++ code (there seem to be some "failure" cases in the computation, not sure if we can resolve those). If the loop has sufficient iterations, it can be profitable to do the magic constant calculation before the loop, and do only mul/shift/add inside the loop. This seems like an optional add-on, but it would be really powerful. And it would make the `VTransform::optimiz...

Thanks for the summary @eme64. I totally agree that it's a bit hacky, but the current approach is the least invasive one. I'd also be interested in taking further steps in the same direction, but I feel like the work grows significantly faster than the benefits (at least as long as we don't generalize it to also optimize loop-invariant non-constant divisors, but that's also a lot of work).
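
To make the loop-invariant idea a bit more concrete, here is a rough Java-level sketch of what "compute the magic constant before the loop, do only mul/shift inside the loop" could look like. This is not C2 IR and not how `transform_int_divide` computes its constants; the class and method names are made up for illustration, and I am assuming the simplified case of a non-negative dividend and a positive divisor, so there is no sign fix-up.

```java
final class HoistedMagicDivision {
    // Hypothetical helpers for illustration only; not taken from C2.

    // Shift amount 31 + ceil(log2(d)), valid for 1 <= d <= Integer.MAX_VALUE.
    static int shiftFor(int d) {
        return 31 + (32 - Integer.numberOfLeadingZeros(d - 1));
    }

    // Magic multiplier floor(2^shift / d) + 1.
    // It is at most 2^32, so magic * n fits in a long for any non-negative int n.
    static long magicFor(int d) {
        return (1L << shiftFor(d)) / d + 1;
    }

    // floor(n / d) for 0 <= n <= Integer.MAX_VALUE, using only a multiply and a shift.
    static int divide(int n, long magic, int shift) {
        return (int) ((magic * n) >>> shift);
    }

    // The divisor d is loop-invariant: the one real division happens in magicFor,
    // before the loop, and the loop body contains no division at all.
    static long sumOfQuotients(int[] values, int d) {
        long magic = magicFor(d);
        int shift = shiftFor(d);
        long sum = 0;
        for (int v : values) {
            sum += divide(v, magic, shift);
        }
        return sum;
    }
}
```

The up-front cost is one 64-bit division plus a leading-zero count per loop, so this would only pay off if the loop runs enough iterations. Handling negative dividends and divisors would need the extra add/shift corrections, which I left out here.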

@mhaessig do you have test results already?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3435552083