RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v3]

Emanuel Peter epeter at openjdk.org
Fri Mar 1 13:03:55 UTC 2024


On Tue, 27 Feb 2024 18:23:41 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> @jaskarth 
>>> I've designed this benchmark
>> 
>> Nice. Can you also post the generated assembly for Baseline/Patch?
>> I'm just worried that there is some method call, or something else that does not get cleanly inlined and could mess with the benchmark.
>
> @eme64 Sure, here is the assembly for the baseline: https://gist.github.com/jaskarth/1fe6f00a5b37fe3efb0dd6a2d24840e0
> And after: https://gist.github.com/jaskarth/99c56e2f081f996987b96d7e866aca6c
> 
> I must have missed this originally when evaluating the benchmark, but looking at the assembly it seems like the baseline JDK already creates a `CMove` for that ternary. I made a quick patch to disable the call to `PhaseIdealLoop::conditional_move`, and the performance on the benchmark stays the same. I've also attached that assembly if it's of interest: https://gist.github.com/jaskarth/7b12b688f82a3b8e854785f1827b0c20

@jaskarth
In Min/Max-style if-statements, both the if and the else branch are actually empty, since both values are computed before the if. That is why our `PhaseIdealLoop::conditional_move` will always consider the conversion profitable: it sees zero cost in the if/else branches, so converting them looks free. Cost modeling based only on the contents of the if/else blocks is therefore insufficient here.
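
For reference, a minimal Java sketch of the Min/Max-style shape being discussed (class and method names are hypothetical, not taken from the PR). Because `a` and `b` are already computed before the comparison, neither arm does any work of its own; the branch only selects which value flows out, which is why a cost model that looks only inside the arms sees essentially zero cost.

```java
public class MinMaxPatterns {
    // Min-style if-statement: both operands are evaluated before the branch,
    // so the if and else arms only select an already-computed value.
    static int minIf(int a, int b) {
        int r;
        if (a < b) {
            r = a;
        } else {
            r = b;
        }
        return r;
    }

    // Max-style ternary: the same shape with the comparison inverted.
    static int maxTernary(int a, int b) {
        return (a > b) ? a : b;
    }

    public static void main(String[] args) {
        // Both forms agree with Math.min/Math.max on these boundary inputs.
        int[] samples = {Integer.MIN_VALUE, -7, 0, 3, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (minIf(a, b) != Math.min(a, b) || maxTernary(a, b) != Math.max(a, b)) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Whether C2 turns such code into a branch, a `CMove`, or a Min/Max node depends on the JDK build and profiling; the sketch only illustrates the source-level pattern.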

Rather, you would have to know how much cost is behind the two inputs to the cmp. As my example shows, the branch predictor can hide at least part of the cost of `b`, whereas a CMove/Min/Max has to pay the full cost of `b` before execution can continue.
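
A hypothetical sketch of the kind of loop where this matters (it is not the example referred to above, which is not included in this message). The loop-carried value `x` is the min of a cheap operand `a` and an expensive operand `b` that, by construction, always loses the comparison. With a well-predicted branch, the CPU can speculatively continue with `a` and overlap the work for `b`; with a CMove/Min, the next `x` is not known until `b` has been computed, so the latency of `b` lands on the loop-carried critical path.

```java
public class HiddenCostOfB {
    // Hypothetical stand-in for an expensive operand: a serial chain of
    // multiplies, each step depending on the previous one.
    static int expensiveB(int x) {
        int h = x;
        for (int r = 0; r < 16; r++) {
            h = h * 0x9E3779B1 + r;
        }
        // Always at least x + 2, so it loses against a = x + 1
        // (assuming x stays well below Integer.MAX_VALUE).
        return x + 2 + (h & 0xFF);
    }

    // Loop-carried min: each iteration needs the previous x.
    // Branch: predicted "a wins", so b can be computed off the critical path.
    // CMove/Min: the new x depends on b, so b's full latency is paid every iteration.
    static int run(int iters) {
        int x = 0;
        for (int k = 0; k < iters; k++) {
            int a = x + 1;
            int b = expensiveB(x);
            x = (a < b) ? a : b;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Shrinking the loop in `expensiveB` is one way to probe how much of the cost of `b` the predictor can actually hide before the difference disappears.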

@jaskarth My example is extreme. Feel free to play with it and make the `b` part and the "post" part smaller. Maybe there is a regression case that is less extreme. If we could show that only the really extreme examples lead to regressions, then maybe we are willing to accept those regressions in exchange for speedups in other cases.

@jaskarth One more general issue: so far you have only shown that your optimization leads to speedups in conjunction with auto-vectorization. Do you have any examples that get speedups without auto-vectorization?
The thing is: I do hope to implement if-conversion in auto-vectorization. Hence, it would be nice to know that your optimization has benefits in cases where if-conversion does not apply.
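
One possible shape for a scalar-only experiment, sketched as a hypothetical candidate (names and data are made up, and no measurements are implied): a pointer-chasing loop where each iteration's index is the min of two loaded values. The dependency through memory from one iteration to the next leaves nothing for the auto-vectorizer to parallelize, and with random data the comparison is close to a coin flip for the predictor, which is the situation where a branch-free Min has the best chance to pay off.

```java
import java.util.Random;

public class ScalarMinCandidate {
    // Each iteration's index depends on values loaded in the previous
    // iteration, so the loop cannot be auto-vectorized.
    static int chaseMin(int[] left, int[] right, int steps) {
        int idx = 0;
        for (int s = 0; s < steps; s++) {
            int a = left[idx];
            int b = right[idx];
            // Min-style ternary with both operands already loaded.
            idx = (a < b) ? a : b;
        }
        return idx;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        Random rnd = new Random(42);
        int[] left = new int[n];
        int[] right = new int[n];
        for (int i = 0; i < n; i++) {
            // Values are valid indices, so the chase never goes out of bounds.
            left[i] = rnd.nextInt(n);
            right[i] = rnd.nextInt(n);
        }
        System.out.println(chaseMin(left, right, 10_000_000));
    }
}
```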

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1973149078
PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1973153118
PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1973156599


More information about the hotspot-compiler-dev mailing list