RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v2]

Jasmine Karthikeyan jkarthikeyan at openjdk.org
Thu Feb 22 17:10:53 UTC 2024


On Wed, 7 Feb 2024 09:30:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Don't transform highly predictable branches
>
> Maybe it would be nice if the min/max nodes had some internal probability, so that the backend could decide again in what circumstances it is profitable to turn it into a cmove, or max, or even branch?

Hey @eme64, thank you for leaving feedback on the PR! I've incorporated your suggestions to make the code cleaner, and hopefully it's easier to follow now. I would appreciate it if you could take a look at the new changes!

> Feel free to make this change (hack), and see if that would get you the desired results:

I applied this patch and it seems to have worked, here's the latest performance results with the change and changes from master:

                                                Baseline                     Patch           Improvement
Benchmark                   Mode   Cnt  Score      Error  Units    Score       Error  Units
IfMinMax.testReductionInt   avgt   15  124.153 ±   0.963  ns/op     12.793 ±  1.171  ns/op   + 9.7x
IfMinMax.testReductionLong  avgt   15  123.241 ±  17.596  ns/op    126.727 ±  0.702  ns/op   (no change)*
IfMinMax.testSingleInt      avgt   15    0.916 ±   0.009  ns/op      0.913 ±  0.008  ns/op   (no change)
IfMinMax.testSingleLong     avgt   15    0.911 ±   0.007  ns/op      0.905 ±  0.004  ns/op   (no change)
IfMinMax.testVectorInt      avgt   15   83.884 ±   1.861  ns/op     14.766 ±  1.166  ns/op   + 5.7x
IfMinMax.testVectorLong     avgt   15   81.426 ±   0.517  ns/op     28.515 ±  0.480  ns/op   + 2.9x


I can now measure an improvement with reductions. The difference in the long reduction test seems to be due to the fact that my (Zen 3) device doesn't support AVX-512, which long min/max reductions require on x86 because of `vpminsq`/`vpmaxsq`.

> From your description and the benchmark results, I think this issue is applied to a general `CMove`. While a `CMove` would simplify the control flow, enable more optimisations, especially vectorisation; a `CMove` may deteriorate the performance if the corresponding branch is predictable. As a result, I don't think only applying the transformations to min/max patterns is a sound solution.

I definitely agree that this transformation could be further reaching in the patterns it accepts, but I wanted to focus on expanding the set of code that our existing Min/Max optimizations could cover for this patch. I've experimented with `CMove` optimizations before but I've ran into cases where the performance regresses, so I think they need to be done with a lot of care and research. Did you have ideas on what other kinds of patterns you would like to see covered? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1959889651


More information about the hotspot-compiler-dev mailing list