RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v2]
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Thu Feb 22 17:10:53 UTC 2024
On Wed, 7 Feb 2024 09:30:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Don't transform highly predictable branches
>
> Maybe it would be nice if the min/max nodes had some internal probability, so that the backend could decide again in what circumstances it is profitable to turn it into a cmove, or max, or even branch?
Hey @eme64, thank you for leaving feedback on the PR! I've incorporated your suggestions to make the code cleaner, and hopefully it's easier to follow now. I would appreciate it if you could take a look at the new changes!
> Feel free to make this change (hack), and see if that would get you the desired results:
I applied this patch and it seems to have worked, here's the latest performance results with the change and changes from master:
Baseline Patch Improvement
Benchmark Mode Cnt Score Error Units Score Error Units
IfMinMax.testReductionInt avgt 15 124.153 ± 0.963 ns/op 12.793 ± 1.171 ns/op + 9.7x
IfMinMax.testReductionLong avgt 15 123.241 ± 17.596 ns/op 126.727 ± 0.702 ns/op (no change)*
IfMinMax.testSingleInt avgt 15 0.916 ± 0.009 ns/op 0.913 ± 0.008 ns/op (no change)
IfMinMax.testSingleLong avgt 15 0.911 ± 0.007 ns/op 0.905 ± 0.004 ns/op (no change)
IfMinMax.testVectorInt avgt 15 83.884 ± 1.861 ns/op 14.766 ± 1.166 ns/op + 5.7x
IfMinMax.testVectorLong avgt 15 81.426 ± 0.517 ns/op 28.515 ± 0.480 ns/op + 2.9x
I can now measure an improvement with reductions. The difference in the long reduction test seems to be due to the fact that my (Zen 3) device doesn't support AVX-512, which long min/max reductions require on x86 because of `vpminsq`/`vpmaxsq`.
> From your description and the benchmark results, I think this issue is applied to a general `CMove`. While a `CMove` would simplify the control flow, enable more optimisations, especially vectorisation; a `CMove` may deteriorate the performance if the corresponding branch is predictable. As a result, I don't think only applying the transformations to min/max patterns is a sound solution.
I definitely agree that this transformation could be further reaching in the patterns it accepts, but I wanted to focus on expanding the set of code that our existing Min/Max optimizations could cover for this patch. I've experimented with `CMove` optimizations before but I've ran into cases where the performance regresses, so I think they need to be done with a lot of care and research. Did you have ideas on what other kinds of patterns you would like to see covered? Thanks!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1959889651
More information about the hotspot-compiler-dev
mailing list