RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7]
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Thu May 18 05:20:51 UTC 2023
On Thu, 18 May 2023 04:17:48 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>> Baseline Patch Improvement
>> Benchmark Mode Cnt Score Error Units Score Error Units
>> Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 34.130 ± 0.177 ns/op + 28.2%
>> Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 34.185 ± 0.258 ns/op + 8.0%
>> Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged)
>> Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 34.330 ± 0.625 ns/op + 39.5%
>> Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 34.142 ± 0.303 ns/op + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Changes from code review
Apologies for the delayed update, and thanks for the reviews!
I have aarch64 performance results, from an M1 mac:
Baseline Patch Improvement
Benchmark Mode Cnt Score Error Units Score Error Units
Conv2BRules.testEquals0 avgt 12 41.697 ± 0.127 ns/op / 40.724 ± 0.086 ns/op + 2.4%
Conv2BRules.testNotEquals0 avgt 12 39.522 ± 0.143 ns/op / 40.608 ± 0.046 ns/op - 2.7%
Conv2BRules.testEquals1 avgt 12 40.168 ± 0.136 ns/op / 40.679 ± 0.044 ns/op (unchanged)
Conv2BRules.testEqualsNull avgt 12 48.922 ± 0.498 ns/op / 42.046 ± 0.018 ns/op + 15.1%
Conv2BRules.testNotEqualsNull avgt 12 41.725 ± 0.264 ns/op / 42.063 ± 0.043 ns/op - 0.8%
It seems like the patch doesn't have much of an impact other than `testEqualsNull`, which would make sense as the Conv2B rule is using the same `cset` instruction as the 0 and 1 rule for CMoveI. I was unfortunately not able to test for arm32, but I think it should still be beneficial as the Conv2B rules there used two cmoves and had a fixme, whereas with this patch it would only use one cmove.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1552411461
More information about the hotspot-compiler-dev
mailing list