RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v3]
Fei Yang
fyang at openjdk.org
Fri Apr 21 00:41:47 UTC 2023
On Wed, 19 Apr 2023 04:30:39 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during macro expansion with conditional moves, allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backend, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                             Baseline                      Patch             Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units      Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  +  8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove Conv2B from backend as it's macro expanded now
Hello, I wonder if we could make this transformation of Conv2B conditional? Architectures like RISC-V don't have ISA-level support for conditional moves for now, so we set the ConditionalMoveLimit parameter to 0 on this platform, and conditional moves are emulated with ordinary compare-and-branch instructions instead [1]. I don't think this change would give us better performance numbers on this platform.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9583
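
For readers following along, the identity behind the quoted change can be sketched in plain Java. This is only an illustration of the semantics, not HotSpot code: the class and method names (Conv2BSketch, conv2B, conv2BThenXor, invertedSelect) are made up for this example, and the ternaries stand in for whatever the compiler actually emits (a conditional move where the ISA has one, compare and branch otherwise).

```java
// Illustrative model only; none of these names exist in HotSpot.
class Conv2BSketch {
    // Models the ideal Conv2B node: 0 stays 0, any other value becomes 1.
    static int conv2B(int x) {
        return x != 0 ? 1 : 0;
    }

    // The pattern before the change: Conv2B followed by an xor with 1.
    static int conv2BThenXor(int x) {
        return conv2B(x) ^ 1;
    }

    // The shape described in the PR: one select on the inverted comparison,
    // so the separate xor disappears.
    static int invertedSelect(int x) {
        return x == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (conv2BThenXor(x) != invertedSelect(x)) {
                throw new AssertionError("mismatch for input " + x);
            }
        }
        System.out.println("all samples agree");
    }
}
```

Both forms compute the same value; the open question above is only which form is cheaper on a platform where the select has to be lowered to a branch.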
-------------
PR Review: https://git.openjdk.org/jdk/pull/13345#pullrequestreview-1394916330