RFR: 8051725: Questionable if-conversion involving SETNE [v2]
Tobias Hartmann
thartmann at openjdk.org
Tue Apr 18 07:06:51 UTC 2023
On Mon, 17 Apr 2023 04:32:29 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi, I've created optimizations for the x86 lowering of `Conv2B` nodes, when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). The optimization here is using the `sete` instruction instead of always using `setne` and flipping the bit with xor afterwards. According to the Intel optimization guide (pages 3-26 and 3-27), this sequence is preferred over `cmp $0, %src` as it prevents the need to encode the constant in the assembly sequence. A similar rule exists in the PPC backend, here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L10462. I've attached some performance testing but I think the real world improvements will be less significant- the motivation is primarily to decrease the amount of instruc
tions that are generated, as that can help in cases where applications are I-Cache bound.
>>
>>
>> Baseline Patch Improvement
>> Benchmark Mode Cnt Score Error Units Score Error Units
>> Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 37.904 ± 1.856 ns/op + 22.6%
>> Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 37.352 ± 1.529 ns/op (unchanged)
>> Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged)
>> Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 46.916 ± 0.308 ns/op + 19.0%
>> Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 46.974 ± 0.218 ns/op (unchanged)
>>
>>
>> This change also cleans up some code relating to `Assembler::set_byte_if_not_zero`, as that function duplicates behavior with `Assembler::setne`. The 32-bit only version of that method is never called as the only other usage is in the C1 LIR assembler, which is also guarded behind an 64-bit check so I opted to remove it entirely and replace usages with `Assembler::setne`. Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Re-work transform to happen in macro expansion
I'm a bit confused, why do you need the new match rules if Conv2B nodes are now macro expanded to CMove?
-------------
PR Review: https://git.openjdk.org/jdk/pull/13345#pullrequestreview-1389422395
More information about the hotspot-compiler-dev
mailing list