RFR: 8051725: Questionable if-conversion involving SETNE
Quan Anh Mai
qamai at openjdk.org
Tue Apr 11 18:26:32 UTC 2023
On Tue, 11 Apr 2023 17:11:57 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> Hi, I've created optimizations for the x86 lowering of `Conv2B` nodes, when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). The optimization here is using the `sete` instruction instead of always using `setne` and flipping the bit with xor afterwards. According to the Intel optimization guide (pages 3-26 and 3-27), this sequence is preferred over `cmp $0, %src` as it prevents the need to encode the constant in the assembly sequence. A similar rule exists in the PPC backend, here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L10462. I've attached some performance testing but I think the real world improvements will be less significant- the motivation is primarily to decrease the amount of instruc
tions that are generated, as that can help in cases where applications are I-Cache bound.
>>
>>
>> Baseline Patch Improvement
>> Benchmark Mode Cnt Score Error Units Score Error Units
>> Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 37.904 ± 1.856 ns/op + 22.6%
>> Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 37.352 ± 1.529 ns/op (unchanged)
>> Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged)
>> Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 46.916 ± 0.308 ns/op + 19.0%
>> Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 46.974 ± 0.218 ns/op (unchanged)
>>
>>
>> This change also cleans up some code relating to `Assembler::set_byte_if_not_zero`, as that function duplicates behavior with `Assembler::setne`. The 32-bit only version of that method is never called as the only other usage is in the C1 LIR assembler, which is also guarded behind an 64-bit check so I opted to remove it entirely and replace usages with `Assembler::setne`. Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> As far as I know, `Conv2B` is a special-case convert node where it does either `c == 0 ? 0 : 1` or `p == null ? 0 : 1`. I think an advantage of Conv2B over CMove here is that the Conv2B has more specialized rules for value() and identity(), so it can prune more types of inputs than an equivalent Cmove can. I agree that it's not great having to shuffle the node from the middle-end to the backend, but I think it's still helpful as it can remove some dead bools that CMove wouldn't be able to. Hope this clarifies a bit!
@jaskarth Yes I also think that is the case, but without advantages in the back-end maybe it's best us lowering it in macro expansion phase so the compiler can have chances to transform more primitive `CMove`, what do you think?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1503882160
More information about the hotspot-compiler-dev
mailing list