RFR: 8051725: Questionable if-conversion involving SETNE

Tue Apr 11 18:26:32 UTC 2023

On Tue, 11 Apr 2023 17:11:57 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> Hi, I've created optimizations for the x86 lowering of `Conv2B` nodes, when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). The optimization here is using the `sete` instruction instead of always using `setne` and flipping the bit with xor afterwards. According to the Intel optimization guide (pages 3-26 and 3-27), this sequence is preferred over `cmp $0, %src` as it prevents the need to encode the constant in the assembly sequence. A similar rule exists in the PPC backend, here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L10462. I've attached some performance testing but I think the real world improvements will be less significant- the motivation is primarily to decrease the amount of instruc
 tions that are generated, as that can help in cases where applications are I-Cache bound.
>> 
>> 
>>                                     Baseline                           Patch              Improvement
>> Benchmark                     Mode  Cnt  Score    Error Units     Score    Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346 ns/op  /  37.904 ± 1.856  ns/op  + 22.6%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211 ns/op  /  37.352 ± 1.529  ns/op  (unchanged)
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280 ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600 ns/op  /  46.916 ± 0.308  ns/op  + 19.0%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193 ns/op  /  46.974 ± 0.218  ns/op  (unchanged)
>> 
>> 
>> This change also cleans up some code relating to `Assembler::set_byte_if_not_zero`, as that function duplicates behavior with `Assembler::setne`. The 32-bit only version of that method is never called as the only other usage is in the C1 LIR assembler, which is also guarded behind an 64-bit check so I opted to remove it entirely and replace usages with `Assembler::setne`. Reviews would be greatly appreciated!
>> 
>> Testing: tier1-2 on linux x64, GHA
>
> As far as I know, `Conv2B` is a special-case convert node where it does either `c == 0 ? 0 : 1` or `p == null ? 0 : 1`. I think an advantage of Conv2B over CMove here is that the Conv2B has more specialized rules for value() and identity(), so it can prune more types of inputs than an equivalent Cmove can. I agree that it's not great having to shuffle the node from the middle-end to the backend, but I think it's still helpful as it can remove some dead bools that CMove wouldn't be able to. Hope this clarifies a bit!

@jaskarth Yes I also think that is the case, but without advantages in the back-end maybe it's best us lowering it in macro expansion phase so the compiler can have chances to transform more primitive `CMove`, what do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1503882160