RFR: 8051725: Questionable if-conversion involving SETNE

Wed Apr 5 05:08:10 UTC 2023

Hi, I've created optimizations for the x86 lowering of `Conv2B` nodes, when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). The optimization here is using the `sete` instruction instead of always using `setne` and flipping the bit with xor afterwards. According to the Intel optimization guide (pages 3-26 and 3-27), this sequence is preferred over `cmp $0, %src` as it prevents the need to encode the constant in the assembly sequence. A similar rule exists in the PPC backend, here: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L10462. I've attached some performance testing but I think the real world improvements will be less significant- the motivation is primarily to decrease the amount of instructio
 ns that are generated, as that can help in cases where applications are I-Cache bound.

                                    Baseline                           Patch              Improvement
Benchmark                     Mode  Cnt  Score    Error Units     Score    Error  Units
Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346 ns/op  /  37.904 ± 1.856  ns/op  + 22.6%
Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211 ns/op  /  37.352 ± 1.529  ns/op  (unchanged)
Covn2BRules.testEquals1        avgt   10  35.059 ± 0.280 ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
Covn2BRules.testEqualsNull     avgt   10  56.768 ± 2.600 ns/op  /  46.916 ± 0.308  ns/op  + 19.0%
Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193 ns/op  /  46.974 ± 0.218  ns/op  (unchanged)

This change also cleans up some code relating to `Assembler::set_byte_if_not_zero`, as that function duplicates behavior with `Assembler::setne`. The 32-bit only version of that method is never called as the only other usage is in the C1 LIR assembler, which is also guarded behind an 64-bit check so I opted to remove it entirely and replace usages with `Assembler::setne`. Reviews would be greatly appreciated!

Testing: tier1-2 on linux x64, GHA

-------------

Commit messages:
 - Fix whitespace and add bug tag to IR test
 - Merge branch 'master' into conv2b-x86-lowering
 - Merge branch 'master' into conv2b-x86-lowering
 - Merge branch 'master' into conv2b-x86-lowering
 - Merge branch 'master' into conv2b-x86-lowering
 - Add flipped versions of Conv2B rules to decrease generated code complexity

Changes: https://git.openjdk.org/jdk/pull/13345/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8051725
  Stats: 265 lines in 7 files changed: 251 ins; 11 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/13345.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345

PR: https://git.openjdk.org/jdk/pull/13345