RFR: 8283694: Improve several operations on x86

Quan Anh Mai duke at openjdk.java.net
Sat Mar 26 06:25:16 UTC 2022


Hi, this patch improves some operations on x86_64:

- Variable scalar shifts in their base (non-BMI2) forms perform poorly and should be replaced by their BMI2 counterparts where possible:
  + Fixed operands (the shift count must be in `cl`)
  + Multiple uops in both the fused and unfused domains
  + Possible flag stalls, since the operations have unpredictable flag output
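
For reference, the Java-level shape behind these shifts is any shift whose count is not a compile-time constant. The following is only a minimal sketch; the class and method names are hypothetical and are not taken from the patch. Java masks the count to the low 5 bits for `int` and the low 6 bits for `long`, which is also what the hardware does for both the `cl`-based forms and the BMI2 forms, so neither encoding should need an extra masking instruction.

```java
public class VariableShifts {
    // The shift count is a runtime value, so it cannot be encoded as an immediate.
    static int shl(int x, int s) { return x << s; }    // count masked to s & 31
    static int sar(int x, int s) { return x >> s; }    // arithmetic right shift
    static int shr(int x, int s) { return x >>> s; }   // logical right shift
    static long shl(long x, int s) { return x << s; }  // count masked to s & 63

    public static void main(String[] args) {
        System.out.println(shl(1, 33));   // int: count 33 behaves like count 1
        System.out.println(shl(1L, 33));  // long: count 33 is used as-is
        System.out.println(sar(-8, 1));
        System.out.println(shr(-8, 28));
    }
}
```

Whether a given method ends up using `shlx`/`sarx`/`shrx` depends on the CPU supporting BMI2 and on how the JIT matches the surrounding code.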

- Flag-to-general-purpose-register operations currently use `cmovcc`, which requires setup and one more spare register for the constant. This could be replaced by `setcc`, which transforms the sequence:

        xorl dst, dst
        sometest
        movl tmp, 0x01
        cmovlcc dst, tmp

        into:

        sometest
        setbcc dst
        movzbl dst, dst

This sequence does not need a spare register and has no drawbacks.
(Note: `movzx` is often elided and thus has 0 latency)
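
At the Java level, the pattern in question is a comparison whose boolean result is consumed as a 0/1 value. A sketch of such code follows; the class and method names are hypothetical, and whether the JIT actually selects the `setcc` form for any particular method is an assumption that depends on the surrounding code.

```java
public class FlagToRegister {
    // A comparison result materialized as 0 or 1 in a general-purpose register.
    static int lessThan(int a, int b) { return a < b ? 1 : 0; }

    // The same shape with a long operand.
    static int isZero(long v) { return v == 0 ? 1 : 0; }

    // Summing the 0/1 results keeps the materialized value live inside a loop,
    // which is where a saved temporary register could matter.
    static int countBelow(int[] values, int limit) {
        int count = 0;
        for (int v : values) {
            count += lessThan(v, limit);
        }
        return count;
    }
}
```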

- Some small improvements:
  + Add memory variants of `tzcnt` and `lzcnt`
  + Add memory variants of `rolx` and `rorx`
  + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
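
The rotate identity in the last item can be checked at the Java level, and the memory variants correspond to cases where the operand comes straight from a load. The sketch below uses only `java.lang.Integer` and `java.lang.Long` methods; the class and helper names are hypothetical, and which machine instructions are emitted for them is an assumption.

```java
public class RotateIdentity {
    // Rotate left by k, expressed as rotate right by (size - k).
    static int rolViaRor(int x, int k) { return Integer.rotateRight(x, 32 - k); }
    static long rolViaRor(long x, int k) { return Long.rotateRight(x, 64 - k); }

    // The operand is read directly from an array element, so a matcher rule
    // with a memory operand could fold the load into the counting instruction.
    static int trailingZerosAt(int[] a, int i) { return Integer.numberOfTrailingZeros(a[i]); }
    static int leadingZerosAt(long[] a, int i) { return Long.numberOfLeadingZeros(a[i]); }

    public static void main(String[] args) {
        int x = 0x12345678;
        for (int k = 0; k < 32; k++) {
            if (rolViaRor(x, k) != Integer.rotateLeft(x, k)) {
                throw new AssertionError("mismatch at k=" + k);
            }
        }
        System.out.println("rotate identity holds for all int distances 0..31");
    }
}
```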

The speedup can be observed for variable shift instructions:

        Before:
        Benchmark               (size)  Mode  Cnt   Score   Error  Units
        Integers.shiftLeft         500  avgt    5   0.836 ± 0.030  us/op
        Integers.shiftRight        500  avgt    5   0.843 ± 0.056  us/op
        Integers.shiftURight       500  avgt    5   0.830 ± 0.057  us/op
        Longs.shiftLeft            500  avgt    5   0.827 ± 0.026  us/op
        Longs.shiftRight           500  avgt    5   0.828 ± 0.018  us/op
        Longs.shiftURight          500  avgt    5   0.829 ± 0.038  us/op

        After:
        Benchmark               (size)  Mode  Cnt   Score   Error  Units
        Integers.shiftLeft         500  avgt    5   0.761 ± 0.016  us/op
        Integers.shiftRight        500  avgt    5   0.762 ± 0.071  us/op
        Integers.shiftURight       500  avgt    5   0.765 ± 0.056  us/op
        Longs.shiftLeft            500  avgt    5   0.755 ± 0.026  us/op
        Longs.shiftRight           500  avgt    5   0.753 ± 0.017  us/op
        Longs.shiftURight          500  avgt    5   0.759 ± 0.031  us/op

For `cmovcc 1, 0`, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.

Thank you very much.

-------------

Commit messages:
 - pipe
 - fix, benchmarks
 - pipe_class
 - reduce register pressure
 - cmov fix
 - operand
 - setcc for long
 - initial commit

Changes: https://git.openjdk.java.net/jdk/pull/7968/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7968&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8283694
  Stats: 603 lines in 8 files changed: 566 ins; 4 del; 33 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7968.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7968/head:pull/7968

PR: https://git.openjdk.java.net/jdk/pull/7968

