RFR: 8283694: Improve several operations on x86
Quan Anh Mai
duke at openjdk.java.net
Sat Mar 26 06:25:16 UTC 2022
Hi, this patch improves some operations on x86_64:
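- Base variable scalar shifts (`shl`, `sar` and `shr` with the count in `cl`) have poor performance characteristics and should be replaced by their BMI2 counterparts (`shlx`, `sarx` and `shrx`) when possible:
  + Bounded operands (the count must be in `cl`)
  + Multiple uops in both the fused and unfused domains
  + May cause flag stalls, since their flag output depends on the runtime count (a count of 0 leaves the flags unchanged)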
- Materializing a flag into a general-purpose register currently uses `cmovcc`, which requires extra setup and one more spare register for the constant. It can be replaced by `setcc`, which transforms the sequence:
    xorl dst, dst
    sometest
    movl tmp, 0x01
    cmovlcc dst, tmp
  into:
    sometest
    setbcc dst
    movzbl dst, dst
  This sequence does not need a spare register and has no other drawbacks.
  (Note: `movzx` is often elided and thus has 0 latency.)
- Some small improvements:
  + Add memory variants of `tzcnt` and `lzcnt`
  + Add memory variants of `rolx` and `rorx`
  + Add missing `rolx` rules (note that BMI2 has no `rolx` instruction; `rolx dst, imm` is actually `rorx dst, size - imm`)
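As a minimal sketch (a standalone Java class that is not part of the patch; the class and method names are hypothetical), the checks below exercise the Java-level identities the BMI2 rules rely on. Java uses only the low 5 bits (`int`) or 6 bits (`long`) of a variable shift count, which matches how both the `cl` forms and `shlx`/`sarx`/`shrx` treat the count. And since BMI2 only provides `rorx`, a rotate left by `imm` must equal a rotate right by `size - imm`. This checks Java semantics only, not the code C2 actually emits.

```java
// Hypothetical standalone check, not part of the patch: verifies at the Java
// level the identities that the BMI2-based matcher rules have to preserve.
public class Bmi2Identities {

    // Java uses only the low 5 bits of an int shift count (JLS 15.19).
    // SHL/SAR/SHR with the count in CL and SHLX/SARX/SHRX both mask the
    // count this way for 32-bit operands, so either encoding is valid.
    static boolean intShiftMasks(int x, int n) {
        return (x << n) == (x << (n & 31))
                && (x >> n) == (x >> (n & 31))
                && (x >>> n) == (x >>> (n & 31));
    }

    // Same for long shifts, which use the low 6 bits of the count.
    static boolean longShiftMasks(long x, int n) {
        return (x << n) == (x << (n & 63))
                && (x >> n) == (x >> (n & 63))
                && (x >>> n) == (x >>> (n & 63));
    }

    // BMI2 has RORX but no rotate-left counterpart, so a 32-bit rotate
    // left by imm is expressed as a rotate right by (32 - imm).
    static boolean intRotateIdentity(int x, int imm) {
        return Integer.rotateLeft(x, imm) == Integer.rotateRight(x, 32 - imm);
    }

    // 64-bit variant: rotate right by (64 - imm).
    static boolean longRotateIdentity(long x, int imm) {
        return Long.rotateLeft(x, imm) == Long.rotateRight(x, 64 - imm);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            // Spread the int sample over all 64 bits for the long checks.
            long wide = x * 0x9E3779B97F4A7C15L;
            for (int n = -70; n <= 70; n++) {
                if (!intShiftMasks(x, n) || !longShiftMasks(wide, n)) {
                    throw new AssertionError("shift mismatch: x=" + x + ", n=" + n);
                }
            }
            for (int imm = 0; imm < 64; imm++) {
                if ((imm < 32 && !intRotateIdentity(x, imm)) || !longRotateIdentity(wide, imm)) {
                    throw new AssertionError("rotate mismatch: x=" + x + ", imm=" + imm);
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

For `imm == 0` the rotate-right distance becomes `size`, which both `Integer.rotateRight`/`Long.rotateRight` and `rorx` reduce modulo the operand size, so the identity still holds there.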
The speedup can be observed in the variable shift benchmarks:
Before:
    Benchmark             (size)  Mode  Cnt  Score   Error  Units
    Integers.shiftLeft       500  avgt    5  0.836 ± 0.030  us/op
    Integers.shiftRight      500  avgt    5  0.843 ± 0.056  us/op
    Integers.shiftURight     500  avgt    5  0.830 ± 0.057  us/op
    Longs.shiftLeft          500  avgt    5  0.827 ± 0.026  us/op
    Longs.shiftRight         500  avgt    5  0.828 ± 0.018  us/op
    Longs.shiftURight        500  avgt    5  0.829 ± 0.038  us/op
After:
    Benchmark             (size)  Mode  Cnt  Score   Error  Units
    Integers.shiftLeft       500  avgt    5  0.761 ± 0.016  us/op
    Integers.shiftRight      500  avgt    5  0.762 ± 0.071  us/op
    Integers.shiftURight     500  avgt    5  0.765 ± 0.056  us/op
    Longs.shiftLeft          500  avgt    5  0.755 ± 0.026  us/op
    Longs.shiftRight         500  avgt    5  0.753 ± 0.017  us/op
    Longs.shiftURight        500  avgt    5  0.759 ± 0.031  us/op
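To put the two tables in relative terms, here is a small sketch (the class name `ShiftSpeedup` is hypothetical; the numbers are the Score column copied from the Before/After tables) that computes the fraction of time per op saved for each benchmark:

```java
// Hypothetical helper, not part of the patch: relative reduction in average
// time per op, using the Score column of the Before/After tables.
public class ShiftSpeedup {

    static final String[] NAMES = {
        "Integers.shiftLeft", "Integers.shiftRight", "Integers.shiftURight",
        "Longs.shiftLeft", "Longs.shiftRight", "Longs.shiftURight"
    };
    static final double[] BEFORE = {0.836, 0.843, 0.830, 0.827, 0.828, 0.829}; // us/op
    static final double[] AFTER  = {0.761, 0.762, 0.765, 0.755, 0.753, 0.759}; // us/op

    // (before - after) / before: the fraction of time per op saved.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-22s %4.1f%%%n", NAMES[i], 100 * reduction(BEFORE[i], AFTER[i]));
        }
    }
}
```

The reductions range from about 7.8% (Integers.shiftURight) to 9.6% (Integers.shiftRight). This uses the scores only; for some rows the reported error intervals overlap (e.g. Integers.shiftRight, 0.843 ± 0.056 before vs 0.762 ± 0.071 after).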
For the `cmovcc 1, 0` pattern, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.
Thank you very much.
-------------
Commit messages:
- pipe
- fix, benchmarks
- pipe_class
- reduce register pressure
- cmov fix
- operand
- setcc for long
- initial commit
Changes: https://git.openjdk.java.net/jdk/pull/7968/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7968&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8283694
Stats: 603 lines in 8 files changed: 566 ins; 4 del; 33 mod
Patch: https://git.openjdk.java.net/jdk/pull/7968.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7968/head:pull/7968
PR: https://git.openjdk.java.net/jdk/pull/7968
More information about the hotspot-compiler-dev
mailing list