RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v3]
Tobias Hotz
duke at openjdk.org
Fri Aug 4 09:53:57 UTC 2023
> This patch adds peephole rules that remove TEST instructions which operate on the result of, and immediately follow, an AND, XOR or OR instruction.
> This pattern can emerge when the result of the AND is compared against two values, one of which is zero. The matcher has no way of knowing that the preceding AND, XOR or OR has already set the flags register to the same values the TEST would produce.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR all set the flags the same way (OF and CF cleared; SF, ZF and PF set according to the result), so removing the TEST should be safe.
> With peephole rules that remove these TEST instructions, the generated assembly becomes shorter and a small speedup can be observed:
> Results on an Intel Core i5-8250U CPU:
> Before this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  182.353 ± 1.751  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  212.836 ± 0.310  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.072 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   72.052 ± 0.215  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.396 ± 0.666  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.183 ± 0.001  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   88.683 ± 2.034  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.271 ± 0.602  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.183 ± 0.001  ns/op
>
> After this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units  Change
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  141.615 ± 4.747  ns/op  ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  213.249 ± 1.094  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.074 ± 0.011...
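The safety argument in the description rests on AND, OR, XOR and TEST sharing one flag rule. The following C++ sketch is a simplified model of that rule as documented on the linked reference pages (OF and CF cleared; SF, ZF and PF set from the result; AF is undefined and therefore left out). It is not HotSpot code, and all names (`Flags`, `logical_flags`, `exec_logical`, `exec_test`, `test_after_logical_is_redundant`) are hypothetical. It checks, over a handful of sample operands, that running `test r, r` after `and/or/xor r, s` leaves the flags unchanged.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Simplified model of the x86 status flags a conditional jump can read.
// AF is omitted: it is undefined after AND, OR, XOR and TEST.
struct Flags {
  bool zf, sf, pf, of, cf;
};

inline bool same_flags(const Flags& a, const Flags& b) {
  return a.zf == b.zf && a.sf == b.sf && a.pf == b.pf &&
         a.of == b.of && a.cf == b.cf;
}

// Rule shared by AND, OR, XOR and TEST (per the x86 reference pages):
// OF and CF are cleared; SF, ZF and PF are set according to the result.
inline Flags logical_flags(std::uint32_t result) {
  Flags f;
  f.zf = (result == 0);
  f.sf = ((result >> 31) & 1u) != 0;
  // PF is set when the low byte of the result has an even number of 1 bits.
  f.pf = std::bitset<8>(result & 0xFFu).count() % 2 == 0;
  f.of = false;
  f.cf = false;
  return f;
}

enum class Op { And, Or, Xor };

// Models "op dst, src" on 32-bit registers: returns the new dst, sets flags.
inline std::uint32_t exec_logical(Op op, std::uint32_t dst, std::uint32_t src,
                                  Flags& flags) {
  std::uint32_t r = 0;
  switch (op) {
    case Op::And: r = dst & src; break;
    case Op::Or:  r = dst | src; break;
    case Op::Xor: r = dst ^ src; break;
  }
  flags = logical_flags(r);
  return r;
}

// Models "test a, b": flags come from a & b, and no register is written.
inline Flags exec_test(std::uint32_t a, std::uint32_t b) {
  return logical_flags(a & b);
}

// For every sample operand pair, checks that "op r, s" followed by
// "test r, r" leaves exactly the flags that "op r, s" alone produced.
inline bool test_after_logical_is_redundant() {
  const std::uint32_t samples[] = {0u, 1u, 2u, 0x80u, 0xFFu, 0x7FFFFFFFu,
                                   0x80000000u, 0xFFFFFFFFu, 0x12345678u};
  for (Op op : {Op::And, Op::Or, Op::Xor}) {
    for (std::uint32_t a : samples) {
      for (std::uint32_t b : samples) {
        Flags after_op;
        std::uint32_t r = exec_logical(op, a, b, after_op);
        if (!same_flags(after_op, exec_test(r, r))) return false;
      }
    }
  }
  return true;
}
```

The check succeeds for a simple reason: `test r, r` derives its flags from `r & r`, which equals `r`, and the preceding logical instruction has already derived the same flags from `r` under the same rule.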
Tobias Hotz has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision:
- Merge remote-tracking branch 'upstream/master' into testPeephole
- Use LF instead of CRLF
- Add IR test
Currently, the peephole only works for branches, not conditional moves.
- Add assert to verify that the machProj and the test operate on the same register.
Also fix compilation on macos
- Use a new approach by telling the peephole which rules set and clear which flags
By using this approach, the peephole rule is much more general and can cover more cases. This also means we can remove test instructions after add instructions if only specific flags are required.
- Merge remote-tracking branch 'upstream/master' into testPeephole
- Remove the old peepreplace empty block - we didn't use them
- Add more benchmark cases
- Add new benchmarks
Also fix an error in the xor long peep definition
- Merge remote-tracking branch 'upstream/master' into testPeephole
- ... and 9 more: https://git.openjdk.org/jdk/compare/aad05427...18c6f790
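The commit "Use a new approach by telling the peephole which rules set and clear which flags" generalizes the criterion: a TEST can go whenever the preceding instruction already set every flag that the consumer reads. A minimal sketch of that idea, under stated assumptions: a flat instruction list, a flag consumer that immediately follows the TEST and is its only user, and hypothetical names (`Insn`, `make_insn`, `flags_like_test`, `remove_redundant_tests`) that do not correspond to HotSpot's ADLC peephole syntax or its C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag bit masks.
constexpr std::uint8_t ZF = 1, SF = 2, PF = 4, OF = 8, CF = 16;
constexpr std::uint8_t ALL_FLAGS = ZF | SF | PF | OF | CF;

// Hypothetical, simplified instruction record (not HotSpot's MachNode).
struct Insn {
  std::string opcode;                // e.g. "and", "add", "test", "je"
  int dst = -1;                      // register written or tested, -1 if none
  int src = -1;                      // second register operand, -1 if none
  std::uint8_t flags_like_test = 0;  // flags set exactly as "test dst, dst" would
  std::uint8_t flags_read = 0;       // flags consumed, e.g. by a Jcc
};

inline Insn make_insn(std::string opcode, int dst, int src,
                      std::uint8_t flags_like_test, std::uint8_t flags_read) {
  Insn i;
  i.opcode = std::move(opcode);
  i.dst = dst;
  i.src = src;
  i.flags_like_test = flags_like_test;
  i.flags_read = flags_read;
  return i;
}

// Drops "test r, r" when the previous instruction wrote r and already set
// every flag the next instruction reads the same way TEST would. Assumes the
// flag consumer immediately follows the TEST and is its only user.
inline std::vector<Insn> remove_redundant_tests(const std::vector<Insn>& code) {
  std::vector<Insn> out;
  for (std::size_t i = 0; i < code.size(); ++i) {
    const Insn& cur = code[i];
    bool self_test = cur.opcode == "test" && cur.dst >= 0 && cur.dst == cur.src;
    if (self_test && i > 0 && i + 1 < code.size()) {
      const Insn& prev = code[i - 1];
      const Insn& next = code[i + 1];
      bool same_reg = prev.dst == cur.dst;  // TEST must check prev's result
      bool flags_covered = (next.flags_read & ~prev.flags_like_test) == 0;
      if (same_reg && flags_covered) continue;  // TEST is redundant: skip it
    }
    out.push_back(cur);
  }
  return out;
}
```

With this criterion, AND, OR and XOR cover all five modeled flags, while ADD covers only ZF, SF and PF (it sets OF and CF from overflow and carry instead of clearing them). So `add r0, r1; test r0, r0; je` loses its TEST because `je` reads only ZF, whereas `add r0, r1; test r0, r0; jl` keeps it because `jl` also reads OF.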
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/14172/files
- new: https://git.openjdk.org/jdk/pull/14172/files/71737a77..18c6f790
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=01-02
Stats: 48273 lines in 1039 files changed: 27104 ins; 15761 del; 5408 mod
Patch: https://git.openjdk.org/jdk/pull/14172.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172
PR: https://git.openjdk.org/jdk/pull/14172
More information about the hotspot-compiler-dev mailing list