RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Mon Jul 24 13:41:52 UTC 2023
On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz <duke at openjdk.org> wrote:
> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortend and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark Mode Cnt Score Error Units
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>
> After this patch:
>
> Benchmark Mode Cnt Score Error Units Change
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...
The changes to the ad file is nice! It's a lot more readable to me now. Is it possible to add some more entries to the JMH benchmark to show the results of the other `and` matches, such as the ones with memory? And do you need a JBS entry for this PR?
Hmm, is it possible to add multiple `peepmatch` attributes to a single peephole currently? Since the `peepprocedure`s and `peepreplace`s in here are the same across data type, it may be nice to consolidate them in that way for maintainability, especially if more rules are added for the other operations as well.
src/hotspot/cpu/x86/peephole_x86_64.cpp line 137:
> 135: // This function removes the TEST instruction when it detected shapes likes AND r1, r2; TEST r1, r1
> 136: bool test_may_remove_helper(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_,
> 137: MachNode* (*new_root)(), uint inst0_rule, std::initializer_list<uint> rules_to_match) {
Hotspot code avoids the usage of the standard library, the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md) has more information and rationale.
src/hotspot/cpu/x86/peephole_x86_64.cpp line 142:
> 140:
> 141: Node* inst1 = inst0->in(1);
> 142: // Only remove test if the block order is inst1 -> MachProjNode (cause AND specifies KILL cr) -> inst0
Suggestion:
// Only remove test if the block order is inst1 -> MachProjNode (because AND specifies KILL cr) -> inst0
-------------
PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1460885752
PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1566338728
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1216158482
PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1216144334
More information about the hotspot-compiler-dev
mailing list