RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v2]
Tobias Hotz
duke at openjdk.org
Fri Aug 4 09:53:57 UTC 2023
On Fri, 4 Aug 2023 09:39:12 GMT, Tobias Hotz <duke at openjdk.org> wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
>> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to same values for TEST, AND, XOR and OR, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortend and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision:
>
> - Add IR test
>
> Currently, the peephole only works for branches, not conditional moves.
> - Add assert to verify that the machProj and the test operate on the same register.
>
> Also fix compilation on macos
The peephole only applies the instruction that would be emitted immediately before the test is the input of the test. If that is the case, it must operate on the same register as the result of the above instruction. We then check the flags of the nodes.
We need all the information cause specific operation require specific flags. For example, if the instruction following the test checks if the value is greater (than zero), the sign and zero flag need to be set and the overflow flag needs to be cleared. Not all instructions (such as add) satisfy this requirement, so in this case we would need to emit the test, but we could omit it if we only check for zero, as that only requires the ZF, which the test instruction sets.
I also noticed another problem during the construction of the IR tests: Test instructions before conditional moves are currently not removed. This is due to the Matcher thinking that it needs to load a zero into one register when it emits a setb instruction. This is not the case, but the matcher does not know this and 1) emits a pointless register clear and 2) still has the loadConI0 in its graph, which causes the test peephole to bail out as its input is not the preceeding instruction.
I think the removal of the loadConI0 in this case is a topic for another PR though. Fixing this will result in the peephole also working for setb instructions, which would be a huge win.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1665334251
More information about the hotspot-compiler-dev
mailing list