RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
Tobias Hotz
duke at openjdk.org
Wed Aug 2 15:14:51 UTC 2023
On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz <duke at openjdk.org> wrote:
> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortend and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark Mode Cnt Score Error Units
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>
> After this patch:
>
> Benchmark Mode Cnt Score Error Units Change
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...
First of all, thanks for the feedback.
It already handles the case when flags are set depending on the operation. Right now, there are 3 cases that can be expressed:
- The flag is set based on the result of the operation. That means OF is set if an overflow occured during the Operation, ZF is set if the result is zero, etc. This is expressed through the `KILL CR` effect in combination with the flag `Flag_sets_xxx_flag`
- The flag is zeroed by the instruction. An example for this is the OF and CF flags in case of `and`/`test`. This is expressed through the `KILL CR` effect in combination with the flag `Flag_clears_xxx_flag`
- The state of the flag in unknown/undefined. Examples here are the PF flag in case of the `andn` instruction. This is also the case for all flags for the `bsr` instruction. This is expressed through the `KILL CR` effect and no matching Flag that specifies sets or clears.
I mainly used https://www.felixcloutier.com/x86/index.html as a reference (which is based on the official intel developer manuel) to specify all these flags.
Adding support for additional states such as "Sets flag based on source operand" would take up even more space in the flags int, and as there is only room for 2 additional flags, this would not work, as flags is a juint. Also, I think the cases where we would be able to remove a TEST instruction based on that would be very rare due to 1) Not many instructions that are commonly used settings flags based on the input and 2) The instructions that set flags due to the input not setting many flags, only leaving simple zero checks as valid.
Regarding IR tests: Yeah it seems like that would make sense. I will look into that.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1662391908
More information about the hotspot-compiler-dev
mailing list