RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2]
Quan Anh Mai
duke at openjdk.org
Tue Aug 9 13:28:31 UTC 2022
On Tue, 9 Aug 2022 13:18:05 GMT, Quan Anh Mai <duke at openjdk.org> wrote:
>> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction.
>>
>> Testing: Manually run the test case in the JBS and look at the compiled code.
>>
>> I also do some small clean-ups in x86_64.ad:
>>
>> - For some reasons, `incl(Address)` is less efficient than `addl(Address, int)` as the former results in 3 uops in the fused domain as opposed to 2 in cases of the latter (according to [uops.info](uops.info)). As a result, I propose to remove the corresponding rules.
>> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed.
>> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation.
>> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed
>> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed.
>> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register.
>>
>> Please kindly review, thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
> add benchmark
Thanks for your review, the pattern mentioned in the JBS is to load from memory, do an addition with a constant and have the result as a local variable, which is different from `incI_mem` which requires the result to be stored back to the exact same memory.
`incI_mem` is removed because the pattern has already been covered by `addI_mem_imm` and the latter produces better code. Similarly, the pattern covered by `salI_rReg_1` is already covered by `salI_rReg_imm` and these 2 rules otherwise are exactly the same, which is redundant.
I have added a JMH benchmark for the affected cases and the results are as follow:
Before After
Benchmark Mode Cnt Score Error Score Error Units Change
BasicRules.add_mem_con avgt 5 200.078 ± 1.555 200.663 ± 1.919 ns/op +0.3%
BasicRules.inc_mem avgt 5 355.942 ± 0.695 322.630 ± 1.740 ns/op -9.4%
-------------
PR: https://git.openjdk.org/jdk/pull/9791
More information about the hotspot-compiler-dev
mailing list