RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2]

Tue Aug 9 13:28:31 UTC 2022

On Tue, 9 Aug 2022 13:18:05 GMT, Quan Anh Mai <duke at openjdk.org> wrote:

>> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction.
>> 
>> Testing: Manually run the test case in the JBS and look at the compiled code.
>> 
>> I also do some small clean-ups in x86_64.ad:
>> 
>> - For some reasons, `incl(Address)` is less efficient than `addl(Address, int)` as the former results in 3 uops in the fused domain as opposed to 2 in cases of the latter (according to [uops.info](uops.info)). As a result, I propose to remove the corresponding rules.
>> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed.
>> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation.
>> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed
>> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed.
>> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register.
>> 
>> Please kindly review, thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add benchmark

Thanks for your review, the pattern mentioned in the JBS is to load from memory, do an addition with a constant and have the result as a local variable, which is different from `incI_mem` which requires the result to be stored back to the exact same memory.

`incI_mem` is removed because the pattern has already been covered by `addI_mem_imm` and the latter produces better code. Similarly, the pattern covered by `salI_rReg_1` is already covered by `salI_rReg_imm` and these 2 rules otherwise are exactly the same, which is redundant.

I have added a JMH benchmark for the affected cases and the results are as follow:

                                            Before             After
    Benchmark               Mode  Cnt    Score   Error     Score   Error  Units   Change
    BasicRules.add_mem_con  avgt    5  200.078 ± 1.555   200.663 ± 1.919  ns/op    +0.3%
    BasicRules.inc_mem      avgt    5  355.942 ± 0.695   322.630 ± 1.740  ns/op    -9.4%

-------------

PR: https://git.openjdk.org/jdk/pull/9791