RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v2]
Quan Anh Mai
duke at openjdk.java.net
Thu Feb 24 12:13:07 UTC 2022
On Wed, 23 Feb 2022 13:54:24 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>> What benefits do you see from these changes in the real world?
>> It could be fine to use `lea` to merge several instructions, as you did.
>> But I was told previously that `lea` has a larger latency than a shift instruction because it uses the addressing module in the CPU. I am not sure it is fine to replace shifts with it.
>> Also, why did you remove the match rule that moved the result of `Add` to a different register?
>
>> @vnkozlov Given that `lea` is a really efficient instruction, merging multiple instructions into it offers a lot of benefits, and all other compilers do so.
>
> So is there any benchmark showing the perf improvement?
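The merging discussed above rests on the fact that `lea dst, [base + index*scale + disp]` (scale 1, 2, 4 or 8) computes an address-style sum without accessing memory or changing flags, so an expression such as `base + (index << 2) + 24` can in principle be emitted as one `lea` instead of a shift and two adds. The Java sketch below is only a hypothetical illustration of that arithmetic equivalence: the class and method names are made up, the instruction sequences in the comments are illustrative rather than actual C2 output, and it says nothing about `lea` latency on any particular CPU.

```java
// Hypothetical illustration (not C2 code): the effective-address arithmetic of
// x86 `lea` (base + index * scale + disp) matches ordinary Java int arithmetic,
// including wraparound, so a compiler may fold such expressions into one lea.
public final class LeaShapes {
    private LeaShapes() {}

    // Emulates the 32-bit result of `lea r32, [base + index*scale + disp]`.
    // x86 addressing only allows scale factors of 1, 2, 4 or 8.
    static int leaInt(int base, int index, int scale, int disp) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        return base + index * scale + disp; // Java int arithmetic wraps mod 2^32
    }

    // The "unfused" form, roughly: mov dst, index; shl dst, 2; add dst, base; add dst, 24
    static int unfused(int base, int index) {
        int t = index << 2;
        t += base;
        t += 24;
        return t;
    }

    // The candidate for a single: lea dst, [base + index*4 + 24]
    static int fused(int base, int index) {
        return leaInt(base, index, 4, 24);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int b : samples) {
            for (int i : samples) {
                if (fused(b, i) != unfused(b, i)) {
                    throw new AssertionError("mismatch for " + b + ", " + i);
                }
            }
        }
        System.out.println("lea-shaped and unfused forms agree on all samples");
    }
}
```

Because two's-complement addition and multiplication wrap the same way in both forms, they agree for every input; whether the fused form pays off therefore depends only on instruction cost on the target CPU, which is what the latency question above is about.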
@DamonFool I found that benchmarking these single-arithmetic-instruction optimizations is hard, especially the rules that contain constant immediates. Below are the results from a benchmark I wrote. My machine doesn't match the 3-operand rules, so the results of `B_I_D_*` are the same before and after, and `B_IS_D_int` seems to suffer from loop alignment, which leads to a decoder bottleneck from reading a lot of `nop`s.
Thank you very much.
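The `LeaInstruction` source is not part of this message. Assuming the names encode the addressing components (B = base, I = index, S = scale, D = displacement; this reading is not confirmed by the author), the benchmarked expressions would have roughly the shapes below. The constants and method names are hypothetical, and which shapes C2 actually turns into a single `lea` depends on the match rules in the PR and, per the message above, on the CPU.

```java
// Hypothetical reading of the benchmark names (not the actual LeaInstruction
// source): B = base, I = index, S = scale (1, 2, 4 or 8), D = constant displacement.
// Each method is an expression shape that fits one x86 lea addressing form.
public final class LeaBenchmarkShapes {
    private LeaBenchmarkShapes() {}

    static int isD(int i)         { return (i << 2) + 12; }     // [i*4 + 12]
    static int bIS(int b, int i)  { return b + (i << 3); }      // [b + i*8]
    static int bID(int b, int i)  { return b + i + 12; }        // [b + i + 12]
    static int bISD(int b, int i) { return b + (i << 1) + 12; } // [b + i*2 + 12]

    // The *_long variants would use the same shapes on 64-bit operands.
    static long isDLong(long i)          { return (i << 2) + 12L; }
    static long bISDLong(long b, long i) { return b + (i << 1) + 12L; }

    public static void main(String[] args) {
        System.out.println(isD(5));
        System.out.println(bIS(1, 2));
        System.out.println(bID(1, 2));
        System.out.println(bISD(1, 2));
        System.out.println(isDLong(5L));
        System.out.println(bISDLong(1L, 2L));
    }
}
```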
Before:
Benchmark                   Mode  Cnt     Score     Error  Units
LeaInstruction.B_IS_D_int   avgt   10  1171.089 ±  12.051  ns/op
LeaInstruction.B_IS_D_long  avgt   10  1214.248 ± 164.069  ns/op
LeaInstruction.B_IS_int     avgt   10   908.979 ±  57.721  ns/op
LeaInstruction.B_IS_long    avgt   10  1218.707 ±   2.169  ns/op
LeaInstruction.B_I_D_int    avgt   10   842.187 ±  65.795  ns/op
LeaInstruction.B_I_D_long   avgt   10  1289.333 ±   9.978  ns/op
LeaInstruction.IS_D_int     avgt   10   533.597 ±   1.302  ns/op
LeaInstruction.IS_D_long    avgt   10   533.198 ±   0.559  ns/op

After:
Benchmark                   Mode  Cnt     Score     Error  Units
LeaInstruction.B_IS_D_int   avgt   10  1217.740 ±   4.110  ns/op
LeaInstruction.B_IS_D_long  avgt   10   809.962 ±   8.156  ns/op
LeaInstruction.B_IS_int     avgt   10   536.518 ±   5.076  ns/op
LeaInstruction.B_IS_long    avgt   10   534.041 ±   1.158  ns/op
LeaInstruction.B_I_D_int    avgt   10   808.131 ±   0.965  ns/op
LeaInstruction.B_I_D_long   avgt   10  1287.391 ±  10.025  ns/op
LeaInstruction.IS_D_int     avgt   10   305.940 ±   9.886  ns/op
LeaInstruction.IS_D_long    avgt   10   308.969 ±   0.844  ns/op
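Dividing each Before score by the corresponding After score gives the apparent speedup (values above 1 mean the patched build is faster). The sketch below only recomputes that ratio from the average times in the two tables; it ignores the error column, so ratios close to 1, such as those for `B_I_D_*`, fall within the reported error and should not be read as real changes. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Recomputes Before/After ratios from the JMH average times quoted above (ns/op, lower is better).
public final class LeaSpeedup {
    private LeaSpeedup() {}

    // name -> {before, after}; error bars are deliberately left out.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("B_IS_D_int",  new double[] {1171.089, 1217.740});
        SCORES.put("B_IS_D_long", new double[] {1214.248,  809.962});
        SCORES.put("B_IS_int",    new double[] { 908.979,  536.518});
        SCORES.put("B_IS_long",   new double[] {1218.707,  534.041});
        SCORES.put("B_I_D_int",   new double[] { 842.187,  808.131});
        SCORES.put("B_I_D_long",  new double[] {1289.333, 1287.391});
        SCORES.put("IS_D_int",    new double[] { 533.597,  305.940});
        SCORES.put("IS_D_long",   new double[] { 533.198,  308.969});
    }

    // Ratio > 1 means the "After" build took less time per operation.
    static double speedup(String name) {
        double[] s = SCORES.get(name);
        if (s == null) {
            throw new IllegalArgumentException("unknown benchmark: " + name);
        }
        return s[0] / s[1];
    }

    public static void main(String[] args) {
        for (String name : SCORES.keySet()) {
            System.out.printf("%-12s %.2fx%n", name, speedup(name));
        }
    }
}
```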
-------------
PR: https://git.openjdk.java.net/jdk/pull/7560
More information about the hotspot-compiler-dev mailing list