RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v2]

Quan Anh Mai duke at openjdk.java.net
Thu Feb 24 12:13:07 UTC 2022


On Wed, 23 Feb 2022 13:54:24 GMT, Jie Fu <jiefu at openjdk.org> wrote:

>> What benefits you have with these changes in real world?
>> It could be fine to use `lea` for merging several instruction, as you did.
>> But last time I was told that `lea` instruction has larger latency than shift instruction because it uses addressing module in CPU. I am not sure it is fine to replace it.
>> Also why you removed match rule which moved result of `Add` to different register?
>
>> @vnkozlov Given `lea` is a really efficient instruction, merging multiple ones into it offers a lot of benefits and all other compilers do so.
> 
> So any benchmark to show the perf improvement?

@DamonFool I found that benchmarking these single-arithmetic-instruction optimizations is hard, especially for the rules that contain constant immediates. Below are the results from a benchmark I wrote. My machine doesn't match the 3-operand rules, so the `B_I_D_*` results are essentially unchanged, and `B_IS_D_int` seems to suffer from loop alignment, which leads to a decoder bottleneck from reading a lot of `nop`s.
Thank you very much.

    Before:
    Benchmark                   Mode  Cnt     Score     Error  Units
    LeaInstruction.B_IS_D_int   avgt   10  1171.089 ±  12.051  ns/op
    LeaInstruction.B_IS_D_long  avgt   10  1214.248 ± 164.069  ns/op
    LeaInstruction.B_IS_int     avgt   10   908.979 ±  57.721  ns/op
    LeaInstruction.B_IS_long    avgt   10  1218.707 ±   2.169  ns/op
    LeaInstruction.B_I_D_int    avgt   10   842.187 ±  65.795  ns/op
    LeaInstruction.B_I_D_long   avgt   10  1289.333 ±   9.978  ns/op
    LeaInstruction.IS_D_int     avgt   10   533.597 ±   1.302  ns/op
    LeaInstruction.IS_D_long    avgt   10   533.198 ±   0.559  ns/op

    After:
    Benchmark                   Mode  Cnt     Score    Error  Units
    LeaInstruction.B_IS_D_int   avgt   10  1217.740 ±  4.110  ns/op
    LeaInstruction.B_IS_D_long  avgt   10   809.962 ±  8.156  ns/op
    LeaInstruction.B_IS_int     avgt   10   536.518 ±  5.076  ns/op
    LeaInstruction.B_IS_long    avgt   10   534.041 ±  1.158  ns/op
    LeaInstruction.B_I_D_int    avgt   10   808.131 ±  0.965  ns/op
    LeaInstruction.B_I_D_long   avgt   10  1287.391 ± 10.025  ns/op
    LeaInstruction.IS_D_int     avgt   10   305.940 ±  9.886  ns/op
    LeaInstruction.IS_D_long    avgt   10   308.969 ±  0.844  ns/op
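
For readers unfamiliar with the benchmark names, the sketch below shows the kind of Java expressions each name plausibly refers to. It is not the benchmark that produced the numbers above: the class name `LeaShapes`, the method names, and the constants (a shift of 2 and a displacement of 16) are all hypothetical, and the reading of the names as B = base, I = index, S = scale, D = displacement is an assumption based on x86 addressing-mode terminology. Whether C2 actually emits a single `lea` for any of these depends on the matcher rules and the CPU.

```java
// Hypothetical illustration only; NOT the LeaInstruction benchmark from this PR.
// Assumed naming: B = base, I = index, S = scale (shift), D = displacement.
public class LeaShapes {
    // IS_D: index * scale + displacement, a candidate for `lea dst, [idx*4 + 16]`
    static int isD(int idx) { return (idx << 2) + 16; }
    static long isD(long idx) { return (idx << 2) + 16; }

    // B_IS: base + index * scale, a candidate for `lea dst, [base + idx*4]`
    static int bIs(int base, int idx) { return base + (idx << 2); }
    static long bIs(long base, long idx) { return base + (idx << 2); }

    // B_I_D: base + index + displacement (a 3-operand form)
    static int bID(int base, int idx) { return base + idx + 16; }
    static long bID(long base, long idx) { return base + idx + 16; }

    // B_IS_D: base + index * scale + displacement (a 3-operand form)
    static int bIsD(int base, int idx) { return base + (idx << 2) + 16; }
    static long bIsD(long base, long idx) { return base + (idx << 2) + 16; }

    public static void main(String[] args) {
        // Accumulate results in a loop so the arithmetic is not trivially dead code.
        long acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += isD(i) + bIs(i, i) + bID(i, i) + bIsD(i, i);
        }
        System.out.println(acc);
    }
}
```

A real measurement would wrap methods like these in a JMH harness, and the generated code can be inspected with `-XX:+PrintAssembly` (which requires the hsdis disassembler plugin) to confirm which instructions were selected.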

-------------

PR: https://git.openjdk.java.net/jdk/pull/7560


More information about the hotspot-compiler-dev mailing list