RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v5]

Mon Feb 28 10:51:52 UTC 2022

On Sat, 26 Feb 2022 01:46:52 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

> I'm really confused here because I can't reproduce this either locally or on Godbolt with any of the relevant architectures, and I don't see why clang would change the stack frame in such a small function in your generated code. Maybe the compiler is a little different on macOS?
> 
> By the way, as noted in the manual and confirmed by experimenting with various compilers, I think two `add`s would be more efficient on pre-Ice Lake CPUs, since they offer slightly better latency and remove the concern about a bottleneck on port 1.

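To make the latency argument concrete, here is a minimal C++ sketch of the dependency-chain arithmetic. The figures are assumptions based on our reading of Intel's optimization manual for pre-Ice Lake cores (a three-component `lea` has 3-cycle latency and dispatches only on port 1; a plain `add` has 1-cycle latency on any ALU port), the constant and function names are hypothetical, and throughput and the rest of the loop body are ignored.

```cpp
// Assumed latencies in cycles for pre-Ice Lake Intel cores (our reading of
// Intel's optimization manual): three-component LEA = 3 cycles, port 1 only;
// ADD = 1 cycle, any ALU port.
constexpr int kLea3Latency = 3;
constexpr int kAddLatency = 1;

// Cycles on a loop-carried dst chain computing dst = dst + index + disp
// once per iteration with a single three-component LEA.
constexpr int lea_chain_cycles(int iterations) {
  return iterations * kLea3Latency;
}

// Same chain with two dependent ADDs: add dst, index; add dst, disp.
constexpr int add_pair_chain_cycles(int iterations) {
  return iterations * 2 * kAddLatency;
}

static_assert(add_pair_chain_cycles(1) < lea_chain_cycles(1),
              "under these assumptions the ADD pair has the shorter chain");
```

Under these assumptions the `add` pair costs 2 cycles per iteration on the carried chain versus 3 for the `lea`, and it is not confined to port 1.
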
Yes, I see.

It will generate `lea` for the `base + index + disp` pattern only when fast-3op is supported, so it would be smarter than gcc and LLVM.

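A minimal sketch of that selection rule, assuming a single boolean feature bit stands in for fast-3op. The `CpuFeatures` struct and `select_add3` function are hypothetical illustrations, not HotSpot's actual matcher code; the function returns the instruction text for `dst = dst + index + disp`, where `dst` already holds the base value.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the fast-3op CPU feature.
struct CpuFeatures {
  bool fast_3op_lea;
};

// Returns the instruction text for dst = dst + index + disp, assuming dst
// already holds the base value.
std::vector<std::string> select_add3(const CpuFeatures& cpu,
                                     const std::string& dst,
                                     const std::string& index, int disp) {
  const std::string imm = "#" + std::to_string(disp);
  if (cpu.fast_3op_lea) {
    // Fast three-operand LEA: one instruction, flags untouched.
    return {"leal " + dst + ", [" + dst + " + " + index + " + " + imm + "]"};
  }
  // Otherwise two dependent ADDs (each also writes the flags).
  return {"addl " + dst + ", " + index, "addl " + dst + ", " + imm};
}
```

Since `lea` leaves the flags untouched while `add` clobbers them, a real matcher would presumably also have to account for the flags register in the fallback path.
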
The patch still generates RBP as the base register, which should be avoided according to Intel's manual:

```
040     B2: #   out( B2 B3 ) <- in( B1 B2 ) Loop( B2-B2 inner ) Freq: 1007.4
040     addl    RDX, RBP        # int
042     addr32 leal RBP, [RBP + RDX + #30]      # int
047     addl    RDX, #10        # int
04a     incl    R11     # int
04d     cmpl    R11, #1000
054     jl,s   B2       # loop end  P=0.999007 C=260663.000000
```

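Why RBP (and R13) is special as a base: in the ModRM/SIB encoding, a base field of `101` with `mod = 00` means "no base, 32-bit displacement" (or RIP-relative when there is no SIB byte), so these registers can only serve as a base together with an explicit displacement, and `lea dst, [rbp + index]` is always encoded as `[rbp + index + 0]`, a three-component address. The sketch below models that rule; the enum and function names are hypothetical, and the "slow" criterion is our reading of Intel's optimization manual for pre-Ice Lake cores, covering only the cases discussed here.

```cpp
#include <cstdint>

// x86-64 register numbers as used in ModRM/SIB encoding (REX supplies bit 3).
enum Reg : std::uint8_t { RAX = 0, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
                          R8, R9, R10, R11, R12, R13, R14, R15 };

// With mod == 00, a base field of 101 means "no base", so RBP and R13 can
// only be a base with an explicit displacement (at least a zero disp8).
constexpr bool base_forces_disp(Reg base) { return (base & 0x7) == 0x5; }

// Number of address components actually encoded for
// LEA dst, [base + index + disp].
constexpr int lea_components(Reg base, bool has_index, std::int32_t disp) {
  const bool has_disp = disp != 0 || base_forces_disp(base);
  return 1 + (has_index ? 1 : 0) + (has_disp ? 1 : 0);
}

// Assumed slow-LEA rule for pre-Ice Lake cores: three components, which
// includes RBP/R13 as base together with an index.
constexpr bool slow_lea_pre_icelake(Reg base, bool has_index,
                                    std::int32_t disp) {
  return lea_components(base, has_index, disp) == 3;
}
```

For the `addr32 leal RBP, [RBP + RDX + #30]` in the loop above, `slow_lea_pre_icelake(RBP, true, 30)` is true; with RBP as the base it would stay three-component even if the displacement were zero.
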
-------------

PR: https://git.openjdk.java.net/jdk/pull/7560