RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v6]

Jie Fu jiefu at openjdk.java.net
Fri Mar 4 05:09:08 UTC 2022


On Mon, 28 Feb 2022 23:42:13 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   reviews
>
> The tool measures the throughput of the operations, that is, the number of cycles per iteration. Because the processor can execute multiple instructions at the same time, measuring latency requires a dependency chain from the output of the instruction to its input in the next iteration. The technique used by uops.info is to `movsx` (an instruction that is not elided) from the output operand back to the input operand, so that the processor must wait for the result of the previous iteration before starting the next one, rather than executing several independent iterations concurrently.
> 
> A simple `lea rax, [rbp + rcx + 0x8]; movsx rbp, eax` sequence takes 4 cycles per iteration; subtracting the latency of the `movsx`, which is 1, gives the documented latency of 3. (This is the latency between the output and the base operand; a similar experiment gives the same answer for the latency between the output and the index operand.)
> 
> Thanks.
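
To make the quoted dependency-chain argument concrete, here is a rough Java sketch of the two loop shapes. The class and method names are hypothetical, and this is not a measurement tool: the JIT is free to transform either loop, so it only illustrates the data flow and the `4 - 1 = 3` arithmetic.

```java
public class DependencyChainSketch {
    // Loop-carried dependency: each iteration consumes the previous result,
    // mirroring `lea rax, [rbp + rcx + 0x8]; movsx rbp, eax`.
    static int dependent(int base, int index, int n) {
        for (int i = 0; i < n; i++) {
            base = base + index + 8;
        }
        return base;
    }

    // No loop-carried dependency between the additions: every result is
    // computed from loop-invariant inputs plus i, so iterations may overlap.
    static void independent(int base, int index, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = base + index + 8 + i;
        }
    }

    // Latency of the instruction under test = cycles per iteration of the
    // chain minus the latency of the helper instruction that closes the chain.
    static int latencyFromChain(int cyclesPerIteration, int helperLatency) {
        return cyclesPerIteration - helperLatency;
    }

    public static void main(String[] args) {
        System.out.println(dependent(0, 1, 3));
        System.out.println(latencyFromChain(4, 1));
    }
}
```

Only the `dependent` shape forces the processor to wait for one result before starting the next addition, which is what a latency measurement needs.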

Hi @merykitty ,

Thanks for your update.

Instead of removing `leaI_rReg_immI`, I would suggest enabling it.

I tried matching it on top of your latest patch, as shown below, and saw about a 2% performance improvement in B_D_int.

diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
index f86dbbb..45f4730 100644
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -7388,6 +7388,19 @@ instruct addI_rReg(rRegI dst, rRegI src, rFlagsReg cr)
   ins_pipe(ialu_reg_reg);
 %}
 
+instruct leaI_rReg_immI(rRegI dst, no_rbp_r13_RegI base, immI disp)
+%{
+  predicate(VM_Version::supports_fast_2op_lea());
+  match(Set dst (AddI base disp));
+
+  ins_cost(110);
+  format %{ "addr32 leal $dst, [$base + $disp]\t# int" %}
+  ins_encode %{
+    __ leal($dst$$Register, Address($base$$Register, $disp$$constant));
+  %}
+  ins_pipe(ialu_reg_reg);
+%}
+
 instruct addI_rReg_imm(rRegI dst, immI src, rFlagsReg cr)
 %{
   match(Set dst (AddI dst src));
diff --git a/test/micro/org/openjdk/bench/vm/compiler/LeaInstruction.java b/test/micro/org/openjdk/bench/vm/compiler/LeaInstruction.java
index 02b10d7..335c032 100644
--- a/test/micro/org/openjdk/bench/vm/compiler/LeaInstruction.java
+++ b/test/micro/org/openjdk/bench/vm/compiler/LeaInstruction.java
@@ -46,6 +46,17 @@ public class LeaInstruction {
     }
 
     @Benchmark
+    public void B_D_int(Blackhole bh) { 
+        int x = this.x, y = this.y;
+        for (int i = 0; i < ITERATION; i++) {
+            x = x + 10;
+            y = y + 20;
+            bh.consume(x);
+            bh.consume(y);
+        }
+    }
+
+    @Benchmark
     public void B_I_D_int(Blackhole bh) {
         int x = this.x, y = this.y;
         for (int i = 0; i < ITERATION; i++) {


`leaL_rReg_immL` and `leaP_rReg_imm` could be enabled in the same way.

What do you think?
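
In case it helps to look at the loop shape of `B_D_int` outside JMH, here is a rough standalone approximation. It is only a sketch under assumptions: the class name, the `ITERATION` value, and the `volatile` sink standing in for `Blackhole.consume` are all made up for illustration, and it is not suitable for measuring the 2% difference.

```java
public class BDIntSketch {
    // Assumed value; the real constant is defined in LeaInstruction.java.
    static final int ITERATION = 1000;

    // Crude stand-in for Blackhole.consume: volatile writes keep the
    // additions from being optimized away entirely.
    static volatile int sink;

    // Same body as B_D_int: two independent "register + constant" additions
    // per iteration, which is the AddI(base, disp) shape the rule matches.
    static int[] run(int x, int y) {
        for (int i = 0; i < ITERATION; i++) {
            x = x + 10;
            y = y + 20;
            sink = x;
            sink = y;
        }
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        int[] r = run(1, 2);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (which needs the hsdis library) should show whether the additions are emitted as `add` or `lea`.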

-------------

PR: https://git.openjdk.java.net/jdk/pull/7560

