RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v6]

Quan Anh Mai duke at openjdk.java.net
Mon Feb 28 23:48:29 UTC 2022


On Mon, 28 Feb 2022 12:01:31 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi,
>> 
>> This patch adds several matching rules for x86_64 backend to use `lea` instructions for several fused arithmetic operations. Also, `lea`s don't kill flags and allow different `dst` and `src`, so it is preferred over `sll` if possible, too. 
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   reviews

The tool measures the throughput of the operation, reported as the number of cycles per iteration. Because the processor can execute multiple instructions at the same time, independent iterations overlap, so that number does not tell you the latency. To measure the latency, you need a dependency chain from the output of the instruction to its input in the next iteration. The technique used by uops.info is to add a `movsx` (an instruction that is not subject to move elimination) from the output operand back to the input operand, so that the processor must wait for the result of the previous iteration before starting the next one, instead of executing several iterations concurrently.

A simple `lea rax, [rbp + rcx + 0x8]; movsx rbp, eax` sequence takes 4 cycles per iteration. Subtracting the latency of the `movsx`, which is 1 cycle, gives the documented latency of 3 for the `lea`. This is the latency between the base operand and the output; a similar experiment gives the same answer for the latency between the index operand and the output.
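
To make the arithmetic concrete, here is a rough Python model of the two measurement setups. It is only a sketch and not the uops.info tool: the function name `cycles_per_iteration` is hypothetical, and it assumes an idealized core that can start one independent iteration per cycle and is otherwise limited only by data dependencies.

```python
def cycles_per_iteration(latencies, loop_carried, iterations=1000):
    """Average cycles per iteration in a toy out-of-order model.

    latencies    -- latency in cycles of each instruction in the loop body,
                    executed as a dependent sequence within one iteration
    loop_carried -- True if the last result feeds the next iteration's input
    Assumption: without a loop-carried dependency, a new iteration can
    start every cycle.
    """
    ready = 0   # cycle at which the loop-carried input becomes available
    finish = 0  # cycle at which the last result so far is produced
    for i in range(iterations):
        start = ready if loop_carried else i
        done = start + sum(latencies)
        if loop_carried:
            ready = done
        finish = max(finish, done)
    return finish / iterations


LEA_LATENCY = 3    # documented latency quoted in this message
MOVSX_LATENCY = 1  # latency of the chaining instruction

# lea alone, no dependency between iterations: iterations overlap,
# so the result approaches the throughput, not the latency.
independent = cycles_per_iteration([LEA_LATENCY], loop_carried=False)

# lea followed by movsx back into the base register: fully serialized.
chained = cycles_per_iteration([LEA_LATENCY, MOVSX_LATENCY], loop_carried=True)

# Recover the lea latency by subtracting the chaining instruction.
lea_latency = chained - MOVSX_LATENCY
print(independent, chained, lea_latency)
```

In this model the chained loop takes 4 cycles per iteration and the subtraction recovers 3, while the independent loop stays close to 1 cycle per iteration, which is why the unchained measurement cannot be read as a latency.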

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7560


More information about the hotspot-compiler-dev mailing list