RFR: 8282204: Use lea instructions for arithmetic operations on x86_64 [v6]
Jie Fu
jiefu at openjdk.java.net
Tue Mar 1 00:12:06 UTC 2022
On Mon, 28 Feb 2022 23:42:13 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> The tool measures the throughput of the operations, which is the number of cycles per iteration. Because the processor can execute multiple instructions at the same time, to measure the latency, you should create a dependency chain between the output of the instruction and its input in the next iteration. The technique used by uops.info is to `movsx` (which is an instruction that is not elided) from the output operand back to the input operand, so that the processor must wait for the result of the previous iteration before executing the next one, instead of executing multiple iterations concurrently when there is a lack of dependencies.
>
> A simple `lea rax, [rbp + rcx + 0x8]; movsx rbp, eax` gives a throughput of 4 cycles per iteration; subtracting the latency of the `movsx`, which is 1 cycle, gives the documented latency of 3 cycles (this is the latency between the output and the base operand; a similar experiment gives the same answer for the latency between the output and the index operand).
>
> Thanks.
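For reference, the dependency-chain idea quoted above can be sketched roughly as follows. This is only an illustration of the loop structure and of the subtraction, not a benchmark: the class and method names are hypothetical, the plain Java arithmetic merely stands in for the `lea` and `movsx` instructions, and nothing guarantees that the JIT emits those instructions for this code.

```java
public class LeaLatencySketch {
    // Dependent chain: the output of each iteration becomes the base input
    // of the next one, mirroring `lea rax, [rbp + rcx + 0x8]; movsx rbp, eax`.
    // (Hypothetical stand-in; the JIT is not guaranteed to emit lea/movsx.)
    static long dependentChain(long base, long index, int iterations) {
        for (int i = 0; i < iterations; i++) {
            long out = base + index + 8; // stands in for the lea
            base = (int) out;            // stands in for movsx: sign-extend the low 32 bits
        }
        return base;
    }

    // Latency of the instruction under test = measured cycles per iteration
    // of the chain minus the latency of the feedback instruction.
    static int latencyFromChain(int cyclesPerIteration, int feedbackLatency) {
        return cyclesPerIteration - feedbackLatency;
    }

    public static void main(String[] args) {
        // Numbers from the quoted message: 4 cycles per iteration, movsx latency 1.
        System.out.println(latencyFromChain(4, 1));
        System.out.println(dependentChain(0, 1, 3));
    }
}
```

Because each iteration reads the value written by the previous one, the iterations cannot overlap, so the cycles per iteration reflect the latency of the whole chain and not its throughput.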
Thanks for your clarification.
I will try it later today.
I would suggest reverting the unnecessary changes to the other register operand definitions in the AD file.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7560