RFR: 8296548: Improve MD5 intrinsic for x86_64
Yi-Fan Tsai
duke at openjdk.org
Wed Nov 9 08:04:09 UTC 2022
The LEA instruction computes an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage can take more cycles than the equivalent ADDs and reduces throughput.
This change replaces
LEA: r1 = r1 + rsi * 1 + t
with
ADDs: r1 += t; r1 += rsi.
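The two forms compute the same value, so the replacement changes only which instructions are issued, not the result. A minimal sketch of that equivalence follows, assuming 32-bit operands (MD5 works on 32-bit words) and modeling the registers with Java int arithmetic, which wraps around on overflow. The class and method names are hypothetical and are not part of the patch.

```java
// Hypothetical sketch: models the two instruction sequences with Java int
// arithmetic. It illustrates arithmetic equivalence only, not performance.
final class LeaVsAddSketch {
    // Models LEA: r1 = r1 + rsi * 1 + t (one three-component computation).
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // Models the replacement: r1 += t; r1 += rsi (two dependent adds).
    static int addForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Sample inputs, including ones whose sum overflows 32 bits.
        int[][] samples = {
            {0, 0, 0},
            {1, 2, 3},
            {0x67452301, 0xefcdab89, 0xd76aa478},
            {Integer.MAX_VALUE, 1, 1},
            {Integer.MIN_VALUE, -1, -1},
        };
        for (int[] s : samples) {
            if (leaForm(s[0], s[1], s[2]) != addForm(s[0], s[1], s[2])) {
                throw new AssertionError("forms disagree");
            }
        }
        System.out.println("LEA form and ADD form agree on all samples");
    }
}
```

Because 32-bit addition is associative and commutative under wraparound, the order in which t and rsi are added does not affect the result.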
Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
The same microbenchmark shows no performance change on Ice Lake and 3rd gen Epyc.
Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
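For readers who want a rough local check, MD5 throughput can be estimated through the standard java.security.MessageDigest API. The sketch below is not the microbenchmark used for the numbers above. The class name, buffer size, and iteration count are hypothetical choices, and a simple timing loop like this is much noisier than a proper benchmark harness.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: a crude MD5 throughput estimate, not the benchmark
// behind the reported numbers.
final class Md5ThroughputSketch {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the MD5 digest of data as lowercase hex (sanity check only).
    static String md5Hex(byte[] data) {
        byte[] digest = newMd5().digest(data);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hashes bufferSize bytes, iterations times, and returns MB/s.
    static double megabytesPerSecond(int bufferSize, int iterations) {
        MessageDigest md = newMd5();
        byte[] buf = new byte[bufferSize];
        // Warm-up pass so the JIT can compile the hot path before timing.
        for (int i = 0; i < iterations; i++) {
            md.update(buf);
        }
        md.digest();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            md.update(buf);
        }
        md.digest();
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double megabytes = (double) bufferSize * iterations / 1e6;
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.UTF_8)));
        System.out.printf("%.1f MB/s%n", megabytesPerSecond(16 * 1024, 20_000));
    }
}
```

Running the same class on builds with and without the change, on the same machine, gives a coarse comparison. The absolute numbers depend heavily on the CPU and JVM settings.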
-------------
Commit messages:
- 8296548: Improve MD5 intrinsic for x86_64
Changes: https://git.openjdk.org/jdk/pull/11054/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296548
Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/11054.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054
PR: https://git.openjdk.org/jdk/pull/11054