RFR: 8296548: Improve MD5 intrinsic for x86_64

Yi-Fan Tsai duke at openjdk.org
Wed Nov 9 08:04:09 UTC 2022


The LEA instruction loads an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage can take more cycles than the equivalent ADD instructions and reduces throughput.

This change replaces
    LEA:  r1 = r1 + rsi * 1 + t
with
    ADDs: r1 += t; r1 += rsi.
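
The replacement is safe because 32-bit addition wraps modulo 2^32, so the order and grouping of the three terms does not affect the result. The following is a minimal Java sketch of that equivalence, not the HotSpot stub code. It assumes the instruction sits inside a round-1 MD5 step as defined in RFC 1321; the class and method names are hypothetical, and the variables r1, rsi and t only mirror the register and constant names used above.

```java
// Illustrative sketch only: models one MD5 round-1 step in plain Java to show
// that the fused (LEA-style) sum and the two separate ADDs give the same value.
// Class and method names are hypothetical; this is not the intrinsic itself.
class Md5StepSketch {

    // Round-1 auxiliary function F(b, c, d) = (b & c) | (~b & d) from RFC 1321,
    // written in the equivalent xor/and/xor form.
    static int f(int b, int c, int d) {
        return ((c ^ d) & b) ^ d;
    }

    // "LEA style": the function value and the constant t are added in one
    // three-operand expression, r1 = r1 + rsi * 1 + t.
    static int stepFused(int a, int b, int c, int d, int x, int s, int t) {
        int r1 = a + x;
        int rsi = f(b, c, d);
        r1 = r1 + rsi * 1 + t;
        return Integer.rotateLeft(r1, s) + b;
    }

    // "ADD style": the same sum as two separate additions,
    // r1 += t; r1 += rsi.
    static int stepSplit(int a, int b, int c, int d, int x, int s, int t) {
        int r1 = a + x;
        int rsi = f(b, c, d);
        r1 += t;
        r1 += rsi;
        return Integer.rotateLeft(r1, s) + b;
    }

    public static void main(String[] args) {
        // MD5 initial state, a zero message word, and the first step's
        // shift amount and constant from RFC 1321.
        int a = 0x67452301, b = 0xefcdab89, c = 0x98badcfe, d = 0x10325476;
        int x = 0, s = 7, t = 0xd76aa478;

        int fused = stepFused(a, b, c, d, x, s, t);
        int split = stepSplit(a, b, c, d, x, s, t);
        System.out.println("results match: " + (fused == split));
    }
}
```

The sketch only demonstrates that the two forms compute the same value; any performance difference comes from how the processor executes LEA versus ADD and cannot be observed at this level.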

Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.

There is no performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
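
For readers who want a rough sanity check on their own hardware, the sketch below times MD5 through the standard java.security.MessageDigest API. It is not the microbenchmark used for the numbers above, and a hand-rolled timing loop is far less reliable than a proper benchmark harness. The class name, buffer size and iteration count are arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only: a crude MD5 throughput measurement.
// Names and parameters are hypothetical choices, not taken from the change.
class Md5ThroughputSketch {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Hex-encoded MD5 digest, used as a correctness check.
    static String md5Hex(byte[] input) {
        StringBuilder sb = new StringBuilder();
        for (byte v : newMd5().digest(input)) {
            sb.append(String.format("%02x", v));
        }
        return sb.toString();
    }

    // Hashes `iterations` buffers of `bufferSize` bytes and returns MiB/s.
    static double mebibytesPerSecond(int bufferSize, int iterations) {
        MessageDigest md = newMd5();
        byte[] data = new byte[bufferSize];

        // Warm-up pass so the JIT has a chance to compile the hot path.
        for (int i = 0; i < iterations; i++) {
            md.update(data);
        }
        md.digest();

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            md.update(data);
        }
        md.digest();
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);

        double mebibytes = (double) bufferSize * iterations / (1024.0 * 1024.0);
        return mebibytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        System.out.println("md5(\"abc\") = " + md5Hex("abc".getBytes(StandardCharsets.US_ASCII)));
        System.out.printf("throughput: %.1f MiB/s%n", mebibytesPerSecond(16 * 1024, 20_000));
    }
}
```

Running the same class on JDK builds with and without the change, on the same machine, gives a rough indication of the difference.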

Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.

-------------

Commit messages:
 - 8296548: Improve MD5 intrinsic for x86_64

Changes: https://git.openjdk.org/jdk/pull/11054/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8296548
  Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/11054.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054

PR: https://git.openjdk.org/jdk/pull/11054

