RFR: 8296548: Improve MD5 intrinsic for x86_64 [v2]

Yi-Fan Tsai duke at openjdk.org
Tue Nov 15 23:43:12 UTC 2022


> The LEA instruction loads an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage potentially takes more cycles than ADDs and reduces throughput.
> 
> This change replaces
>     LEA:  r1 = r1 + rsi * 1 + t
> with
>     ADDs: r1 += t; r1 += rsi.
> 
> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
> 
> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
> 
> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
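
The replacement described above relies on the two forms computing the same 32-bit value. A minimal Java sketch of that equivalence follows; it models the registers as plain `int` values with wraparound arithmetic and is not the HotSpot assembler code. The class and method names (`Md5StepSketch`, `leaStyle`, `addStyle`) and the sample operand values are hypothetical, chosen only for illustration.

```java
public class Md5StepSketch {
    // LEA form: a single three-operand address computation, r1 + rsi * 1 + t.
    static int leaStyle(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // ADD form: two dependent two-operand additions, r1 += t; r1 += rsi.
    static int addStyle(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Arbitrary sample operands (assumed, not taken from the patch).
        int r1 = 0x67452301;
        int rsi = 0x98badcfe;
        int t = 0xd76aa478;
        // Both forms wrap modulo 2^32, so the results agree.
        System.out.println(leaStyle(r1, rsi, t) == addStyle(r1, rsi, t));
    }
}
```

The sketch only shows that the result is unchanged; the performance difference reported above comes from how the processor executes the instructions, which Java source cannot express.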

Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8296548
 - 8296548: Improve MD5 intrinsic for x86_64
   
   The LEA instruction loads an effective address, but the MD5 intrinsic
   uses it to compute values rather than addresses. This usage potentially
   takes more cycles than ADDs and reduces throughput.
   
   This change replaces
       LEA:  r1 = r1 + rsi * 1 + t
   with
       ADDs: r1 += t; r1 += rsi.
   
   Microbenchmark evaluation shows ~40% performance improvement on Haswell,
   Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd
   gen Epyc.
   
   No performance change for the same microbenchmark on Ice Lake and 3rd
   gen Epyc.
   
   Similar results can also be observed in TestMD5Intrinsics and
   TestMD5MultiBlockIntrinsics with a more moderate improvement, e.g. ~15%
   improvement in throughput on Haswell.
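
Because the change touches only instruction selection, MD5 output must stay identical. As a rough sanity check, separate from the jtreg tests named above, the following Java sketch compares `MessageDigest` output against two well-known RFC 1321 test vectors. The class name `Md5VectorCheck` and the helper `md5Hex` are hypothetical. Whether the intrinsic path is exercised depends on JIT compilation, so treat this as a correctness sketch rather than a test of the intrinsic itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5VectorCheck {
    // Returns the MD5 digest of an ASCII string as lowercase hex.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.US_ASCII));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // RFC 1321 test suite: empty string and "abc".
        System.out.println(md5Hex("").equals("d41d8cd98f00b204e9800998ecf8427e"));
        System.out.println(md5Hex("abc").equals("900150983cd24fb0d6963f7d28e17f72"));
    }
}
```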

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11054/files
  - new: https://git.openjdk.org/jdk/pull/11054/files/6ed4348c..be07b342

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00-01

  Stats: 11165 lines in 460 files changed: 4691 ins; 4515 del; 1959 mod
  Patch: https://git.openjdk.org/jdk/pull/11054.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054

PR: https://git.openjdk.org/jdk/pull/11054

