RFR: 8296548: Improve MD5 intrinsic for x86_64 [v2]
Yi-Fan Tsai
duke at openjdk.org
Tue Nov 15 23:43:12 UTC 2022
> The LEA instruction loads the effective address, but the MD5 intrinsic uses it for computing values rather than addresses. This usage potentially takes more cycles than ADDs and reduces the throughput.
>
> This change replaces
> LEA: r1 = r1 + rsi * 1 + t
> with
> ADDs: r1 += t; r1 += rsi.
>
> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
>
> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
>
> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
- Merge branch 'openjdk:master' into JDK-8296548
- 8296548: Improve MD5 intrinsic for x86_64
The LEA instruction loads the effective address, but the MD5 intrinsic
uses it for computing values rather than addresses. This usage
potentially takes more cycles than ADDs and reduces the throughput.
This change replaces
LEA: r1 = r1 + rsi * 1 + t
with
ADDs: r1 += t; r1 += rsi.
Microbenchmark evaluation shows ~40% performance improvement on Haswell,
Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd
gen Epyc.
No performance change for the same microbenchmark on Ice Lake and 3rd
gen Epyc.
Similar results can also be observed in TestMD5Intrinsics and
TestMD5MultiBlockIntrinsics with a more moderate improvement, e.g. ~15%
improvement in throughput on Haswell.
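
As a rough illustration of why the replacement described above is safe,
the sketch below models both forms with 32-bit Java ints. The class and
method names (Md5AddSketch, leaForm, addsForm) are hypothetical and the
operand values are only illustrative; this is not the HotSpot macro
assembler code touched by the PR. It only shows that a single
three-operand sum and two sequential additions give the same result
under 32-bit wraparound.

```java
public class Md5AddSketch {
    // Single-expression form, mirroring LEA: r1 = r1 + rsi * 1 + t
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // Two-step form, mirroring ADDs: r1 += t; r1 += rsi
    static int addsForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Illustrative operands only (assumption): chosen so the sum wraps.
        int r1 = 0x67452301;
        int rsi = 0xfffffff0;
        int t = 0xd76aa478;
        // Java int addition wraps modulo 2^32, so both forms agree.
        System.out.println(leaForm(r1, rsi, t) == addsForm(r1, rsi, t)); // prints "true"
    }
}
```

The two forms differ only in how the hardware executes them, which is
where the reported throughput difference comes from; the computed value
is the same.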
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/11054/files
- new: https://git.openjdk.org/jdk/pull/11054/files/6ed4348c..be07b342
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=11054&range=00-01
Stats: 11165 lines in 460 files changed: 4691 ins; 4515 del; 1959 mod
Patch: https://git.openjdk.org/jdk/pull/11054.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11054/head:pull/11054
PR: https://git.openjdk.org/jdk/pull/11054