RFR: 8296548: Improve MD5 intrinsic for x86_64

Tue Nov 15 12:56:00 UTC 2022

On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

> The LEA instruction loads an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage can take more cycles than ADDs and reduce throughput.
> 
> This change replaces
>     LEA:  r1 = r1 + rsi * 1 + t
> with
>     ADDs: r1 += t; r1 += rsi.
> 
> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
> 
> There is no performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
> 
> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
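
To illustrate why the replacement described above is safe, here is a minimal Java sketch (not the HotSpot macro-assembler code) showing that the single LEA-style expression and the two ADD steps give the same 32-bit result, including on wraparound. The names r1, rsi and t follow the quoted description; the class name, method names and sample values are only illustrative assumptions.

```java
public class LeaVsAdds {
    // LEA-style form: one expression, r1 = r1 + rsi * 1 + t.
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // ADD-style form: r1 += t; r1 += rsi.
    static int addsForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration; some of them overflow
        // on purpose to exercise 32-bit wraparound.
        int[][] samples = {
            {1, 2, 3},
            {0x7fffffff, 1, 0x12345678},
            {-1, -1, -1},
            {0x67452301, 0xefcdab89, 0x98badcfe},
        };
        for (int[] s : samples) {
            int lea = leaForm(s[0], s[1], s[2]);
            int adds = addsForm(s[0], s[1], s[2]);
            System.out.printf("lea=%08x adds=%08x equal=%b%n", lea, adds, lea == adds);
        }
    }
}
```

Java int addition wraps modulo 2^32 and is associative, so the order of the additions does not change the value; only the instruction selection differs.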

Performance without the optimization on Cascade Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3315.328 ±  65.799  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    27.482 ±   0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2916.207 ± 127.293  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    27.381 ±   0.003  ops/ms

Performance with the optimization on Cascade Lake:

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4474.780 ±  17.583  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    38.926 ±   0.005  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3796.684 ± 153.887  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    38.724 ±   0.005  ops/ms
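
For reference, the relative improvement implied by the two tables can be worked out with a small Java sketch like the one below. The scores are copied from the tables above; the class and method names are hypothetical. On these numbers the gain comes out at roughly 30 to 42 percent, with the larger 16384-byte inputs at the upper end.

```java
public class Md5Speedup {
    // Relative throughput improvement in percent, given ops/ms before and after.
    static double improvementPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "digest (64)", "digest (16384)",
            "getAndDigest (64)", "getAndDigest (16384)",
        };
        // Scores (ops/ms) on Cascade Lake, taken from the JMH tables.
        double[] before = {3315.328, 27.482, 2916.207, 27.381};
        double[] after  = {4474.780, 38.926, 3796.684, 38.724};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-22s %+.1f%%%n",
                    names[i], improvementPercent(before[i], after[i]));
        }
    }
}
```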

-------------

PR: https://git.openjdk.org/jdk/pull/11054