RFR: 8296548: Improve MD5 intrinsic for x86_64

Ludovic Henry luhenry at openjdk.org
Mon Nov 14 15:49:31 UTC 2022


On Wed, 9 Nov 2022 07:57:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

> The LEA instruction loads an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage can take more cycles than the equivalent ADDs and reduce throughput.
> 
> This change replaces
>     LEA:  r1 = r1 + rsi * 1 + t
> with
>     ADDs: r1 += t; r1 += rsi.
> 
> Microbenchmark evaluation shows a ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake, and a ~20% improvement on 2nd gen Epyc.
> 
> The same microbenchmark shows no performance change on Ice Lake and 3rd gen Epyc.
> 
> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics, which show a ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
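
For reference, the replacement described in the quoted text relies on the LEA form and the ADD pair producing the same 32-bit result. The following is a minimal Java sketch of that equivalence only, not the intrinsic itself. The class and method names (`LeaVsAdd`, `leaForm`, `addForm`) are hypothetical, and the variables `r1`, `rsi`, and `t` simply mirror the operand names used in the quote. Java `int` arithmetic wraps modulo 2^32, which matches 32-bit register arithmetic.

```java
import java.util.Random;

public class LeaVsAdd {
    // Single-expression form, mirroring LEA: r1 = r1 + rsi * 1 + t
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // Two-step form, mirroring the ADD pair: r1 += t; r1 += rsi
    static int addForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Fixed seed so the check is repeatable.
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int r1 = rnd.nextInt();
            int rsi = rnd.nextInt();
            int t = rnd.nextInt();
            if (leaForm(r1, rsi, t) != addForm(r1, rsi, t)) {
                throw new AssertionError("mismatch for " + r1 + ", " + rsi + ", " + t);
            }
        }
        System.out.println("LEA form and ADD form agree");
    }
}
```

The sketch says nothing about performance. It only illustrates that both forms compute the same value, so any throughput difference comes from how the hardware executes the instructions.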

Could you please post JMH microbenchmark results with and without this change? You can get them by running `org.openjdk.bench.java.security.MessageDigests` [1].

[1] https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/security/MessageDigests.java

-------------

PR: https://git.openjdk.org/jdk/pull/11054


More information about the hotspot-compiler-dev mailing list