RFR: 8296548: Improve MD5 intrinsic for x86_64

Vladimir Kozlov kvn at openjdk.org
Tue Nov 15 17:24:20 UTC 2022


On Tue, 15 Nov 2022 13:51:24 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

>> The LEA instruction computes an effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage can take more cycles than the equivalent ADDs and reduce throughput.
>> 
>> This change replaces
>>     LEA:  r1 = r1 + rsi * 1 + t
>> with
>>     ADDs: r1 += t; r1 += rsi.
>> 
>> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
>> 
>> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
>> 
>> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
>
> Performance without the optimization on Ice Lake:
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5402.018 ± 17.033  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.722 ±  0.003  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4652.620 ± 35.432  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.573 ±  0.016  ops/ms
> 
> 
> Performance with the optimization on Ice Lake:
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5348.594 ± 14.303  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.671 ±  0.008  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4583.530 ± 12.752  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.545 ±  0.006  ops/ms

@yftsai, can you merge the latest JDK sources? Some of the GHA test failures should already be fixed there.
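
For readers following the quoted description, here is a minimal sketch of why the two forms are interchangeable for the computed value. It is not the HotSpot stub code: the class and method names are hypothetical, Java `int` stands in for a 32-bit register, and the sample inputs are arbitrary (0xd76aa478 is the first MD5 round constant, used only as a realistic value for t). The sketch checks the arithmetic only. It does not model timing, and it ignores that ADD writes the flags while LEA does not.

```java
public class LeaVsAdd {
    // Models `LEA: r1 = r1 + rsi * 1 + t` as a single three-operand sum.
    // Java int arithmetic wraps modulo 2^32, like a 32-bit register.
    static int leaForm(int r1, int rsi, int t) {
        return r1 + rsi * 1 + t;
    }

    // Models the replacement: two dependent two-operand ADDs.
    static int addForm(int r1, int rsi, int t) {
        r1 += t;
        r1 += rsi;
        return r1;
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs, including values that overflow 32 bits.
        int[][] samples = {
            {0x67452301, 0x12345678, 0xd76aa478},
            {Integer.MAX_VALUE, 1, 1},
            {-1, -1, -1},
            {0, 0, 0},
        };
        for (int[] s : samples) {
            if (leaForm(s[0], s[1], s[2]) != addForm(s[0], s[1], s[2])) {
                throw new AssertionError("mismatch for r1=" + s[0]);
            }
        }
        System.out.println("all samples match");
    }
}
```

Because addition modulo 2^32 is associative and commutative, the order of the two ADDs does not affect the result, so the change is purely about which instructions the CPU executes.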

-------------

PR: https://git.openjdk.org/jdk/pull/11054

