RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic

Yi-Fan Tsai duke at openjdk.org
Sat May 6 01:22:27 UTC 2023


This change removes a redundant memory read from the loop in the x86 MD5 intrinsic. The current code has the following pattern.


loop0:
  movl(rax, Address(rdi, 0));       // 4) read the value at the address stored in rdi again (redundant after the first iteration: step 3 has just written this value)
  // loop body
  addl(Address(rdi, 0), rax);       // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi
  // jump to loop0


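The load is hoisted out of the loop, and the read-modify-write `addl` is split into a register add followed by a store. rax then already holds the updated value at the top of the next iteration, so it does not need to be read back from memory.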


  movl(rax, Address(rdi, 0));
loop0:
  // loop body
  addl(rax, Address(rdi, 0));       // 1) read the value at the address stored in rdi, 2) add the value to rax
  movl(Address(rdi, 0), rax);       // 3) write the value to the address stored in rdi
  // jump to loop0
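
As an illustration only, the two loop shapes can be modeled in plain C++ to check that they produce the same result while the rewritten loop issues one fewer load per iteration. This is a sketch, not the HotSpot code: `CountedWord`, `loop_body`, `run_original` and `run_optimized` are hypothetical names, and `loop_body` is a trivial stand-in for the MD5 rounds.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the memory word at Address(rdi, 0) that counts accesses.
struct CountedWord {
    std::uint32_t value = 0;
    std::size_t loads = 0;
    std::size_t stores = 0;

    std::uint32_t load() { ++loads; return value; }
    void store(std::uint32_t v) { ++stores; value = v; }
};

// Stand-in for the loop body (NOT the real MD5 rounds): rotate left by 7, add a block word.
inline std::uint32_t loop_body(std::uint32_t a, std::uint32_t block) {
    return ((a << 7) | (a >> 25)) + block;
}

// Original shape: reload the state at the top of every iteration.
std::uint32_t run_original(CountedWord& state, const std::uint32_t* blocks, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t rax = state.load();      // movl(rax, Address(rdi, 0))
        rax = loop_body(rax, blocks[i]);       // loop body
        state.store(state.load() + rax);       // addl(Address(rdi, 0), rax): read, add, write
    }
    return state.value;
}

// Rewritten shape: load once before the loop, keep the running value in the register.
std::uint32_t run_optimized(CountedWord& state, const std::uint32_t* blocks, std::size_t n) {
    std::uint32_t rax = state.load();          // movl(rax, Address(rdi, 0)), hoisted
    for (std::size_t i = 0; i < n; ++i) {
        rax = loop_body(rax, blocks[i]);       // loop body
        rax += state.load();                   // addl(rax, Address(rdi, 0))
        state.store(rax);                      // movl(Address(rdi, 0), rax)
    }
    return state.value;
}
```

In this model, processing four blocks costs 8 loads in `run_original` and 5 in `run_optimized` (one initial load plus one per iteration), with 4 stores in both cases.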


The following tests passed.

jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java


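Throughput improves by about 1-2% for 16,384-byte inputs with `micro:org.openjdk.bench.java.security.MessageDigests`. For 64-byte inputs the differences are within the reported error. The table shows the change in throughput relative to the baseline.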

|              | digest, 64 bytes | digest, 16,384 bytes | getAndDigest, 64 bytes | getAndDigest, 16,384 bytes |
|--------------|------------------|----------------------|------------------------|----------------------------|
| Ice Lake     | -0.19%           | 1.63%                | -0.07%                 | 1.69%                      |
| Cascade Lake | -0.28%           | 0.98%                | 0.43%                  | 0.96%                      |
| Haswell      | -0.47%           | 2.16%                | 1.02%                  | 1.94%                      |

Ice Lake

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5350.876 ± 12.489  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.691 ±  0.013  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4545.059 ± 55.981  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.523 ±  0.012  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5340.630 ± 17.155  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    44.401 ±  0.011  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4541.748 ± 13.583  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    44.257 ±  0.025  ops/ms


Cascade Lake

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4483.860 ±  12.864  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    38.924 ±   0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3682.282 ± 159.619  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    38.695 ±   0.007  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4471.167 ±  16.366  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    39.307 ±   0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3698.120 ± 162.463  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    39.066 ±   0.008  ops/ms


Haswell

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3673.925 ±  33.793  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    33.526 ±   0.107  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3092.655 ± 120.806  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    33.479 ±   0.135  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3656.642 ±  47.520  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    34.251 ±   0.089  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3124.269 ± 121.331  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    34.130 ±   0.117  ops/ms
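
For readers who want to re-derive the summary percentages from the raw scores, here is a minimal sketch. It assumes each percentage is the relative change in throughput, (optimized - baseline) / baseline, and treats the ± column as a half-width around the score. `Score`, `percent_change` and `intervals_overlap` are hypothetical helper names, not part of JMH.

```cpp
#include <cmath>

// One benchmark result: mean throughput in ops/ms and the reported +/- error.
struct Score {
    double mean;
    double error;
};

// Relative change of the optimized score versus the baseline, in percent.
double percent_change(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

// True when [mean - error, mean + error] of the two results overlap,
// i.e. the difference is no larger than the combined error.
bool intervals_overlap(const Score& a, const Score& b) {
    return std::fabs(a.mean - b.mean) <= a.error + b.error;
}
```

For example, the Ice Lake `digest` scores for 16,384 bytes (43.691 and 44.401 ops/ms) give about 1.63%, and their intervals do not overlap. The 64-byte scores (5350.876 ± 12.489 and 5340.630 ± 17.155 ops/ms) give about -0.19%, with overlapping intervals.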

-------------

Commit messages:
 - Merge branch 'openjdk:master' into JDK-8307555
 - 8307555: Reduce memory reads in x86 MD5 intrinsic

Changes: https://git.openjdk.org/jdk/pull/13845/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13845&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307555
  Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13845.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13845/head:pull/13845

PR: https://git.openjdk.org/jdk/pull/13845

