RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic
Yi-Fan Tsai
duke at openjdk.org
Sat May 6 01:22:27 UTC 2023
This optimization removes the redundant memory reads in the pattern below.
loop0:
movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (after the first iteration, this value was just written to memory in step 3)
// loop body
addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi
// jump to loop0
The pattern is optimized by loading the value once before the loop and keeping the running sum in rax, so each iteration reads the memory once instead of twice.
movl(rax, Address(rdi, 0));
loop0:
// loop body
addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax
movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi
// jump to loop0
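The difference between the two loop shapes can be illustrated with a small stand-alone model. This is only a sketch, not the HotSpot code: `CountingCell`, `loop_body`, `run_original`, `run_optimized`, and `check_equivalent` are hypothetical names introduced here, and `loop_body` is an arbitrary stand-in for the real MD5 rounds. The model counts reads and writes of one state word to show that both shapes compute the same value while the optimized one performs fewer reads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models one 32-bit state word in memory and counts accesses to it.
struct CountingCell {
    uint32_t value;
    unsigned reads;
    unsigned writes;
    explicit CountingCell(uint32_t v) : value(v), reads(0), writes(0) {}
    uint32_t load() { ++reads; return value; }
    void store(uint32_t v) { ++writes; value = v; }
};

// Hypothetical stand-in for "loop body": any function of rax and the input.
static uint32_t loop_body(uint32_t rax, uint32_t input) {
    return (rax ^ input) * 2654435761u + 1u;
}

// Original shape: reload at the loop top, read-modify-write at the bottom.
static void run_original(CountingCell& state, const std::vector<uint32_t>& inputs) {
    for (uint32_t input : inputs) {
        uint32_t rax = state.load();         // movl(rax, Address(rdi, 0))
        rax = loop_body(rax, input);
        state.store(state.load() + rax);     // addl(Address(rdi, 0), rax)
    }
}

// Optimized shape: load once before the loop, keep the sum in rax.
static void run_optimized(CountingCell& state, const std::vector<uint32_t>& inputs) {
    uint32_t rax = state.load();             // movl(rax, Address(rdi, 0)), hoisted
    for (uint32_t input : inputs) {
        rax = loop_body(rax, input);
        rax += state.load();                 // addl(rax, Address(rdi, 0))
        state.store(rax);                    // movl(Address(rdi, 0), rax)
    }
}

// Runs both shapes on the same data and checks results and read counts.
static void check_equivalent(uint32_t initial, const std::vector<uint32_t>& inputs) {
    CountingCell a(initial), b(initial);
    run_original(a, inputs);
    run_optimized(b, inputs);
    assert(a.value == b.value);              // same final state
    assert(a.reads == 2 * inputs.size());    // two reads per iteration
    assert(b.reads == inputs.size() + 1);    // one read per iteration plus the hoisted load
}
```

In this model, n iterations cost 2n reads in the original shape and n + 1 reads in the optimized shape, with n writes in both.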
The following tests passed.
jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java
Throughput improves by roughly 1-2% for 16,384-byte inputs with `micro:org.openjdk.bench.java.security.MessageDigests`; the differences for 64-byte inputs are within the reported measurement error.
| | digest | digest | getAndDigest | getAndDigest | |
|--------------|-----------------------|-----------------------|-----------------------------|------------------------------|-------|
| | 64 | 16,384 | 64 | 16,384 | bytes |
| Ice Lake | -0.19% | 1.63% | -0.07% | 1.69% | |
| Cascade Lake | -0.28% | 0.98% | 0.43% | 0.96% | |
| Haswell | -0.47% | 2.16% | 1.02% | 1.94% | |
Ice Lake
Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 5350.876 ± 12.489 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 43.691 ± 0.013 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 4545.059 ± 55.981 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 43.523 ± 0.012 ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 5340.630 ± 17.155 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 44.401 ± 0.011 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 4541.748 ± 13.583 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 44.257 ± 0.025 ops/ms
Cascade Lake
Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 4483.860 ± 12.864 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 38.924 ± 0.006 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 3682.282 ± 159.619 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 38.695 ± 0.007 ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 4471.167 ± 16.366 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 39.307 ± 0.006 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 3698.120 ± 162.463 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 39.066 ± 0.008 ops/ms
Haswell
Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 3673.925 ± 33.793 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 33.526 ± 0.107 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 3092.655 ± 120.806 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 33.479 ± 0.135 ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest md5 64 DEFAULT thrpt 15 3656.642 ± 47.520 ops/ms
MessageDigests.digest md5 16384 DEFAULT thrpt 15 34.251 ± 0.089 ops/ms
MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 3124.269 ± 121.331 ops/ms
MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 34.130 ± 0.117 ops/ms
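The percentages in the summary table are consistent with computing optimized score divided by baseline score, minus one, from the raw scores above. A minimal sketch of that arithmetic follows, assuming this is how the table was derived; the helper names `percent_change` and `check_reported` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative throughput change in percent, rounded to two decimal places.
// Positive means the optimized build has higher throughput.
static double percent_change(double baseline, double optimized) {
    double pct = (optimized / baseline - 1.0) * 100.0;
    return std::round(pct * 100.0) / 100.0;
}

// Asserts that a recomputed value agrees with a percentage from the table.
static void check_reported(double baseline, double optimized, double reported) {
    assert(std::fabs(percent_change(baseline, optimized) - reported) < 1e-9);
}
```

For example, `percent_change(43.691, 44.401)` gives 1.63, matching the Ice Lake `digest` entry for 16,384 bytes, and `percent_change(5350.876, 5340.630)` gives -0.19, matching the 64-byte entry.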
-------------
Commit messages:
- Merge branch 'openjdk:master' into JDK-8307555
- 8307555: Reduce memory reads in x86 MD5 intrinsic
Changes: https://git.openjdk.org/jdk/pull/13845/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13845&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307555
Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/13845.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13845/head:pull/13845
PR: https://git.openjdk.org/jdk/pull/13845