RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic
Paul Hohensee
phh at openjdk.org
Mon May 15 18:41:43 UTC 2023
On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:
> This optimization removes the redundant memory read in the pattern below.
>
>
> loop0:
> movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (The value was just written to the memory.)
> // loop body
> addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi
> // jump to loop0
>
>
> The optimized pattern loads the value into rax once before the loop and keeps it there across iterations, so each iteration reads memory once instead of twice.
>
>
> movl(rax, Address(rdi, 0));
> loop0:
> // loop body
> addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax
> movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi
> // jump to loop0
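
The two loop shapes quoted above can be modeled in plain C++ to check that they compute the same result while issuing different numbers of loads. This is only a sketch under stated assumptions: `CountedWord`, `loop_body`, `run_baseline`, and `run_optimized` are hypothetical names that do not appear in the patch, and the placeholder loop body stands in for the real MD5 rounds.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the 32-bit word that Address(rdi, 0) names.
// It counts loads and stores so the two loop shapes can be compared.
struct CountedWord {
    uint32_t value = 0;
    int loads = 0;
    int stores = 0;
    uint32_t load() { ++loads; return value; }
    void store(uint32_t v) { ++stores; value = v; }
};

// Placeholder for the loop body (assumption: the real intrinsic runs the
// MD5 rounds here). Any deterministic function of the register will do.
static uint32_t loop_body(uint32_t a, uint32_t block) {
    return (a ^ block) * 2654435761u + 1u;
}

// Baseline shape: the state word is re-read at the top of every iteration.
uint32_t run_baseline(CountedWord& mem, const std::vector<uint32_t>& blocks) {
    for (uint32_t block : blocks) {
        uint32_t rax = mem.load();        // movl(rax, Address(rdi, 0))
        rax = loop_body(rax, block);      // loop body
        mem.store(mem.load() + rax);      // addl(Address(rdi, 0), rax): load, add, store
    }
    return mem.value;
}

// Optimized shape: the first load is hoisted, and rax carries the state
// word from one iteration to the next.
uint32_t run_optimized(CountedWord& mem, const std::vector<uint32_t>& blocks) {
    uint32_t rax = mem.load();            // movl(rax, Address(rdi, 0)), hoisted
    for (uint32_t block : blocks) {
        rax = loop_body(rax, block);      // loop body
        rax += mem.load();                // addl(rax, Address(rdi, 0))
        mem.store(rax);                   // movl(Address(rdi, 0), rax)
    }
    return mem.value;
}
```

In this model, N iterations cost 2N loads in the baseline shape and N + 1 loads in the optimized shape, with N stores in both. The model says nothing about actual hardware cost.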
>
>
> The following tests passed.
>
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java
>
>
> Throughput improves by roughly 1-2% for 16,384-byte inputs with `micro:org.openjdk.bench.java.security.MessageDigests`; for 64-byte inputs the change ranges from -0.47% to 1.02%.
>
> | CPU | digest, 64 bytes | digest, 16,384 bytes | getAndDigest, 64 bytes | getAndDigest, 16,384 bytes |
> |--------------|--------|--------|--------|--------|
> | Ice Lake | -0.19% | 1.63% | -0.07% | 1.69% |
> | Cascade Lake | -0.28% | 0.98% | 0.43% | 0.96% |
> | Haswell | -0.47% | 2.16% | 1.02% | 1.94% |
>
> Ice Lake
>
> Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units
> -- Baseline ---------------------------------------------------------------------------------------------
> MessageDigests.digest md5 64 DEFAULT thrpt 15 5350.876 ± 12.489 ops/ms
> MessageDigests.digest md5 16384 DEFAULT thrpt 15 43.691 ± 0.013 ops/ms
> MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 4545.059 ± 55.981 ops/ms
> MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 43.523 ± 0.012 ops/ms
> -- Optimized --------------------------------------------------------------------------------------------
> MessageDigests.digest ...
Lgtm.
-------------
Marked as reviewed by phh (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13845#pullrequestreview-1427152219