RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic

Volker Simonis simonis at openjdk.org
Mon May 15 13:47:48 UTC 2023


On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

> This optimization removes the redundant memory reads shown below.
> 
> 
> loop0:
>   movl(rax, Address(rdi, 0));       // 4) read the value at the address in rdi (this value was just written to memory)
>   // loop body
>   addl(Address(rdi, 0), rax);       // 1) read the value at the address in rdi, 2) add rax to it, 3) write the result back to the same address
>   // jump to loop0
> 
> 
> The pattern is optimized by hoisting the initial load out of the loop and keeping the running value in rax, so each iteration reads memory only once.
> 
> 
>   movl(rax, Address(rdi, 0));
> loop0:
>   // loop body
>   addl(rax, Address(rdi, 0));       // 1) read the value at the address stored in rdi, 2) add the value to rax
>   movl(Address(rdi, 0), rax);       // 3) write the value to the address stored in rdi
>   // jump to loop0
> 
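To make the read counts concrete, here is a minimal C++ sketch of the two loop shapes quoted above. It is only a model, not the intrinsic itself: `CountedWord`, `loop_body`, `run_original`, and `run_optimized` are hypothetical names, and `loop_body` stands in for the register-only MD5 rounds. Assuming the loop runs at least once, the original shape performs two reads per iteration and the rewritten shape performs one, plus a single read before the loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 32-bit state word in memory that counts accesses.
struct CountedWord {
    uint32_t value = 0;
    int reads = 0;
    int writes = 0;
    uint32_t load() { ++reads; return value; }
    void store(uint32_t v) { ++writes; value = v; }
};

// Stand-in for the loop body (an assumption): any register-only computation.
static uint32_t loop_body(uint32_t rax, uint32_t block) {
    return rax * 31u + block;
}

// Original shape: reload at the top of every iteration, then a
// read-modify-write add directly on memory.
uint32_t run_original(CountedWord& mem, const uint32_t* blocks, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        uint32_t rax = mem.load();        // movl(rax, Address(rdi, 0))
        rax = loop_body(rax, blocks[i]);  // loop body
        mem.store(mem.load() + rax);      // addl(Address(rdi, 0), rax)
    }
    return mem.value;
}

// Rewritten shape: load once before the loop, keep the running value in rax.
uint32_t run_optimized(CountedWord& mem, const uint32_t* blocks, int n) {
    assert(n >= 0);
    uint32_t rax = mem.load();            // movl(rax, Address(rdi, 0)), hoisted
    for (int i = 0; i < n; ++i) {
        rax = loop_body(rax, blocks[i]);  // loop body
        rax += mem.load();                // addl(rax, Address(rdi, 0))
        mem.store(rax);                   // movl(Address(rdi, 0), rax)
    }
    return mem.value;
}
```

In this model both functions leave the same value in memory, because after each iteration of the rewritten loop `rax` already equals the value just stored, which is exactly what the original loop would reload.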
> 
> The following tests passed.
> 
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java
> 
> 
> Performance improves by about 1-2% on 16,384-byte inputs with `micro:org.openjdk.bench.java.security.MessageDigests`; results for 64-byte inputs are mixed.
> 
> |              | digest | digest | getAndDigest | getAndDigest |       |
> |--------------|--------|--------|--------------|--------------|-------|
> |              | 64     | 16,384 | 64           | 16,384       | bytes |
> | Ice Lake     | -0.19% | 1.63%  | -0.07%       | 1.69%        |       |
> | Cascade Lake | -0.28% | 0.98%  | 0.43%        | 0.96%        |       |
> | Haswell      | -0.47% | 2.16%  | 1.02%        | 1.94%        |       |
> 
> Ice Lake
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> -- Baseline ---------------------------------------------------------------------------------------------
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5350.876 ± 12.489  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.691 ±  0.013  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4545.059 ± 55.981  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.523 ±  0.012  ops/ms
> -- Optimized --------------------------------------------------------------------------------------------
> MessageDigests.digest        ...

Looks good to me.

-------------

Marked as reviewed by simonis (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13845#pullrequestreview-1426604784

