Integrated: 8307555: Reduce memory reads in x86 MD5 intrinsic

Yi-Fan Tsai duke at openjdk.org
Mon May 15 18:44:52 UTC 2023


On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

> This optimization removes the redundant memory reads shown below.
> 
> 
> loop0:
>   movl(rax, Address(rdi, 0));       // 4) read the value at the address stored in rdi (The value was just written to the memory.)
>   // loop body
>   addl(Address(rdi, 0), rax);       // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi
>   // jump to loop0
> 
> 
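> The pattern is rewritten so that rax carries the value across iterations: the initial load is hoisted out of the loop, and each iteration reads the memory location once instead of twice.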
> 
> 
>   movl(rax, Address(rdi, 0));
> loop0:
>   // loop body
>   addl(rax, Address(rdi, 0));       // 1) read the value at the address stored in rdi, 2) add the value to rax
>   movl(Address(rdi, 0), rax);       // 3) write the value to the address stored in rdi
>   // jump to loop0
> 
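A minimal sketch of the two loop shapes above, not the HotSpot code itself: the C++ model below counts loads of a single state word under each pattern. `CountedWord`, `round_body`, `run_original`, and `run_optimized` are hypothetical names introduced for illustration, and `round_body` is an arbitrary stand-in for the MD5 rounds.

```cpp
#include <cstdint>

// Hypothetical model of one 32-bit state word in memory; counts accesses.
struct CountedWord {
    uint32_t value;
    int reads;
    int writes;
    explicit CountedWord(uint32_t v) : value(v), reads(0), writes(0) {}
    uint32_t load() { ++reads; return value; }
    void store(uint32_t v) { ++writes; value = v; }
};

// Arbitrary stand-in for the loop body (assumption: not the real MD5 rounds).
static uint32_t round_body(uint32_t a, uint32_t block) {
    return a * 31u + block;
}

// Original shape: reload the word at the top of every iteration,
// then read-modify-write it at the bottom.
uint32_t run_original(CountedWord& state, const uint32_t* blocks, int n) {
    for (int i = 0; i < n; ++i) {
        uint32_t rax = state.load();        // movl(rax, Address(rdi, 0))
        rax = round_body(rax, blocks[i]);   // loop body
        state.store(state.load() + rax);    // addl(Address(rdi, 0), rax)
    }
    return state.value;
}

// Optimized shape: load once before the loop and keep the value in a register.
uint32_t run_optimized(CountedWord& state, const uint32_t* blocks, int n) {
    uint32_t rax = state.load();            // movl(rax, Address(rdi, 0))
    for (int i = 0; i < n; ++i) {
        rax = round_body(rax, blocks[i]);   // loop body
        rax += state.load();                // addl(rax, Address(rdi, 0))
        state.store(rax);                   // movl(Address(rdi, 0), rax)
    }
    return state.value;
}
```

In this model, n iterations cost 2n loads in the original shape and n + 1 loads in the optimized shape, with the same number of stores and the same final value.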
> 
> The following tests passed.
> 
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java
> 
> 
> Performance improves by about 1-2% for 16,384-byte inputs with `micro:org.openjdk.bench.java.security.MessageDigests`; for 64-byte inputs the change ranges from -0.47% to 1.02%.
> 
> |                | digest | digest | getAndDigest | getAndDigest |
> |----------------|--------|--------|--------------|--------------|
> | Length (bytes) | 64     | 16,384 | 64           | 16,384       |
> | Ice Lake       | -0.19% | 1.63%  | -0.07%       | 1.69%        |
> | Cascade Lake   | -0.28% | 0.98%  | 0.43%        | 0.96%        |
> | Haswell        | -0.47% | 2.16%  | 1.02%        | 1.94%        |
> 
> Ice Lake
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> -- Baseline ---------------------------------------------------------------------------------------------
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5350.876 ± 12.489  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.691 ±  0.013  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4545.059 ± 55.981  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.523 ±  0.012  ops/ms
> -- Optimized --------------------------------------------------------------------------------------------
> MessageDigests.digest        ...

This pull request has now been integrated.

Changeset: 43c8c650
Author:    Yi-Fan Tsai <yifan.tsai at gmail.com>
Committer: Paul Hohensee <phh at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/43c8c650afe3c86ce4d59390eb0648548ed33126
Stats:     16 lines in 1 file changed: 6 ins; 5 del; 5 mod

8307555: Reduce memory reads in x86 MD5 intrinsic

Reviewed-by: simonis, phh

-------------

PR: https://git.openjdk.org/jdk/pull/13845

