RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2]

Oli Gillespie ogillespie at openjdk.org
Thu Sep 26 14:58:49 UTC 2024


> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' in the G function. The two terms of ((d & b) | (~d & c)) can never have a set bit in common, so the expression is equivalent to ((d & b) + (~d & c)) in this scenario. That lets us perform the other additions independently of 'b', leaving the dependency on 'b' to the final addition. This speeds up the MD5 intrinsics by around 5%.
> 
> Benchmark results on my two hosts:
> 
> 
> Benchmark                  (algorithm)  (dataSize)  (provider)   Mode  Cnt    Score   Error  Units
> 
> x86 Before:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  636.389 ± 0.240  ops/s
> 
> x86 After:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  671.611 ± 0.226  ops/s (+5.5%)
> 
> 
> aarch64 Before:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  498.613 ± 0.359  ops/s
> 
> aarch64 After:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  526.008 ± 0.491  ops/s (+5.5%)
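
The reordering described in the quoted text can be illustrated with a minimal Java sketch. This is not the patch itself: the actual change is in the x86/aarch64 assembly intrinsics, and the class name `Md5GStep`, the method names, and the random test inputs are all hypothetical, chosen only to show that both forms of a round-2 step give the same result.

```java
import java.util.Random;

// Hypothetical illustration only; the real change lives in the HotSpot
// x86/aarch64 MD5 intrinsics, not in Java code.
public class Md5GStep {

    // Round-2 step as usually written: G(b, c, d) must be fully computed,
    // so every addition waits on b (the value produced by the previous step).
    static int stepOriginal(int a, int b, int c, int d, int x, int k, int s) {
        int g = (d & b) | (~d & c);
        return b + Integer.rotateLeft(a + g + x + k, s);
    }

    // Reordered step: (d & b) and (~d & c) share no set bits, so OR can be
    // replaced by addition, and the term involving b can be added last.
    static int stepReordered(int a, int b, int c, int d, int x, int k, int s) {
        int partial = a + x + k + (~d & c); // does not depend on b
        int sum = partial + (d & b);        // b enters only here
        return b + Integer.rotateLeft(sum, s);
    }

    public static void main(String[] args) {
        int[] shifts = {5, 9, 14, 20};      // MD5 round-2 rotation amounts
        Random rnd = new Random(8341013L);  // arbitrary fixed seed
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(), b = rnd.nextInt(), c = rnd.nextInt();
            int d = rnd.nextInt(), x = rnd.nextInt(), k = rnd.nextInt();
            int s = shifts[i & 3];
            if (stepOriginal(a, b, c, d, x, k, s) != stepReordered(a, b, c, d, x, k, s)) {
                throw new AssertionError("mismatch at iteration " + i);
            }
        }
        System.out.println("all steps match");
    }
}
```

In `stepReordered`, only the last addition before the rotate waits on `b`, which is the shortened dependency chain the quoted description refers to. Whether a JIT-compiled Java version would show a similar gain is not something this sketch claims.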

Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision:

  Fix aarch64 bug

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21203/files
  - new: https://git.openjdk.org/jdk/pull/21203/files/d7641133..e6d95c2f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21203&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21203&range=00-01

  Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21203.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21203/head:pull/21203

PR: https://git.openjdk.org/jdk/pull/21203

