RFR: JDK-8299158: Improve MD5 intrinsic on AArch64

Yi-Fan Tsai duke at openjdk.org
Wed Dec 21 02:00:35 UTC 2022


This change makes two optimizations that shorten the data path, that is, the chain of dependent instructions.
1) Replace

    __ eorw(rscratch3, rscratch3, r4);
    __ addw(rscratch3, rscratch3, rscratch1);
    __ addw(rscratch3, rscratch3, rscratch4);

with

    __ eorw(rscratch3, rscratch3, r4);
    __ addw(rscratch4, rscratch4, rscratch1);
    __ addw(rscratch3, rscratch3, rscratch4);

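The first addw no longer reads the result of the eorw, so the two can be computed in parallel. The dependent chain shrinks from three instructions (eorw, addw, addw) to two (eorw, addw).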
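
The reordering relies only on 32-bit addition being associative and commutative modulo 2^32. A minimal sketch of the two orderings in plain C++ follows. The parameter names `f`, `x`, and `acc` are hypothetical stand-ins for the values held in rscratch3, rscratch1, and rscratch4 at that point. This models the arithmetic only, not the stub generator code itself.

```cpp
#include <cassert>
#include <cstdint>

// Original ordering: both additions wait on the XOR result.
// Dependent chain: eorw -> addw -> addw.
inline uint32_t ff_tail_before(uint32_t f, uint32_t r4, uint32_t x, uint32_t acc) {
    uint32_t t = f ^ r4;  // eorw(rscratch3, rscratch3, r4)
    t += x;               // addw(rscratch3, rscratch3, rscratch1)
    t += acc;             // addw(rscratch3, rscratch3, rscratch4)
    return t;
}

// Reordered: acc + x does not read the XOR result, so a superscalar
// core may execute it alongside the XOR. Dependent chain: eorw -> addw.
inline uint32_t ff_tail_after(uint32_t f, uint32_t r4, uint32_t x, uint32_t acc) {
    uint32_t t = f ^ r4;     // eorw(rscratch3, rscratch3, r4)
    uint32_t sum = acc + x;  // addw(rscratch4, rscratch4, rscratch1)
    return t + sum;          // addw(rscratch3, rscratch3, rscratch4)
}
```

Unsigned arithmetic on `uint32_t` wraps modulo 2^32, matching addw, so both functions return the same value for every input, including ones that overflow.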

2) Replace

    __ eorw(rscratch2, r2, r3);
    __ andw(rscratch3, rscratch2, r4);
    __ eorw(rscratch3, rscratch3, r3);

with

    __ andw(rscratch3, r2, r4);
    __ bicw(rscratch4, r3, r4);
    __ orrw(rscratch3, rscratch3, rscratch4);

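The transformation is based on the identity `((r2 ^ r3) & r4) ^ r3 == (r2 & r4) | (r3 & ~r4)`, where `~` is bitwise NOT; bicw computes the `r3 & ~r4` term in a single instruction.
The two subexpressions on the RHS can be computed in parallel.

Correctness proof (the operations are bitwise, so checking a single bit position suffices)

r2 r3 r4 (r2 ^ r3) ((r2 ^ r3) & r4) LHS (r2 & r4) (r3 & ~r4) RHS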
 0  0  0     0                0      0      0         0       0
 0  0  1     0                0      0      0         0       0
 0  1  0     1                0      1      0         1       1
 0  1  1     1                1      0      0         0       0
 1  0  0     1                0      0      0         0       0
 1  0  1     1                1      1      1         0       1
 1  1  0     0                0      1      0         1       1
 1  1  1     0                0      1      1         0       1
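
The truth table above can also be checked mechanically. The sketch below models both forms on 32-bit words in plain C++ and compares them over all eight single-bit input combinations, with each bit broadcast to a full word. The function names are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original form: eorw, andw, eorw (each step depends on the previous one).
inline uint32_t gg_select_before(uint32_t r2, uint32_t r3, uint32_t r4) {
    return ((r2 ^ r3) & r4) ^ r3;
}

// Rewritten form: andw and bicw are independent, then orrw combines them.
inline uint32_t gg_select_after(uint32_t r2, uint32_t r3, uint32_t r4) {
    return (r2 & r4) | (r3 & ~r4);
}

// Returns true if both forms agree on all 8 combinations of input bits.
// Each input bit is broadcast to all 32 positions; since every operation
// is bitwise, this covers every possible per-bit behaviour.
inline bool gg_forms_agree_on_all_bits() {
    for (uint32_t bits = 0; bits < 8; ++bits) {
        uint32_t r2 = (bits & 4u) ? 0xffffffffu : 0u;
        uint32_t r3 = (bits & 2u) ? 0xffffffffu : 0u;
        uint32_t r4 = (bits & 1u) ? 0xffffffffu : 0u;
        if (gg_select_before(r2, r3, r4) != gg_select_after(r2, r3, r4)) {
            return false;
        }
    }
    return true;
}
```

Both forms select bits of r2 where r4 is set and bits of r3 where r4 is clear, which is why the two AND terms never overlap.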


The change has been tested by TestMD5Intrinsics and TestMD5MultiBlockIntrinsics.

Performance was measured on an EC2 m6g instance (Graviton2) and shows an 18-25% improvement in throughput.
Baseline

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  2989.149 ± 54.895  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    24.927 ±  0.002  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2433.184 ± 74.616  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    24.736 ±  0.002  ops/ms

Optimized

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3719.214 ± 23.087  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.280 ±  0.003  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2874.308 ± 88.455  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.014 ±  0.060  ops/ms
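
For reference, the relative improvement per row can be derived from the two Score columns. A small sketch in C++ follows; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative throughput gain of `optimized` over `baseline`, in percent.
// Both arguments are Score values in ops/ms from the tables above.
inline double improvement_percent(double baseline, double optimized) {
    return (optimized / baseline - 1.0) * 100.0;
}
```

Applied to the four rows, this gives roughly 24.4% (digest, 64), 25.5% (digest, 16384), 18.1% (getAndDigest, 64), and 25.4% (getAndDigest, 16384).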

-------------

Commit messages:
 - transform GG
 - Reduce the length of data path

Changes: https://git.openjdk.org/jdk/pull/11748/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11748&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299158
  Stats: 8 lines in 1 file changed: 1 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/11748.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11748/head:pull/11748

PR: https://git.openjdk.org/jdk/pull/11748

