RFR: JDK-8299158: Improve MD5 intrinsic on AArch64

Yi-Fan Tsai duke at openjdk.org
Tue Jan 3 23:36:49 UTC 2023


On Wed, 21 Dec 2022 01:52:32 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

> There are two optimizations to reduce the length of the data path.
> 1) Replace
> 
>     __ eorw(rscratch3, rscratch3, r4);
>     __ addw(rscratch3, rscratch3, rscratch1);
>     __ addw(rscratch3, rscratch3, rscratch4);
> 
> with
> 
>     __ eorw(rscratch3, rscratch3, r4);
>     __ addw(rscratch4, rscratch4, rscratch1);
>     __ addw(rscratch3, rscratch3, rscratch4);
> 
> The eorw and the first addw no longer depend on each other, so they can be computed in parallel.
> 
> 2) Replace
> 
>     __ eorw(rscratch2, r2, r3);
>     __ andw(rscratch3, rscratch2, r4);
>     __ eorw(rscratch3, rscratch3, r3);
> 
> with
> 
>     __ andw(rscratch3, r2, r4);
>     __ bicw(rscratch4, r3, r4);
>     __ orrw(rscratch3, rscratch3, rscratch4);
> 
> The transformation is based on the equation `((r2 ^ r3) & r4) ^ r3 == (r2 & r4) | (r3 & ~r4)`.
> The two subexpressions on the RHS can be computed in parallel.
> 
> Correctness proof
> 
> r2 r3 r4 (r2 ^ r3) ((r2 ^ r3) & r4) LHS (r2 & r4) (r3 & ~r4) RHS
>  0  0  0     0                0      0      0         0       0
>  0  0  1     0                0      0      0         0       0
>  0  1  0     1                0      1      0         1       1
>  0  1  1     1                1      0      0         0       0
>  1  0  0     1                0      0      0         0       0
>  1  0  1     1                1      1      1         0       1
>  1  1  0     0                0      1      0         1       1
>  1  1  1     0                0      1      1         0       1
> 
> 
> The change has been tested by TestMD5Intrinsics and TestMD5MultiBlockIntrinsics.
> 
> The performance was measured on an EC2 m6g instance (Graviton2) and shows an 18-25% throughput improvement.
> Baseline
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  2989.149 ± 54.895  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    24.927 ±  0.002  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2433.184 ± 74.616  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    24.736 ±  0.002  ops/ms
> 
> Optimized
> 
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3719.214 ± 23.087  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.280 ±  0.003  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2874.308 ± 88.455  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.014 ±  0.060  ops/ms
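
For readers who want to check the first rewrite outside of HotSpot, here is a minimal sketch in plain C++. It models the quoted registers as 32-bit unsigned values (the `Regs` struct and both function names are hypothetical, not part of the patch) and assumes that the old value of rscratch4 is not needed after the sequence, since the regrouped form overwrites it. The point is only that 32-bit addition is associative modulo 2^32, so both sequences leave the same value in rscratch3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the register contents; the field names mirror
// the operands in the quoted assembler lines.
struct Regs {
    uint32_t rscratch1;
    uint32_t rscratch3;
    uint32_t rscratch4;
    uint32_t r4;
};

// Original sequence: every instruction consumes the previous result,
// so the three operations form one serial dependency chain.
uint32_t serial_chain(Regs r) {
    r.rscratch3 ^= r.r4;         // eorw(rscratch3, rscratch3, r4)
    r.rscratch3 += r.rscratch1;  // addw(rscratch3, rscratch3, rscratch1)
    r.rscratch3 += r.rscratch4;  // addw(rscratch3, rscratch3, rscratch4)
    return r.rscratch3;
}

// Regrouped sequence: the first addw only touches rscratch4 and rscratch1,
// so it does not have to wait for the eorw result.
uint32_t regrouped(Regs r) {
    r.rscratch3 ^= r.r4;         // eorw(rscratch3, rscratch3, r4)
    r.rscratch4 += r.rscratch1;  // addw(rscratch4, rscratch4, rscratch1)
    r.rscratch3 += r.rscratch4;  // addw(rscratch3, rscratch3, rscratch4)
    return r.rscratch3;
}

// Compares both sequences on a small grid of values, including ones that
// make the additions wrap around.
bool sequences_agree() {
    const uint32_t samples[] = {0u, 1u, 0x7fffffffu, 0x80000000u,
                                0xffffffffu, 0xd76aa478u};
    for (uint32_t a : samples)
        for (uint32_t b : samples)
            for (uint32_t c : samples)
                for (uint32_t d : samples)
                    if (serial_chain(Regs{a, b, c, d}) !=
                        regrouped(Regs{a, b, c, d}))
                        return false;
    return true;
}
```

A caller can simply `assert(sequences_agree());`. This is a spot check on sample values, not a proof, and it says nothing about timing; the latency benefit is what the benchmark numbers above measure.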

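The truth table for the second rewrite can also be replayed mechanically. The sketch below is plain C++ rather than HotSpot code, and the function names are hypothetical. Because every operation involved is bitwise, checking the eight single-bit input combinations (here broadcast to full 32-bit words) is enough to cover all word values.

```cpp
#include <cassert>
#include <cstdint>

// Original form: eorw, andw, eorw. Each step depends on the previous one.
uint32_t select_xor(uint32_t r2, uint32_t r3, uint32_t r4) {
    uint32_t t = r2 ^ r3;  // eorw(rscratch2, r2, r3)
    t &= r4;               // andw(rscratch3, rscratch2, r4)
    return t ^ r3;         // eorw(rscratch3, rscratch3, r3)
}

// Rewritten form: the andw and the bicw are independent of each other
// and are only combined by the final orrw.
uint32_t select_bic(uint32_t r2, uint32_t r3, uint32_t r4) {
    uint32_t t3 = r2 & r4;   // andw(rscratch3, r2, r4)
    uint32_t t4 = r3 & ~r4;  // bicw(rscratch4, r3, r4): r3 AND NOT r4
    return t3 | t4;          // orrw(rscratch3, rscratch3, rscratch4)
}

// Replays the eight rows of the truth table, with each input bit
// broadcast to a full word of zeros or ones.
bool identity_holds_for_all_bits() {
    for (unsigned row = 0; row < 8; ++row) {
        uint32_t r2 = (row & 4u) ? 0xffffffffu : 0u;
        uint32_t r3 = (row & 2u) ? 0xffffffffu : 0u;
        uint32_t r4 = (row & 1u) ? 0xffffffffu : 0u;
        if (select_xor(r2, r3, r4) != select_bic(r2, r3, r4))
            return false;
    }
    return true;
}
```

Both forms pick the bit of r2 where r4 is set and the bit of r3 where it is clear, which is why the bicw (AND with the complement of r4) appears on the right-hand side.
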
@theRealAph Hi Andrew, could you help review this change?

-------------

PR: https://git.openjdk.org/jdk/pull/11748


More information about the hotspot-compiler-dev mailing list