Integrated: JDK-8299158: Improve MD5 intrinsic on AArch64
Yi-Fan Tsai
duke at openjdk.org
Wed Jan 4 14:54:57 UTC 2023
On Wed, 21 Dec 2022 01:52:32 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:
> There are two optimizations to reduce the length of the data path.
> 1) Replace
>
> __ eorw(rscratch3, rscratch3, r4);
> __ addw(rscratch3, rscratch3, rscratch1);
> __ addw(rscratch3, rscratch3, rscratch4);
>
> with
>
> __ eorw(rscratch3, rscratch3, r4);
> __ addw(rscratch4, rscratch4, rscratch1);
> __ addw(rscratch3, rscratch3, rscratch4);
>
> In the new sequence the first addw no longer reads the eorw result, so the eorw and the first addw can be computed in parallel.
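The regrouping relies only on 32-bit wrapping addition being associative. A minimal Java sketch of the two orderings follows; the class, method, and parameter names are illustrative and not part of the patch. Here x, k, and m stand for the incoming values of rscratch3, rscratch1, and rscratch4. The new sequence overwrites rscratch4, so the sketch assumes its old value is not needed afterwards.

```java
import java.util.Random;

final class Md5AddRegroup {
    // Original order: eorw -> addw -> addw, each step waiting on the previous one.
    static int serial(int x, int r4, int k, int m) {
        int t = x ^ r4; // eorw
        t += k;         // addw, needs the eorw result
        t += m;         // addw, needs the previous addw
        return t;
    }

    // Regrouped order: m + k does not read the XOR result, so it can be
    // computed alongside it; only the final addition needs both values.
    static int regrouped(int x, int r4, int k, int m) {
        int t = x ^ r4; // eorw
        int s = m + k;  // addw, independent of the eorw
        return t + s;   // addw, joins the two chains
    }

    // Compares both orderings on pseudo-random 32-bit inputs (int addition wraps).
    static boolean agreeOnRandomInputs(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            int x = rnd.nextInt(), r4 = rnd.nextInt(), k = rnd.nextInt(), m = rnd.nextInt();
            if (serial(x, r4, k, m) != regrouped(x, r4, k, m)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("sequences agree: " + agreeOnRandomInputs(42L, 1_000_000));
    }
}
```

The dependency chain drops from three serial operations to two, which is the shortening of the data path described above.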
>
> 2) Replace
>
> __ eorw(rscratch2, r2, r3);
> __ andw(rscratch3, rscratch2, r4);
> __ eorw(rscratch3, rscratch3, r3);
>
> with
>
> __ andw(rscratch3, r2, r4);
> __ bicw(rscratch4, r3, r4);
> __ orrw(rscratch3, rscratch3, rscratch4);
>
> The transformation is based on the identity `((r2 ^ r3) & r4) ^ r3 == (r2 & r4) | (r3 & ~r4)`, where `~` is bitwise NOT (bicw computes `r3 & ~r4` in one instruction).
> The two subexpressions on the right-hand side (RHS) can be computed in parallel.
>
> Correctness proof
>
> r2  r3  r4  (r2 ^ r3)  ((r2 ^ r3) & r4)  LHS  (r2 & r4)  (r3 & ~r4)  RHS
> 0   0   0   0          0                 0    0          0           0
> 0   0   1   0          0                 0    0          0           0
> 0   1   0   1          0                 1    0          1           1
> 0   1   1   1          1                 0    0          0           0
> 1   0   0   1          0                 0    0          0           0
> 1   0   1   1          1                 1    1          0           1
> 1   1   0   0          0                 1    0          1           1
> 1   1   1   0          0                 1    1          0           1
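Every operation in the identity is bitwise, so the eight single-bit rows above cover all 32-bit inputs. A minimal Java sketch that replays the truth table and spot-checks full words follows; the class and method names are illustrative and not part of the patch.

```java
import java.util.Random;

final class Md5SelectIdentity {
    // Left-hand side: eorw, andw, eorw, a chain of three dependent operations.
    static int lhs(int r2, int r3, int r4) {
        return ((r2 ^ r3) & r4) ^ r3;
    }

    // Right-hand side: andw and bicw are independent of each other; orrw joins them.
    static int rhs(int r2, int r3, int r4) {
        return (r2 & r4) | (r3 & ~r4);
    }

    // Replays the eight rows of the truth table, one bit per input.
    static boolean holdsForAllBitCombinations() {
        for (int row = 0; row < 8; row++) {
            int r2 = (row >> 2) & 1;
            int r3 = (row >> 1) & 1;
            int r4 = row & 1;
            if (lhs(r2, r3, r4) != rhs(r2, r3, r4)) {
                return false;
            }
        }
        return true;
    }

    // Spot-checks full 32-bit words as a sanity check on the per-bit argument.
    static boolean holdsForRandomWords(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            int r2 = rnd.nextInt(), r3 = rnd.nextInt(), r4 = rnd.nextInt();
            if (lhs(r2, r3, r4) != rhs(r2, r3, r4)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("truth table holds:  " + holdsForAllBitCombinations());
        System.out.println("random words agree: " + holdsForRandomWords(42L, 1_000_000));
    }
}
```

Both forms select bits of r2 where r4 is set and bits of r3 where it is clear; the right-hand side simply exposes the two halves as independent operations.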
>
>
> The change has been tested with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics.
>
> Performance was measured on an EC2 m6g instance (Graviton2); throughput improves by 18-25%.
> Baseline
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error  Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  2989.149  ± 54.895  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    24.927  ±  0.002  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2433.184  ± 74.616  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    24.736  ±  0.002  ops/ms
>
> Optimized
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error  Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3719.214  ± 23.087  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.280  ±  0.003  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2874.308  ± 88.455  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.014  ±  0.060  ops/ms
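The quoted improvement range can be reproduced from the two score tables as optimized / baseline - 1. A small Java sketch of that arithmetic follows; the class and method names are illustrative, the scores are copied from the tables above, and the error margins are ignored.

```java
final class Md5Speedup {
    // Relative throughput gain in percent: (optimized / baseline - 1) * 100.
    static double improvementPercent(double baseline, double optimized) {
        return (optimized / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] rows = {
            "digest        64", "digest        16384",
            "getAndDigest  64", "getAndDigest  16384"
        };
        // Scores in ops/ms, taken from the Baseline and Optimized tables.
        double[] baseline  = {2989.149, 24.927, 2433.184, 24.736};
        double[] optimized = {3719.214, 31.280, 2874.308, 31.014};

        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-20s %+.1f%%%n", rows[i],
                    improvementPercent(baseline[i], optimized[i]));
        }
    }
}
```

The smallest gain comes from getAndDigest at length 64 and the largest from the 16384-byte runs.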
This pull request has now been integrated.
Changeset: c6588d5b
Author: Yi-Fan Tsai <yftsai at amazon.com>
Committer: Ludovic Henry <luhenry at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/c6588d5bb3f778806c9112e86dbfba964c0636fd
Stats: 8 lines in 1 file changed: 1 ins; 1 del; 6 mod
8299158: Improve MD5 intrinsic on AArch64
Reviewed-by: luhenry, haosun, aph
-------------
PR: https://git.openjdk.org/jdk/pull/11748