RFR: JDK-8299158: Improve MD5 intrinsic on AArch64
Yi-Fan Tsai
duke at openjdk.org
Tue Jan 3 23:36:49 UTC 2023
On Wed, 21 Dec 2022 01:52:32 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:
> This change makes two optimizations, both of which shorten the critical dependency chain (data path).
> 1) Replace
>
> __ eorw(rscratch3, rscratch3, r4);
> __ addw(rscratch3, rscratch3, rscratch1);
> __ addw(rscratch3, rscratch3, rscratch4);
>
> with
>
> __ eorw(rscratch3, rscratch3, r4);
> __ addw(rscratch4, rscratch4, rscratch1);
> __ addw(rscratch3, rscratch3, rscratch4);
>
> The eorw and the first addw no longer depend on each other, so they can be computed in parallel.
>
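Both orderings produce the same 32-bit sum. Addition modulo 2^32 is associative and commutative, so only the shape of the dependency chain changes. The following is a minimal scalar C++ sketch of that argument, not code from the intrinsic. The parameter names are hypothetical stand-ins for the registers, and the sketch models only the values, not instruction latency.

```cpp
#include <cstdint>

using std::uint32_t;

// Scalar model of the sequence; names are hypothetical stand-ins:
// x ~ rscratch3 before eorw, y ~ r4, s1 ~ rscratch1, s4 ~ rscratch4.

// Original order: every step reads the previous result, so the
// chain is eorw -> addw -> addw (3 dependent steps).
constexpr uint32_t sum_serial(uint32_t x, uint32_t y, uint32_t s1, uint32_t s4) {
  uint32_t t = x ^ y;  // eorw(rscratch3, rscratch3, r4)
  t = t + s1;          // addw(rscratch3, rscratch3, rscratch1)
  t = t + s4;          // addw(rscratch3, rscratch3, rscratch4)
  return t;
}

// New order: s4 + s1 does not read the eorw result, so the first two
// steps are independent and the chain is only 2 steps deep.
constexpr uint32_t sum_reordered(uint32_t x, uint32_t y, uint32_t s1, uint32_t s4) {
  uint32_t t = x ^ y;    // eorw(rscratch3, rscratch3, r4)
  uint32_t u = s4 + s1;  // addw(rscratch4, rscratch4, rscratch1)
  return t + u;          // addw(rscratch3, rscratch3, rscratch4)
}

// Unsigned arithmetic wraps modulo 2^32 (like addw), and wrapping
// addition is associative and commutative, so both orders agree.
static_assert(sum_serial(0xFFFFFFFFu, 0x12345678u, 0xFFFFFFFFu, 0x80000000u) ==
                  sum_reordered(0xFFFFFFFFu, 0x12345678u, 0xFFFFFFFFu, 0x80000000u),
              "reassociation changed the result");
```

The model does not capture that the reordered form writes rscratch4. It assumes the old value of that register is not needed afterwards.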
> 2) Replace
>
> __ eorw(rscratch2, r2, r3);
> __ andw(rscratch3, rscratch2, r4);
> __ eorw(rscratch3, rscratch3, r3);
>
> with
>
> __ andw(rscratch3, r2, r4);
> __ bicw(rscratch4, r3, r4);
> __ orrw(rscratch3, rscratch3, rscratch4);
>
> The transformation is based on the identity `((r2 ^ r3) & r4) ^ r3 == (r2 & r4) | (r3 & ~r4)`.
> The two subexpressions on the RHS are independent and can be computed in parallel; `r3 & ~r4` is exactly what `bicw` computes.
>
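Both sequences implement a bitwise select: each result bit comes from r2 where r4 is 1 and from r3 where r4 is 0. The following is a minimal C++ sketch of the two forms, using the hypothetical names b, c and d for r2, r3 and r4. Here `c & ~d` models `bicw`, which computes its first source AND NOT its second.

```cpp
#include <cstdint>

using std::uint32_t;

// Hypothetical stand-ins: b ~ r2, c ~ r3, d ~ r4. Both functions
// compute a bitwise select: bit i is b's bit where d's bit is 1,
// and c's bit where d's bit is 0.

// Original: eorw -> andw -> eorw, each step waits for the previous one.
constexpr uint32_t select_xor(uint32_t b, uint32_t c, uint32_t d) {
  uint32_t t = b ^ c;  // eorw(rscratch2, r2, r3)
  t = t & d;           // andw(rscratch3, rscratch2, r4)
  return t ^ c;        // eorw(rscratch3, rscratch3, r3)
}

// Replacement: andw and bicw are independent; only orrw waits on
// them, so the chain is 2 steps deep instead of 3.
constexpr uint32_t select_or(uint32_t b, uint32_t c, uint32_t d) {
  uint32_t t = b & d;   // andw(rscratch3, r2, r4)
  uint32_t u = c & ~d;  // bicw(rscratch4, r3, r4): r3 AND NOT r4
  return t | u;         // orrw(rscratch3, rscratch3, rscratch4)
}

static_assert(select_xor(0xF0F0F0F0u, 0x0FF00FF0u, 0xFF00FF00u) ==
                  select_or(0xF0F0F0F0u, 0x0FF00FF0u, 0xFF00FF00u),
              "the two forms disagree");
```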
> Correctness proof (every operation is bitwise, so it suffices to check a single bit position)
>
> r2  r3  r4  (r2 ^ r3)  ((r2 ^ r3) & r4)  LHS  (r2 & r4)  (r3 & ~r4)  RHS
>  0   0   0      0             0           0       0           0       0
>  0   0   1      0             0           0       0           0       0
>  0   1   0      1             0           1       0           1       1
>  0   1   1      1             1           0       0           0       0
>  1   0   0      1             0           0       0           0       0
>  1   0   1      1             1           1       1           0       1
>  1   1   0      0             0           1       0           1       1
>  1   1   1      0             0           1       1           0       1
>
>
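Because every operator in the identity is bitwise, the eight rows of the table can also be checked in a single evaluation by bit-slicing. In this scheme, bit i of each mask holds row i of the table, with row 0 as the top row. This is an illustrative C++ sketch, and the mask names are hypothetical. Reading the LHS column from the bottom row up gives 1110 0100, i.e. 0xE4, and both sides evaluate to that value.

```cpp
// Bit-sliced truth table: bit i of each mask is the value of that
// column in row i (row 0 = top row, r2 r3 r4 = 0 0 0). Only the low
// 8 bits matter, so results are masked with kRows.
constexpr unsigned kRows = 0xFFu;
constexpr unsigned kR2 = 0b11110000u;  // r2 column
constexpr unsigned kR3 = 0b11001100u;  // r3 column
constexpr unsigned kR4 = 0b10101010u;  // r4 column

// LHS column: ((r2 ^ r3) & r4) ^ r3 evaluated for all rows at once.
constexpr unsigned lhs_column() { return (((kR2 ^ kR3) & kR4) ^ kR3) & kRows; }

// RHS column: (r2 & r4) | (r3 & ~r4) evaluated for all rows at once.
constexpr unsigned rhs_column() { return ((kR2 & kR4) | (kR3 & ~kR4)) & kRows; }

// Equal columns mean the identity holds in every row of the table.
static_assert(lhs_column() == rhs_column(), "identity fails on some row");
```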
> The change has been tested with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics.
>
> The performance was measured on an EC2 m6g instance (Graviton2), and throughput improves by roughly 18-25%.
> Baseline
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  2989.149  ± 54.895  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    24.927  ±  0.002  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2433.184  ± 74.616  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    24.736  ±  0.002  ops/ms
>
> Optimized
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3719.214  ± 23.087  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.280  ±  0.003  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2874.308  ± 88.455  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.014  ±  0.060  ops/ms
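The improvement range can be reproduced from the scores above as (optimized / baseline - 1) × 100. The following is a small C++ sketch with the scores copied from the two tables; the struct and function names are hypothetical, and the error columns are ignored. The gains come out at about 18.1% (getAndDigest, length 64) to 25.5% (digest, length 16384).

```cpp
#include <cstddef>

// JMH throughput scores (ops/ms) copied from the tables above.
struct Result {
  const char* benchmark;  // benchmark method and (length) parameter
  double baseline;        // score before the change
  double optimized;       // score after the change
};

constexpr Result kResults[] = {
    {"digest, length 64", 2989.149, 3719.214},
    {"digest, length 16384", 24.927, 31.280},
    {"getAndDigest, length 64", 2433.184, 2874.308},
    {"getAndDigest, length 16384", 24.736, 31.014},
};
constexpr std::size_t kCount = sizeof(kResults) / sizeof(kResults[0]);

// Relative throughput gain in percent.
constexpr double gain_pct(const Result& r) {
  return (r.optimized / r.baseline - 1.0) * 100.0;
}

// Smallest and largest gain over all rows (C++14 constexpr loops).
constexpr double min_gain() {
  double m = gain_pct(kResults[0]);
  for (std::size_t i = 1; i < kCount; ++i)
    if (gain_pct(kResults[i]) < m) m = gain_pct(kResults[i]);
  return m;
}

constexpr double max_gain() {
  double m = gain_pct(kResults[0]);
  for (std::size_t i = 1; i < kCount; ++i)
    if (gain_pct(kResults[i]) > m) m = gain_pct(kResults[i]);
  return m;
}
```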
@theRealAph Hi Andrew, could you help review this change?
-------------
PR: https://git.openjdk.org/jdk/pull/11748
More information about the hotspot-compiler-dev mailing list