RFR[M]: Adding MD5 Intrinsic on x86-64
Ludovic Henry
luhenry at microsoft.com
Fri Jul 31 21:27:25 UTC 2020
Hi Vivek,
Thank you for your review.
> You have not added the stub generation for 32 bit.
> Did you also test with a 32 bit build?
I've added and tested it.
Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.01
--
Ludovic
________________________________________
From: Vivek Deshpande <viv.desh at gmail.com>
Sent: Thursday, July 30, 2020 9:17:21 PM
To: Ludovic Henry <luhenry at microsoft.com>
Cc: Dean Long <dean.long at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64
Hi Ludovic
Your patch looks good to me. Good reuse of existing code for SHA.
You have not added the stub generation for 32 bit.
Did you also test with a 32 bit build?
Thank you.
Regards,
Vivek
On Thu, Jul 30, 2020 at 6:26 PM Ludovic Henry <luhenry at microsoft.com> wrote:
JBS: I just got Author status and will file a bug as soon as I have access to JBS.
Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.00/
The problem ended up not being with how `ofs` was incremented, but with a callee-saved register not being restored properly before returning from the intrinsic.
The performance results from running with JMH are very encouraging. I ran the `org.openjdk.bench.java.security.MessageDigests` benchmark with only MD5 enabled; the results without and with the intrinsic follow.
-XX:-UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error  Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  3459.747  ± 10.508  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   446.407  ±  3.383  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    30.685  ±  0.676  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.483  ±  0.004  ops/ms
-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error  Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  4011.556  ± 10.212  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   526.873  ±  2.101  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    35.012  ±  0.088  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.573  ±  0.002  ops/ms
Overall, that is a jump from ~483 MB/s to ~573 MB/s on the 1 MiB (1048576-byte) inputs, or a ~19% speedup.
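
For anyone who wants to check the arithmetic, here is a minimal sketch of how the ops/ms scores convert to throughput and to a relative speedup. The class and method names are hypothetical, made up for illustration, and it assumes "MB" means 1048576 bytes, so that one operation on the largest input processes exactly one such unit.

```java
public class Md5Throughput {
    // Hypothetical helper: converts a JMH score in ops/ms into MiB/s,
    // given the number of bytes digested per operation.
    static double mibPerSecond(double opsPerMs, long bytesPerOp) {
        double opsPerSecond = opsPerMs * 1000.0;
        return opsPerSecond * bytesPerOp / (1024.0 * 1024.0);
    }

    // Hypothetical helper: relative improvement of `improved` over
    // `baseline`, expressed as a percentage.
    static double speedupPercent(double baseline, double improved) {
        return (improved / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long length = 1_048_576L;       // the largest (length) value in the tables
        double withoutIntrinsic = 0.483; // ops/ms with -XX:-UseMD5Intrinsics
        double withIntrinsic = 0.573;    // ops/ms with -XX:+UseMD5Intrinsics

        System.out.println("without: " + mibPerSecond(withoutIntrinsic, length) + " MiB/s");
        System.out.println("with:    " + mibPerSecond(withIntrinsic, length) + " MiB/s");
        System.out.println("speedup: " + speedupPercent(withoutIntrinsic, withIntrinsic) + " %");
    }
}
```

With the scores above this works out to roughly 483 and 573 MiB/s, and a speedup of about 18.6%, which rounds to the ~19% quoted.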
Thank you,
Ludovic
--
Thanks and Regards,
Vivek Deshpande
viv.desh at gmail.com