[16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64

Bernhard Urban-Forster beurba at microsoft.com
Tue Aug 11 20:23:50 UTC 2020


Hey Doug,

since I was curious I did a bit of digging. Here are my findings:

1. As I expected, Graal detects that a single array bounds check is enough for all 16 array accesses.
2. As a result, the code generated by Graal is almost as fast as the MD5 intrinsic.
3. The remaining gap, from what I can tell, comes from the SchedulePhase placing all 16 FloatingReadNodes at the top of the basic block. That increases register pressure, so the code ends up spilling on x86_64. It would be nice if each read were scheduled next to its use in this case. I couldn't figure out how to do that; it has been a while since I've touched that code :-)
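
To make point 1 concrete, here is a minimal Java sketch of what "one bounds check for 16 accesses" means at the source level. It is not the MD5 code and not what Graal emits; the class and method names are hypothetical, and the loop stands in for the 16 constant-index reads in the real method.

```java
import java.util.Objects;

public class BoundsCheckSketch {
    // Naive form: each x[i] access carries its own implicit bounds check
    // in the bytecode (hypothetical stand-in for an MD5-style round).
    static int mixChecked(int[] x) {
        int acc = 0;
        for (int i = 0; i < 16; i++) {
            acc = Integer.rotateLeft(acc + x[i], 7);
        }
        return acc;
    }

    // Hoisted form: one explicit check on the highest index (15) proves
    // that x[0]..x[15] are all in range, so a compiler is free to drop
    // the per-access checks that follow.
    static int mixHoisted(int[] x) {
        Objects.checkIndex(15, x.length); // throws IndexOutOfBoundsException if x.length <= 15
        int acc = 0;
        for (int i = 0; i < 16; i++) {
            acc = Integer.rotateLeft(acc + x[i], 7);
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] x = new int[16];
        for (int i = 0; i < x.length; i++) {
            x[i] = i * 0x9E3779B9;
        }
        // Both forms compute the same value for a valid 16-element block.
        System.out.println(mixChecked(x) == mixHoisted(x));
    }
}
```

Whether a given compiler actually removes the per-access checks depends on its optimizations; the sketch only shows why a single check is sufficient.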

Here are some numbers plus the generated code of C2, the intrinsic and Graal:
https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740

-Bernhard

________________________________________
From: Doug Simon <doug.simon at oracle.com>
Sent: Monday, August 10, 2020 15:38
To: Bernhard Urban-Forster
Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64

Hi Bernhard,


On 10 Aug 2020, at 15:01, Bernhard Urban-Forster <beurba at microsoft.com> wrote:

Hey Doug,

replying on behalf of Ludovic, as he is on vacation :-)

Currently we are not planning to implement the intrinsic for Graal.

Schade (too bad) ;-)

Also, we didn't check the code generated by Graal. I believe it will do a better job of eliminating array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already?

I don’t think we do that anywhere currently, but I imagine it wouldn’t be hard to put the BytecodeParser into a mode whereby an array access generates an AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck).

-Doug


This is the relevant Java method for the MD5 intrinsic:
https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java#L172


-Bernhard

________________________________________
From: Doug Simon <doug.simon at oracle.com>
Sent: Monday, August 10, 2020 11:55
To: Ludovic Henry
Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64

Hi Ludovic,

Are you considering also implementing this intrinsic in Graal?

Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have Graal intrinsify the method by compiling the relevant Java code without array bounds checks.

-Doug

On 9 Aug 2020, at 05:19, Ludovic Henry <luhenry at microsoft.com> wrote:

Hello,

Bug: https://bugs.openjdk.java.net/browse/JDK-8251216
Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00

Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1

This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2):

-XX:-UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  1616.238 ± 28.082  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   215.030 ±  0.691  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.228 ±  0.001  ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms => 24% speedup
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms => 28% speedup
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms => 22% speedup
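
The speedup figures can be re-derived from the two tables as (score with intrinsic / score without intrinsic - 1) * 100. A small sketch of that arithmetic follows; the class and method names are hypothetical, and rounding to the nearest integer is an assumption about how the percentages were obtained.

```java
public class SpeedupCheck {
    // Percent throughput gain of `after` over `before`,
    // rounded to the nearest integer.
    static long speedupPercent(double before, double after) {
        return Math.round((after / before - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // {score with -XX:-UseMD5Intrinsics, score with -XX:+UseMD5Intrinsics}
        // for lengths 64, 1024 and 1048576, copied from the tables above.
        double[][] scores = {
            {1616.238, 2005.233},
            {215.030, 275.979},
            {0.228, 0.279},
        };
        for (double[] s : scores) {
            System.out.println(speedupPercent(s[0], s[1]) + "%"); // 24%, 28%, 22%
        }
    }
}
```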

Thank you,
Ludovic

[1] https://bugs.openjdk.java.net/browse/JDK-8250902
