[16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64

Doug Simon doug.simon at oracle.com
Tue Aug 11 20:32:54 UTC 2020


Thanks for the digging and the results, Bernhard.

We’ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven’t done anything yet.

-Doug

> On 11 Aug 2020, at 22:23, Bernhard Urban-Forster <beurba at microsoft.com> wrote:
> 
> Hey Doug,
> 
> since I was curious I did a bit of digging. Here are my findings:
> 
> 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected.
> 2. Thus the code generated by Graal is almost as fast as the MD5 intrinsic.
> 3. The remaining gap, from what I can tell, comes from the SchedulePhase placing all 16 FloatingReadNodes at the top of the basic block, which increases register pressure and ends up requiring spills on x86_64. It would be nice if each read were scheduled next to its use in this case. I couldn't figure out how to do that; it has been a while since I've touched that code :-)
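
To make findings 1 and 3 concrete, here is a minimal source-level sketch. It is only an analogy: the class, the method names, and the mix() step are hypothetical, it uses 4 words instead of 16 to stay short, and it is not the real MD5 round function. A JIT compiler is free to reorder these loads, so the two shapes only illustrate "all reads at the top" versus "each read next to its use"; they do not control what Graal's scheduler emits.

```java
import java.util.Objects;

class BoundsCheckSketch {
    // Hypothetical stand-in for one mixing step; NOT the real MD5 round function.
    static int mix(int acc, int word, int shift) {
        return Integer.rotateLeft(acc + word, shift) ^ acc;
    }

    // Shape 1: all loads first, so every loaded word is live at the same time.
    static int loadsFirst(int[] x, int ofs) {
        // One explicit range check for x[ofs] .. x[ofs + 3].
        Objects.checkFromIndexSize(ofs, 4, x.length);
        int w0 = x[ofs], w1 = x[ofs + 1], w2 = x[ofs + 2], w3 = x[ofs + 3];
        int a = 0x67452301;
        a = mix(a, w0, 7);
        a = mix(a, w1, 12);
        a = mix(a, w2, 17);
        a = mix(a, w3, 22);
        return a;
    }

    // Shape 2: each load sits next to its single use, so fewer values are live at once.
    static int loadAtUse(int[] x, int ofs) {
        Objects.checkFromIndexSize(ofs, 4, x.length);
        int a = 0x67452301;
        a = mix(a, x[ofs], 7);
        a = mix(a, x[ofs + 1], 12);
        a = mix(a, x[ofs + 2], 17);
        a = mix(a, x[ofs + 3], 22);
        return a;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        // Both shapes compute the same value; only the order of the loads differs.
        System.out.println(loadsFirst(data, 1) == loadAtUse(data, 1));
    }
}
```

Both methods return the same result for the same input. With 16 words instead of 4, the first shape keeps 16 values live on top of the working state, which is the register-pressure problem described in finding 3.
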
> 
> Here are some numbers plus the generated code of C2, the intrinsic and Graal:
> https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740
> 
> -Bernhard
> 
> ________________________________________
> From: Doug Simon <doug.simon at oracle.com>
> Sent: Monday, August 10, 2020 15:38
> To: Bernhard Urban-Forster
> Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
> Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
> 
> Hi Bernhard,
> 
> 
> On 10 Aug 2020, at 15:01, Bernhard Urban-Forster <beurba at microsoft.com> wrote:
> 
> Hey Doug,
> 
> replying on behalf of Ludovic, as he is on vacation :-)
> 
> Currently we are not planning to implement the intrinsic for Graal.
> 
> Too bad ;-)
> 
> Also, we didn't check the code generated by Graal. I believe it will do a better job of eliminating array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already?
> 
> I don’t think we do that anywhere currently, but I imagine it wouldn’t be hard to put the BytecodeParser into a mode whereby an array access generates an AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck).
> 
> -Doug
> 
> 
> This is the relevant Java method for the MD5 intrinsic:
> https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java#L172
> 
> 
> -Bernhard
> 
> ________________________________________
> From: Doug Simon <doug.simon at oracle.com>
> Sent: Monday, August 10, 2020 11:55
> To: Ludovic Henry
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
> Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
> 
> Hi Ludovic,
> 
> Are you considering also implementing this intrinsic in Graal?
> 
> Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have Graal intrinsify the method by compiling the relevant Java code without array bounds checks.
> 
> -Doug
> 
> On 9 Aug 2020, at 05:19, Ludovic Henry <luhenry at microsoft.com> wrote:
> 
> Hello,
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216
> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00
> 
> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1
> 
> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2):
> 
> -XX:-UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  1616.238 ± 28.082  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   215.030 ±  0.691  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.228 ±  0.001  ops/ms
> 
> -XX:+UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms => 24% speedup
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms => 28% speedup
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms => 22% speedup
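
The speedup figures can be re-derived from the two tables as intrinsic score divided by baseline score, minus one. A minimal sketch, where the class and method names are hypothetical and the scores are copied from the tables above:

```java
class SpeedupSketch {
    // Relative throughput gain in percent, rounded to the nearest integer.
    static long speedupPercent(double baselineOpsPerMs, double intrinsicOpsPerMs) {
        return Math.round((intrinsicOpsPerMs / baselineOpsPerMs - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // Arguments: (-XX:-UseMD5Intrinsics score, -XX:+UseMD5Intrinsics score)
        System.out.println(speedupPercent(1616.238, 2005.233)); // length 64
        System.out.println(speedupPercent(215.030, 275.979));   // length 1024
        System.out.println(speedupPercent(0.228, 0.279));       // length 1048576
    }
}
```

This reproduces the 24%, 28%, and 22% figures quoted in the table. It ignores the reported error margins.
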
> 
> Thank you,
> Ludovic
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8250902
> 