[16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
Doug Simon
doug.simon at oracle.com
Tue Aug 11 20:32:54 UTC 2020
Thanks for the digging and the results, Bernhard.
We’ve discussed making the SchedulePhase do latency-aware scheduling within blocks but haven’t done anything yet.
-Doug
> On 11 Aug 2020, at 22:23, Bernhard Urban-Forster <beurba at microsoft.com> wrote:
>
> Hey Doug,
>
> since I was curious I did a bit of digging. Here are my findings:
>
> 1. Graal is able to detect that it only needs to do the array bounds check once for all the 16 array accesses, as I expected.
> 2. Thus the code generated by Graal is almost as fast as the MD5 intrinsic.
> 3. The gap, from what I can tell, comes from the SchedulePhase placing all 16 FloatingReadNodes at the top of the basic block, which increases register pressure and ends up requiring spills on x86_64. It would be nice if each read were scheduled next to its use in this case. I couldn't figure out how to do that; it has been a while since I've touched that code :-)
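
The two placements of the reads described in point 3 can be pictured at the Java source level. The following is only an illustrative sketch, not Graal or JDK code: the class name, the `step` helper and the use of 4 words instead of 16 are assumptions made for brevity. A JIT compiler is free to reorder the loads in either shape, so the source order shown here does not determine the generated code; it only shows why loading everything up front keeps more values live at once.

```java
// Illustrative sketch only; names and the simplified round function are hypothetical.
final class LoadPlacementSketch {
    // Stand-in for one MD5-style step: mixes a state word with one input word.
    static int step(int a, int b, int x, int s) {
        return Integer.rotateLeft(a + (b ^ x), s) + b;
    }

    // Shape A: all reads happen first, so x0..x3 are live together with a and b.
    static int loadsHoisted(int[] x) {
        int x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
        int a = 0x67452301, b = 0xefcdab89;
        a = step(a, b, x0, 7);
        b = step(b, a, x1, 12);
        a = step(a, b, x2, 17);
        b = step(b, a, x3, 22);
        return a ^ b;
    }

    // Shape B: each read sits next to its single use, so at most one
    // input word is live in addition to a and b.
    static int loadsNearUse(int[] x) {
        int a = 0x67452301, b = 0xefcdab89;
        a = step(a, b, x[0], 7);
        b = step(b, a, x[1], 12);
        a = step(a, b, x[2], 17);
        b = step(b, a, x[3], 22);
        return a ^ b;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        // Both shapes perform the same computation; only the load placement differs.
        System.out.println(loadsHoisted(x) == loadsNearUse(x));
    }
}
```

With 16 input words plus the state words, shape A needs more simultaneously live values than x86_64 has general-purpose registers, which is consistent with the spilling reported above.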
>
> Here are some numbers plus the generated code of C2, the intrinsic and Graal:
> https://gist.github.com/lewurm/3b874558d369fd56b3737e28f1616740
>
> -Bernhard
>
> ________________________________________
> From: Doug Simon <doug.simon at oracle.com>
> Sent: Monday, August 10, 2020 15:38
> To: Bernhard Urban-Forster
> Cc: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
> Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
>
> Hi Bernhard,
>
>
> On 10 Aug 2020, at 15:01, Bernhard Urban-Forster <beurba at microsoft.com> wrote:
>
> Hey Doug,
>
> replying on behalf of Ludovic, as he is on vacation :-)
>
> Currently we are not planning to implement the intrinsic for Graal.
>
> Too bad ;-)
>
> Also, we didn't check the code generated by Graal. I believe it will do a better job of eliminating array bounds checks, but I'm curious to learn how "compiling the relevant Java code without array bounds checks" works. Is something like that done for other methods already?
>
> I don’t think we do that anywhere currently, but I imagine it wouldn’t be hard to put the BytecodeParser into a mode whereby an array access generates an AccessIndexedNode that omits the bounds check (generated by org.graalvm.compiler.replacements.DefaultJavaLoweringProvider.getBoundsCheck).
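
The goal under discussion, one range check covering all accesses to a block instead of a check per access, can be sketched in plain Java. This is a hypothetical example and does not use any Graal API: the class and method names are assumptions, and the only library call is `java.util.Objects.checkFromIndexSize` (available since Java 9). Java semantics still require every `x[ofs + i]` access to be checked; the up-front check merely gives a compiler enough information to prove the per-access checks redundant.

```java
import java.util.Objects;

// Hypothetical sketch: validate a 16-word block once, then read it.
final class SingleRangeCheckSketch {
    static final int BLOCK_WORDS = 16;

    static int sumBlock(int[] x, int ofs) {
        // Throws IndexOutOfBoundsException unless [ofs, ofs + 16) lies within x.
        Objects.checkFromIndexSize(ofs, BLOCK_WORDS, x.length);
        int acc = 0;
        for (int i = 0; i < BLOCK_WORDS; i++) {
            // Provably in bounds after the check above.
            acc += x[ofs + i];
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[32];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumBlock(data, 8));
    }
}
```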
>
> -Doug
>
>
> This is the relevant Java method for the MD5 intrinsic:
> https://github.com/openjdk/jdk/blob/733218137289d6a0eb705103ed7be30f1e68d17a/src/java.base/share/classes/sun/security/provider/MD5.java#L172
>
>
> -Bernhard
>
> ________________________________________
> From: Doug Simon <doug.simon at oracle.com>
> Sent: Monday, August 10, 2020 11:55
> To: Ludovic Henry
> Cc: hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64
> Subject: Re: [16] RFR[S]: 8251216: Implement MD5 intrinsics on AArch64
>
> Hi Ludovic,
>
> Are you considering also implementing this intrinsic in Graal?
>
> Is the intrinsification purely about removing the array bounds checks? If so, it may be possible to have Graal intrinsify the method by compiling the relevant Java code without array bounds checks.
>
> -Doug
>
> On 9 Aug 2020, at 05:19, Ludovic Henry <luhenry at microsoft.com> wrote:
>
> Hello,
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8251216
> Webrev: http://cr.openjdk.java.net/~luhenry/8251216/webrev.00
>
> Testing: Linux-AArch64, fastdebug, test/hotspot/jtreg/compiler/intrinsics/sha/ test/hotspot/jtreg:tier1 test/jdk:tier1
>
> This patch implements the MD5 intrinsic on AArch64 following its implementation on x86 [1]. The performance improvements are the following (on Linux-AArch64 on a Marvell TX2):
>
> -XX:-UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  1616.238 ± 28.082  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   215.030 ±  0.691  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.228 ±  0.001  ops/ms
>
> -XX:+UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  2005.233 ± 40.513  ops/ms  => 24% speedup
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   275.979 ±  0.455  ops/ms  => 28% speedup
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.279 ±  0.001  ops/ms  => 22% speedup
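
The speedup figures quoted above are consistent with the ratio of the two throughput scores, rounded to a whole percent. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and the scores are copied from the two benchmark runs in this message.

```java
// Hypothetical helper that recomputes the quoted speedups from the JMH scores.
final class SpeedupCheck {
    // Relative throughput gain in percent, rounded to the nearest integer.
    static long speedupPercent(double baseline, double withIntrinsic) {
        return Math.round((withIntrinsic / baseline - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(speedupPercent(1616.238, 2005.233)); // length 64      -> 24
        System.out.println(speedupPercent(215.030, 275.979));   // length 1024    -> 28
        System.out.println(speedupPercent(0.228, 0.279));       // length 1048576 -> 22
    }
}
```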
>
> Thank you,
> Ludovic
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8250902
>