[10] RFR(M): 8185976: PPC64: Implement MulAdd and SquareToLen intrinsics
Doerr, Martin
martin.doerr at sap.com
Fri Aug 11 14:14:21 UTC 2017
Hi Gustavo,
thanks for the webrev and for sharing performance numbers. I've found a couple of things which should get addressed.
First of all, C2 does not perform sign extension when calling stubs. The int parameters need to be zero/sign extended. (This could even be done without extra instructions by replacing sldi -> rldicl, cmpdi -> extsw_ in some cases.)
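To illustrate the extension point, here is a minimal Java sketch that models a 64-bit register whose upper half holds leftover bits. The class and method names are made up for this example and do not appear in the webrev; the comments naming extsw and rldicl describe the intended analogy only.

```java
final class IntExtensionSketch {
    // Models a 64-bit register: the low 32 bits hold the int argument,
    // the high 32 bits hold unrelated leftover bits (hypothetical garbage).
    static long registerWithGarbage(int value, int garbageHigh) {
        return ((long) garbageHigh << 32) | (value & 0xFFFFFFFFL);
    }

    // Analogous to extsw: sign-extend the low 32 bits to 64 bits.
    static long signExtend(long reg) {
        return (long) (int) reg;
    }

    // Analogous to clearing the upper 32 bits (e.g. a suitable rldicl).
    static long zeroExtend(long reg) {
        return reg & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long reg = registerWithGarbage(-4, 0x12345678);
        // Using reg directly as a 64-bit length or offset would be wrong.
        System.out.println(Long.toHexString(reg)); // 12345678fffffffc
        System.out.println(signExtend(reg));       // -4
        System.out.println(zeroExtend(reg));       // 4294967292
    }
}
```

A stub that shifts or compares the full 64-bit register without one of these extensions would see the first value rather than the second or third.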
macroAssembler_ppc.cpp:
- Indentation should be 2 spaces.
stubGenerator_ppc.cpp:
- or_, addi_ should get replaced by orr, addi when CR0 result is not needed.
- Where is lplw initialized?
- I believe that the updating load/store instructions e.g. lwzu don't perform well on some processors. At least using stwu 2 times in the loop doesn't make sense.
- Note: It should be possible to use 8 byte instead of 4 byte instructions: MacroAssembler::multiply64, addc, adde. But I'm not requesting to change that because I guess it would make the code very complicated, especially when supporting both endianness versions.
- The squareToLen stub implementation is very close to the Java implementation. So it'd be interesting to understand what C2 doesn't do as well as the hand-written assembly code. Do you know that? (Not absolutely necessary for accepting this change as long as the stub is measurably faster.)
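For reference when comparing the stub with what C2 compiles, the 4 byte per-limb multiply-add can be sketched in Java as follows. This is modeled on the general shape of the mulAdd loop in java.math.BigInteger, not copied from it or from the webrev; the class name and the exact parameter convention are assumptions made for this example.

```java
final class MulAddSketch {
    private static final long LONG_MASK = 0xFFFFFFFFL;

    // Computes out += in * k over len 32-bit limbs (big-endian limb order),
    // starting 'offset' limbs from the end of out. Returns the final carry.
    static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
        long kLong = k & LONG_MASK;   // treat k as unsigned
        long carry = 0;
        int o = out.length - offset - 1;
        for (int j = len - 1; j >= 0; j--) {
            // Maximum value is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so the sum
            // fits in 64 bits when interpreted as unsigned.
            long product = (in[j] & LONG_MASK) * kLong
                         + (out[o] & LONG_MASK) + carry;
            out[o--] = (int) product;  // low 32 bits
            carry = product >>> 32;    // high 32 bits, unsigned shift
        }
        return (int) carry;
    }

    public static void main(String[] args) {
        int[] out = {0, 5};
        int carry = mulAdd(out, new int[] {3}, 0, 1, 4);
        System.out.println(out[1] + " carry=" + carry); // 17 carry=0
    }
}
```

Each iteration handles one 32-bit limb with a 64-bit accumulator; an 8 byte variant would instead need a 64x64 -> 128 bit multiply and an explicit carry chain, which is where multiply64, addc and adde would come in.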
Best regards,
Martin
-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
Sent: Thursday, 10 August 2017 19:22
To: 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: FW: [10] RFR(M): 8185976: PPC64: Implement MulAdd and SquareToLen intrinsics
-----Original Message-----
From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
Sent: Tuesday, 8 August 2017 17:19
To: ppc-aix-port-dev at openjdk.java.net
Subject: [10] RFR(M): 8185976: PPC64: Implement MulAdd and SquareToLen intrinsics
Hi,
Could you please review this PPC64-specific change to HotSpot? With these intrinsics implemented, I measured a small improvement in microbenchmarks. On SPECjvm2008's crypto.rsa benchmark, an improvement was noticeable only when the change was backported to JDK 8.
JBS: https://bugs.openjdk.java.net/browse/JDK-8185976
Webrev: https://gut.github.io/openjdk/webrev/JDK-8185976/webrev/
Motivation for this implementation: https://twitter.com/ijuma/status/698309312498835457
Best regards,
Gustavo Serra Scalet