RFR(m) 8170094: PPC64: Keep immediate value 0 cached into a register to improve performance
Doerr, Martin
martin.doerr at sap.com
Wed Nov 23 11:33:31 UTC 2016
Hi Gustavo,
thank you very much for making these experiments.
We had the same idea a long time ago and ran a lot of benchmarks. Our conclusion was that not spending a dedicated register on the constant 0 was slightly better.
The key difference between our old implementation and yours seems to be that we also modified INTPRESSURE.
Giving C2's register allocator fewer registers without decreasing INTPRESSURE may increase register allocation time and raise the risk of non-compilable methods, because the register allocator gives up.
We had compared "zero_reg with INTPRESSURE-1" against "no zero_reg with INTPRESSURE".
In general, I agree with you that there is an opportunity to reduce reloading of 0. Unfortunately, I can't see how the load-zero nodes were generated that led to the code you posted in
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-October/024664.html
Maybe C2 could be improved for that. Or a peephole optimizer could help.
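To illustrate the peephole idea, here is a minimal sketch, not HotSpot code. It assumes a toy instruction format of (op, dst, *srcs) tuples within a single basic block; the function name, the tuple layout, and the register names are all hypothetical. It drops a "load 0" when another register is already known to hold 0 and redirects later reads to that register. If the zero-holding register is about to be overwritten while dropped loads are still in use, it re-materializes the zero in one of them first.

```python
def drop_redundant_zero_loads(block):
    """Remove repeated 'li reg, 0' in one basic block (toy IR, hypothetical)."""
    zero_reg = None   # register currently known to hold 0
    alias = {}        # register whose 'li 0' was dropped -> zero_reg
    out = []
    for op, dst, *srcs in block:
        # Redirect reads of dropped registers to the live zero register.
        srcs = [alias.get(s, s) for s in srcs]
        is_zero_load = (op == "li" and srcs == [0])
        if is_zero_load and zero_reg is not None:
            if dst != zero_reg:
                alias[dst] = zero_reg
            continue  # a zero is already available, so drop this load
        if dst is not None:
            alias.pop(dst, None)  # dst receives a new value
            if dst == zero_reg:
                if alias:
                    # The zero is about to be clobbered but is still needed:
                    # re-materialize it in one of the dropped registers.
                    new_zero = next(iter(alias))
                    del alias[new_zero]
                    out.append(("li", new_zero, 0))
                    alias = dict.fromkeys(alias, new_zero)
                    zero_reg = new_zero
                else:
                    zero_reg = None
        if is_zero_load:
            zero_reg = dst
        out.append((op, dst, *srcs))
    return out


# Three stores of 0, each originally preceded by its own load of 0.
example = [
    ("li", "r3", 0), ("std", None, "r3", "a"),
    ("li", "r4", 0), ("std", None, "r4", "b"),
    ("li", "r5", 0), ("std", None, "r5", "c"),
]
optimized = drop_redundant_zero_loads(example)
```

This only looks at one basic block and ignores liveness across blocks, so it is a sketch of the idea rather than something that could be dropped into C2 as is.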
Thanks and best regards,
Martin
-----Original Message-----
From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
Sent: Tuesday, 22 November 2016 20:23
To: hotspot-compiler-dev at openjdk.java.net
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: RE: RFR(m) 8170094: PPC64: Keep immediate value 0 cached into a register to improve performance
CC'ing the ppc-aix-port-dev list.
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> Sent: Tuesday, 22 November 2016 15:11
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: RFR(m) 8170094: PPC64: Keep immediate value 0 cached into a
> register to improve performance
>
> Hi,
>
> I was concerned with an optimization marked as "Hoist CSE of constant
> load"[1], as I found zeros being stored to memory using a new
> register every time.
>
> To address this issue, I reserved one of the many registers PPC64 has
> (in both the C1 and C2 compilers) to cache zero, so that a zero can be
> stored with a single store instruction. (I didn't notice a performance
> drop simply from removing 1 register from the register bank.)
>
> I noticed small performance gains (on average around 1%) for some
> microbenchmarks but no noticeable change in the SPECjvm2008 score. Maybe
> there is still some room for improvement using this approach?
>
> Please review if I built a correct solution:
>
> Bug and webrev for this change:
> https://bugs.openjdk.java.net/browse/JDK-8170094
> https://gut.github.io/openjdk/webrev/JDK-8170094/webrev/index.html
>
> Thanks in advance,
> Gustavo Serra Scalet
>
> References:
> [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-October/024664.html