RFR(s): PPC64: Use andis instead of lis/and
Igor Henrique Soares Nunes
igor.nunes at eldorado.org.br
Thu Nov 24 20:36:07 UTC 2016
Hi all,
The following change implements an improvement suggested by Gustavo Romero
(http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-October/024664.html):
https://bugs.openjdk.java.net/browse/JDK-8170328
The issue asks to use andis. in place of the following sequence:
845 1.2e-04 : 3fff6027e134: lis r18,7
36054 0.0052 : 3fff6027e138: and r18,r17,r18
This patch results in small improvements in the dumped OptoAssembly code.
The four situations below illustrate this:
Situation 1.1)
03c B3: # B7 B4 <- B2 Freq: 0.899982
03c LIS R15, #133955584.hi
040 AND R14, R3, R15
044 CMPW CCR6, R14, R15
048 Beq CCR6, B7 P=0.100000 C=-1.000000
Situation 1.2)
03c B3: # B7 B4 <- B2 Freq: 0.899982
03c ANDIS R15, R3, #133955584.hi
040 LIS R17, #133955584.hi
044 CMPW CCR5, R15, R17
048 Beq CCR5, B7 P=0.100000 C=-1.000000
Situation 2.1)
370 B91: # B392 B92 <- B90 Freq: 0.000197734
370 LIS R14, #251658240.hi
374 AND R15, R3, R14
378 LIS R17, #16777216.hi
37c CMPW CCR5, R15, R17
380 Beq CCR5, B392 P=0.100000 C=-1.000000
Situation 2.2)
370 B91: # B392 B92 <- B90 Freq: 0.000197734
370 ANDIS R15, R3, #251658240.hi
374 LIS R14, #16777216.hi
378 CMPW CCR6, R15, R14
37c Beq CCR6, B392 P=0.100000 C=-1.000000
Situations 1.1 and 2.1 show the generated code without the patch; 1.2 and 2.2 show it with the patch applied.
Comparing 2.1 and 2.2, some gain is seen: one fewer instruction is needed, because ANDIS replaces the LIS/AND pair.
Comparing 1.1 and 1.2, no gain is seen. In 1.1 the value loaded into R15 is used by both AND and CMPW (no reload).
In 1.2 the ANDIS is executed first and takes the constant as an immediate, so no register holds the constant for reuse and a LIS is still needed before the CMPW.
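The equivalence the patch relies on can be sketched with a small Python model. This is only an illustration, not the HotSpot implementation: the function names (lis, and_reg, andis, hi) are hypothetical, the registers are modelled as plain 64-bit integers, and the condition-register update performed by andis. is ignored. It assumes the usual semantics, where lis sign-extends its shifted 16-bit immediate and andis. zero-extends it.

```python
MASK64 = (1 << 64) - 1

def hi(constant):
    """Upper 16 bits of a 32-bit constant (the '.hi' part in the dumps)."""
    return (constant >> 16) & 0xFFFF

def lis(imm16):
    """Model of 'lis rD, imm16': immediate shifted left 16, sign-extended."""
    value = (imm16 & 0xFFFF) << 16
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value & MASK64

def and_reg(ra, rb):
    """Model of 'and rD, rA, rB'."""
    return ra & rb & MASK64

def andis(rs, uimm16):
    """Model of 'andis. rA, rS, uimm16' (value only, CR0 update ignored)."""
    return rs & ((uimm16 & 0xFFFF) << 16)

def lis_and(rs, constant):
    """Two-instruction sequence: lis followed by and."""
    return and_reg(rs, lis(hi(constant)))

def single_andis(rs, constant):
    """One-instruction replacement."""
    return andis(rs, hi(constant))

# Masks taken from the dumps above, plus the 'lis r18,7' case.
masks = [133955584, 251658240, 7 << 16]
samples = [0, 1, 0x01000000, 0x07FC1234, 0x7FFFFFFF, 0xFFFFFFFF]
for m in masks:
    for x in samples:
        assert lis_and(x, m) == single_andis(x, m)
```

For all three masks the upper halfword is below 0x8000, so the sign extension done by lis has no effect and both sequences produce the same value. The model suggests that masks with the top bit set would need separate care on 64-bit registers.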
Regards,
Igor Nunes