Removing redundant addis's
Bruno Alexandre Rosa
bruno.rosa at eldorado.org.br
Mon Oct 10 21:03:47 UTC 2016
Hello, Goetz,
Thanks for the quick answer.
The places where addis is emitted are mostly related to the TOC.
I tried calling mr_if_needed() instead or issuing a nop, but that seems to break
later checks that rely on knowing the exact instructions of a TOC load.
However, I also tried forcefully NOT emiting the redundant addis and it worked
fine: I was able to inspect the jitted code with both -XX:+PrintOptoAssembly
and using perf inject. (I ran a pi digits microbenchmark for 10k digits).
I'm still getting familiar with the code and would welcome any help to
understand why the latter approached worked and why the former failed.
Here is the patch with the attempt that worked:
diff -r b2b2ec149a24 src/cpu/ppc/vm/ppc.ad
--- a/src/cpu/ppc/vm/ppc.ad Mon Oct 03 19:09:26 2016 +0000
+++ b/src/cpu/ppc/vm/ppc.ad Mon Oct 10 19:32:12 2016 -0300
@@ -2468,7 +2468,10 @@
((loadConL_hiNode*)this)->_cbuf_insts_offset = __ offset();
}
- __ addis($dst$$Register, $toc$$Register, MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset));
+ int offset = MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset);
+ if ($dst$$Register != $toc$$Register || offset != 0) {
+ __ addis($dst$$Register, $toc$$Register, offset);
+ }
%}
%} // encode
@@ -2637,7 +2640,10 @@
((loadConP_hiNode*)this)->_const_toc_offset = toc_offset;
}
- __ addis($dst$$Register, $toc$$Register, MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset));
+ int offset = MacroAssembler::largeoffset_si16_si16_hi(_const_toc_offset);
+ if ($dst$$Register != $toc$$Register || offset != 0) {
+ __ addis($dst$$Register, $toc$$Register, offset);
+ }
%}
// Postalloc expand emitter for loading a ptr constant from the method's TOC.
@@ -6117,9 +6123,13 @@
int hi = (offset + (1<<15))>>16;
int lo = offset - hi * (1<<16);
- __ addis(Rtoc, Rtoc, hi);
+ if (hi != 0) {
+ __ addis(Rtoc, Rtoc, hi);
+ }
__ lfs(Rdst, lo, Rtoc);
- __ addis(Rtoc, Rtoc, -hi);
+ if (hi != 0) {
+ __ addis(Rtoc, Rtoc, -hi);
+ }
%}
ins_pipe(pipe_class_memory);
%}
@@ -6182,9 +6192,13 @@
int hi = (offset + (1<<15))>>16;
int lo = offset - hi * (1<<16);
- __ addis(Rtoc, Rtoc, hi);
+ if (hi != 0) {
+ __ addis(Rtoc, Rtoc, hi);
+ }
__ lfd(Rdst, lo, Rtoc);
- __ addis(Rtoc, Rtoc, -hi);
+ if (hi != 0) {
+ __ addis(Rtoc, Rtoc, -hi);
+ }
%}
ins_pipe(pipe_class_memory);
%}
@@ -8420,7 +8434,10 @@
size(4);
ins_encode %{
// TODO: PPC port $archOpcode(ppc64Opcode_addis);
- __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
+ int offset = ($src2$$constant)>>16;
+ if ($dst$$Register != $src1$$Register || offset != 0) {
+ __ addis($dst$$Register, $src1$$Register, offset);
+ }
%}
ins_pipe(pipe_class_default);
%}
@@ -8499,7 +8516,10 @@
size(4);
ins_encode %{
// TODO: PPC port $archOpcode(ppc64Opcode_addis);
- __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
+ int offset = ($src2$$constant)>>16;
+ if ($dst$$Register != $src1$$Register || offset != 0) {
+ __ addis($dst$$Register, $src1$$Register, offset);
+ }
%}
ins_pipe(pipe_class_default);
%}
@@ -8539,7 +8559,10 @@
size(4);
ins_encode %{
// TODO: PPC port $archOpcode(ppc64Opcode_addis);
- __ addis($dst$$Register, $src1$$Register, ($src2$$constant)>>16);
+ int offset = ($src2$$constant)>>16;
+ if ($dst$$Register != $src1$$Register || offset != 0) {
+ __ addis($dst$$Register, $src1$$Register, offset);
+ }
%}
ins_pipe(pipe_class_default);
%}
Regards,
Bruno Rosa
-----Original Message-----
From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com]
Sent: segunda-feira, 10 de outubro de 2016 06:08
To: Bruno Alexandre Rosa <bruno.rosa at eldorado.org.br>; ppc-aix-port-dev at openjdk.java.net
Subject: RE: Removing redundant addis's
Hi Bruno,
the emitters in the assembler_<platform>.* files are meant to be pure emitters without optimizations. Callers can rely that the instruction is emitted, which might be important for effects like code alignment or instruction scheduling.
So you should not change them.
If this is not an accidential occurance you should identify the place where this is emitted and call mr_if_needed() instead, or issue a nop.
Best regards,
Goetz.
> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
> bounces at openjdk.java.net] On Behalf Of Bruno Alexandre Rosa
> Sent: Freitag, 7. Oktober 2016 22:53
> To: ppc-aix-port-dev at openjdk.java.net
> Subject: Removing redundant addis's
>
> Hello, everyone,
>
>
>
> I've been taking a look at some performance optmization opportunities
> in the
>
> ppc64-specific HotSpot code and stumbled upon some oddities in the
> jitted code
>
> like:
>
>
>
> addis r20, r20, 0
>
>
>
> I went ahead and modified the Assembler not to emit an addis when the
> operands
>
> are the same and the offset is 0.
>
>
>
> I tried adding similar conditions on every "__ addis" call on ppc.ad,
> but
>
> thought this kind of solution is inelegant and harder to maintain.
> Then I went
>
> for a simpler solution.
>
>
>
> Here is the diff:
>
>
>
> diff -r b89f80ea707c src/cpu/ppc/vm/assembler_ppc.inline.hpp
>
> --- a/src/cpu/ppc/vm/assembler_ppc.inline.hpp Fri Oct 07 16:58:57 2016 -
> 0300
>
> +++ b/src/cpu/ppc/vm/assembler_ppc.inline.hpp Fri Oct 07 19:36:17 2016 -
> 0300
>
> @@ -82,7 +82,7 @@
>
>
>
> // PPC 1, section 3.3.8, Fixed-Point Arithmetic Instructions
>
> inline void Assembler::addi( Register d, Register a, int si16) { assert(a != R0,
> "r0 not allowed"); addi_r0ok( d, a, si16); }
>
> -inline void Assembler::addis( Register d, Register a, int si16) { assert(a !=
> R0, "r0 not allowed"); addis_r0ok(d, a, si16); }
>
> +inline void Assembler::addis( Register d, Register a, int si16) { assert(a !=
> R0, "r0 not allowed"); if (d != a || si16 != 0) addis_r0ok(d, a,
> si16); }
>
> inline void Assembler::addi_r0ok(Register d,Register a,int si16) {
> emit_int32(ADDI_OPCODE | rt(d) | ra(a) | simm(si16, 16)); }
>
> inline void Assembler::addis_r0ok(Register d,Register a,int si16) {
> emit_int32(ADDIS_OPCODE | rt(d) | ra(a) | simm(si16, 16)); }
>
> inline void Assembler::addic_( Register d, Register a, int si16) {
> emit_int32(ADDIC__OPCODE | rt(d) | ra(a) | simm(si16, 16)); }
>
> However, I'm concerned this modification might be too intrusive. So I
> came here
>
> to ask opinions on that.
>
> Regards,
>
> Bruno Rosa
More information about the ppc-aix-port-dev
mailing list