[AD file] Optimize emitted code by matching complex IR patterns

Gustavo Serra Scalet gustavo.scalet at eldorado.org.br
Mon Oct 31 19:35:08 UTC 2016


Hi Goetz,

> -----Original Message-----
> From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com]
> Sent: Friday, October 28, 2016 11:42
> To: Gustavo Serra Scalet <gustavo.scalet at eldorado.org.br>;
> hotspot-compiler-dev at openjdk.java.net
> Subject: RE: [AD file] Optimize emitted code by matching complex IR
> patterns
> 
> But matching LoadB and AddP in your picture will probably fail.
> The matcher only matches trees or DAGs. Here the problem is that
> the AddP has 3 more outs.

OK, then this optimization seems to be a no-go. I see other situations that I'd like to optimize, but in all of them the node I'd fold has more outs.

Thanks for pointing that out.
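
To check that I understood the constraint, here is a minimal toy sketch of it in plain C++. It is not HotSpot code: ToyNode, add_use and can_fold_into_user are hypothetical names, and the "sole consumer" rule is my simplified reading of the explanation above, ignoring the special cases mentioned below.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an ideal-graph node; this is NOT HotSpot's Node class.
struct ToyNode {
    std::string name;
    std::vector<ToyNode*> outs;  // users of this node's result
};

// Record that `user` consumes the value produced by `def`.
void add_use(ToyNode& def, ToyNode& user) {
    def.outs.push_back(&user);
}

// Assumed rule for this sketch: an input can be swallowed into its user's
// match tree only if that user is its sole consumer. With more outs, the
// value is still needed elsewhere and must be computed on its own.
bool can_fold_into_user(const ToyNode& def) {
    return def.outs.size() == 1;
}
```

With only the LoadB attached, the AddP could be folded into a combined rule. Once the 3 other users are attached, can_fold_into_user returns false, which is the situation in my graph.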

> I think there are some special cases, but don't remember in detail.
> We overruled this once for DecodeN, but it was not a good idea because
> it increases register pressure (you will hold the narrow oop and the
> oop in two registers at the same time although they have the exact
> same bit pattern in 32-bit cOops mode.)
>
> You can avoid the nop by just leaving out the size(8) line. It then
> assumes varying size and checks on every emit.
>
> But maybe you can avoid the 'if' if you already check for constant '0'
> in the predicate of the operand.
>
> Best regards,
>   Goetz.
>
> 
> 
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> Sent: Thursday, October 27, 2016 21:35
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: [AD file] Optimize emitted code by matching complex IR patterns
>
> Hi,
>
> I wanted to match this set of selected nodes:
>
> [picture of the selected nodes not preserved]
>
> On ppc, this basically amounts to an "add" followed by an "lbz".
> However, if the memory pointer of LoadB has a displacement of zero, it
> can be done with a single "lbzx" instruction.
> 
> 
> 
> As I understand it, I could do this with a new instruction, just
> like Igor did for cmpldi [1], but I can't get it to work:
> 
> instruct loadUB_indexed(iRegIdst dst, indirectMemory src1, iRegLsrc src2) %{
>   // match-rule
>   match(Set dst (LoadB (AddP src1 src2)));
>   predicate(n->as_Load()->is_unordered() || followed_by_acquire(n));
>   // Hint that lbzx is cheaper than add + lbz
>   ins_cost(MEMORY_REF_COST_LOW);
>   format %{ "LBZX     $dst, $src1, $src2" %}
>   size(8);
>   ins_encode %{
>     int Idisp = $src1$$disp + frame_slots_bias($src1$$base, ra_);
>     if (Idisp) {
>       __ add($dst$$Register, $src1$$base$$Register, $src2$$Register);
>       __ lbz($dst$$Register, Idisp, $dst$$Register);
>     } else {
>       __ lbzx($dst$$Register, $src1$$base$$Register, $src2$$Register);
>       __ nop();
>     }
>   %}
>   ins_pipe(pipe_class_memory);
> %}
> 
> 
> 
> 
> 
> Notes:
>
> 1)  I would probably pack some of the ins_encode instructions to be used
>     as expand. I left it there so it's easier to read.
> 2)  Most of the code came from loadB_indirect_Ex so I expanded it to
>     match a previous add.
> 3)  The nop can probably be avoided somehow. I'd take a look once this
>     feature actually works.
> 
> 
> 
> Well, it compiles, but then when I run javac:
>
> o45     LoadB   === _ o7 o44  [[o56 o46  8 ]]  @java/lang/String:exact+20 *, name=coder, idx=4; #byte
> mach:
> 12     loadConL16      === _  [[ 11 ]]   [6600012]
> o10     Parm    === o3  [[o78 o72 o44 o44 o78  4  11 ]] Parm0: java/lang/String:NotNull:exact *  Oop:java/lang/String:NotNull:exact *
> o7      Parm    === o3  [[o161 o175 o79 o72 o119 o168 o98 o45  4  11 ]] Memory  Memory: @BotPTR *+bot, idx=Bot;
> 11     loadUB_indexed  === _ o7 o10  12  [[]]
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/matcher.cpp:1694
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (/home/gut/hs-comp/hotspot/src/share/vm/opto/matcher.cpp:1694), pid=1649, tid=1734
> #  assert(m->adr_type() == mach_at) failed: matcher should not change adr type
> #
> # JRE version: OpenJDK Runtime Environment (9.0) (slowdebug build 9-internal+0-2016-10-14-173706.gut.hs-comp)
> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 9-internal+0-2016-10-14-173706.gut.hs-comp, mixed mode, tiered, compressed oops, g1 gc, linux-ppc64le)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/gut/hs-comp/hs_err_pid1649.log
> o34     LoadB   === _ o7 o33  [[o45 ]]  @java/lang/String:exact+20 *, name=coder, idx=4; #byte
> mach:
> 11     loadConL16      === _  [[ 10 ]]   [6900011]
> o10     Parm    === o3  [[o33 o33  10 ]] Parm0: java/lang/String:NotNull:exact *  Oop:java/lang/String:NotNull:exact *
> o7      Parm    === o3  [[o48 o34  10 ]] Memory  Memory: @BotPTR *+bot, idx=Bot;
> 10     loadUB_indexed  === _ o7 o10  11  [[]]
> [thread 1727 also had an error]
> #
> # Compiler replay data is saved as:
> # /home/gut/hs-comp/replay_pid1649.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> Current thread is 1734
> Dumping core ...
> Aborted
> 
> 
> 
> 
> 
> Even after investigating that assert (which checks for
> m->in(MemNode::Address)->is_DecodeNarrowPtr()), I didn't quite
> understand it. If I add an EncodeP/DecodeN between LoadB and AddP in my
> match rule to satisfy the NarrowPtr check, it simply doesn't match
> anything, so my expectation is that the original rule is correct and
> matching.
> 
> 
> 
> Could anybody please point out what I missed?
> 
> 
> 
> I also wanted to ask whether the other nodes linked to, e.g., "339 AddP"
> are a concern. As the IdealGraphVisualizer points out, it doesn't have
> only the "357 LoadB" node attached to it, and I wonder what would happen
> to them if this new loadUB_indexed instruction were working.
> 
> 
> 
> 
> 
> Thanks in advance,
> 
> Gustavo Serra Scalet
> 
> 
> 
> References:
>
> [1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-October/002713.html
> 
> 


