[AD file] Optimize emitted code by matching complex IR patterns

Gustavo Serra Scalet gustavo.scalet at eldorado.org.br
Thu Oct 27 19:34:31 UTC 2016


Hi,



I want to match this set of selected nodes (screenshot attached as image001.png).

On ppc this is basically an "add" followed by an "lbz". However, if the memory operand of the LoadB has a displacement of zero, it can be done with a single "lbzx" instruction.
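
To make the intended selection concrete, here is a minimal stand-alone C++ sketch of the choice described above. It is not HotSpot code: select_byte_load and the register names are hypothetical, and it assumes the displacement fits the 16-bit immediate of lbz.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of HotSpot): returns the PPC64 assembly
// one could emit to load the byte at base + index + disp into dst.
// Assumption: disp fits the signed 16-bit immediate field of lbz.
std::vector<std::string> select_byte_load(const std::string& dst,
                                          const std::string& base,
                                          const std::string& index,
                                          int disp) {
  if (disp == 0) {
    // Zero displacement: a single indexed load does the whole job.
    return { "lbzx " + dst + ", " + base + ", " + index };
  }
  // Non-zero displacement: form base + index first, then load with
  // the displacement as the immediate offset.
  return {
    "add " + dst + ", " + base + ", " + index,
    "lbz " + dst + ", " + std::to_string(disp) + "(" + dst + ")"
  };
}
```

Calling select_byte_load("r3", "r4", "r5", 0) yields the single lbzx, while a displacement of 20 yields the add + lbz pair.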



As I understand it, I should be able to do this with a new instruct, just as Igor did for cmpldi [1], but I can't get it to work:

instruct loadUB_indexed(iRegIdst dst, indirectMemory src1, iRegLsrc src2) %{
  // match-rule
  match(Set dst (LoadB (AddP src1 src2)));
  predicate(n->as_Load()->is_unordered() || followed_by_acquire(n));
  // Hint that lbzx is cheaper than add + lbz
  ins_cost(MEMORY_REF_COST_LOW);
  format %{ "LBZX     $dst, $src1, $src2" %}
  size(8);
  ins_encode %{
    int Idisp = $src1$$disp + frame_slots_bias($src1$$base, ra_);
    if (Idisp) {
      __ add($dst$$Register, $src1$$base$$Register, $src2$$Register);
      __ lbz($dst$$Register, Idisp, $dst$$Register);
    } else {
      __ lbzx($dst$$Register, $src1$$base$$Register, $src2$$Register);
      __ nop();
    }
  %}
  ins_pipe(pipe_class_memory);
%}





Notes:

1)  I would probably move some of the ins_encode instructions into an expand. I left them inline so it's easier to read.
2)  Most of the code comes from loadB_indirect_Ex; I extended it to also match a preceding add.
3)  The nop can probably be avoided somehow. I'll take a look once this feature actually works.



It compiles, but when I run javac I get:

o45     LoadB   === _ o7 o44  [[o56 o46  8 ]]  @java/lang/String:exact+20 *, name=coder, idx=4; #byte
mach:
12     loadConL16      === _  [[ 11 ]]   [6600012]
o10     Parm    === o3  [[o78 o72 o44 o44 o78  4  11 ]] Parm0: java/lang/String:NotNull:exact *  Oop:java/lang/String:NotNull:exact *
o7      Parm    === o3  [[o161 o175 o79 o72 o119 o168 o98 o45  4  11 ]] Memory  Memory: @BotPTR *+bot, idx=Bot;
11     loadUB_indexed  === _ o7 o10  12  [[]]
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/matcher.cpp:1694
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/gut/hs-comp/hotspot/src/share/vm/opto/matcher.cpp:1694), pid=1649, tid=1734
#  assert(m->adr_type() == mach_at) failed: matcher should not change adr type
#
# JRE version: OpenJDK Runtime Environment (9.0) (slowdebug build 9-internal+0-2016-10-14-173706.gut.hs-comp)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 9-internal+0-2016-10-14-173706.gut.hs-comp, mixed mode, tiered, compressed oops, g1 gc, linux-ppc64le)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/gut/hs-comp/hs_err_pid1649.log
o34     LoadB   === _ o7 o33  [[o45 ]]  @java/lang/String:exact+20 *, name=coder, idx=4; #byte
mach:
11     loadConL16      === _  [[ 10 ]]   [6900011]
o10     Parm    === o3  [[o33 o33  10 ]] Parm0: java/lang/String:NotNull:exact *  Oop:java/lang/String:NotNull:exact *
o7      Parm    === o3  [[o48 o34  10 ]] Memory  Memory: @BotPTR *+bot, idx=Bot;
10     loadUB_indexed  === _ o7 o10  11  [[]]
[thread 1727 also had an error]
#
# Compiler replay data is saved as:
# /home/gut/hs-comp/replay_pid1649.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 1734
Dumping core ...
Aborted





Even after investigating that assert (which checks for m->in(MemNode::Address)->is_DecodeNarrowPtr()), I don't quite understand it. If I add an EncodeP/DecodeN between LoadB and AddP in my match rule to satisfy the NarrowPtr check, the rule simply doesn't match anything, so I assume the rule as written above is correct and is in fact matching.



Could anybody please point out what I missed?



I also wanted to ask whether the other nodes linked to, e.g., "339 AddP" are a concern. As the IdealGraphVisualizer shows, it has more than just the "357 LoadB" node attached to it, and I wonder what would happen to them if this new loadUB_indexed instruction were working.





Thanks in advance,

Gustavo Serra Scalet



References:

[1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-October/002713.html


-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 14043 bytes
Desc: image001.png
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20161027/f0a27c53/image001-0001.png>

