aarch64 AD-file / matching rule
Benedikt Wedenik
benedikt.wedenik at theobroma-systems.com
Thu Apr 30 10:06:32 UTC 2015
Hi,
thanks for your quick help!
But I found out that the pattern I was searching for is emitted here:
cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp
This means my pattern will never match the rule in the AD-file, because it is more or less “hardcoded” :)
I wrote a small simulation program to see whether the rule would match in JIT-compiled code, and it worked.
I’ll investigate further how to optimise this pattern in the C++ code, because it occurs quite often.
Thanks again,
Benedikt
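
A simulation program of this kind can be quite small. The following is a minimal sketch in Java, not the program used above: the class name MaskBranchDemo, its methods and the iteration count are made up for illustration, and only the mask 0x7ffff8 is taken from the disassembly in the original question quoted below. singleUse is meant to give C2 a tree-shaped If/CmpI/AndI subgraph, while multiUse keeps a second use of the masked value, which is the case Goetz describes where the matcher cannot apply a combined rule. Whether C2 really produces these shapes depends on inlining and other optimisations, so it has to be checked in the compiler output.

```java
// Hypothetical test program; all names here are assumptions for illustration.
final class MaskBranchDemo {
    // Mask copied from "and w2, w2, #0x7ffff8" in the original question.
    static final int MASK = 0x7ffff8;

    static int zeroHits;    // how often singleUse saw (x & MASK) == 0
    static long maskedSum;  // sum of the non-zero masked values seen by multiUse

    // The result of the AND feeds only the compare against zero.
    static void singleUse(int x) {
        if ((x & MASK) == 0) {
            zeroHits++;
        }
    }

    // The result of the AND is used by the compare and again inside the branch,
    // so the AND has more than one use.
    static void multiUse(int x) {
        int masked = x & MASK;
        if (masked != 0) {
            maskedSum += masked;
        }
    }

    // Calls both methods often enough for the JIT compilers to pick them up.
    static int run(int iterations) {
        zeroHits = 0;
        maskedSum = 0;
        for (int i = 0; i < iterations; i++) {
            singleUse(i);
            multiUse(i);
        }
        return zeroHits;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Running this with the same diagnostic options as in the quoted command line, for example -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*MaskBranchDemo.singleUse', and then again for multiUse, should make it possible to compare the code emitted for the two methods.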
On 29 Apr 2015, at 16:37, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:
> Hi,
>
> I use PrintOptoAssembly in such cases. This tells me what the IR looks like after
> matching. Together with PrintAssembly you can locate the block
> with the pattern.
>
> With PrintIdeal you can see the graph before matching. You should find the pattern
> you described in the ad rule there. Hard to read, though.
>
> There is also the PrintIdealGraph flag, printing a graph you can visualize.
> I didn’t use that, though. We have instrumented the opto compiler with
> our own graph printer.
>
> I could imagine that the AndI node has more than one use (out edge).
> Then it’s not a tree-like subgraph, and the matcher cannot apply the rule.
> This is something you would check in the PrintIdeal output or in the last
> Ideal graph before matching.
>
> Best regards,
> Goetz.
>
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Benedikt Wedenik
> Sent: Wednesday, 29 April 2015 14:50
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Dr. Philipp Tomsich; Benedikt Huber
> Subject: aarch64 AD-file / matching rule
>
> Hi!
>
> I’m currently writing compiler optimisations for the aarch64 port, and I am using specjbb2005 for benchmarking.
> One of the patterns I want to optimise is the following:
>
> 0x0000007f8c2961b4: and w2, w2, #0x7ffff8
> 0x0000007f8c2961b8: cmp w2, #0x0
> 0x0000007f8c2961bc: b.eq 0x0000007f8c2968f4
>
>
> Here I see an opportunity to use ands followed by b.eq.
>
> I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
> My matching looks like this:
>
> instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
>   match(If cmp (CmpI (AndI src1 src2) zero));
>
>   effect(USE lbl);
>   ins_cost(0); // is zero at the moment to be sure the rule is triggered.
>
>   ins_encode %{
>     Label* L = $lbl$$label;
>     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
>     __ andsw(as_Register($src1$$reg),
>              as_Register($src1$$reg),
>              (unsigned long)($src2$$constant));
>     __ br(cond, *L);
>   %}
>
>   ins_pipe(pipe_cmp_branch); // TODO but not relevant yet
> %}
>
>
> As I don’t know whether my matching rule is wrong or whether something else stops it from being applied, I wanted to find out which “and” rule is triggered for this pattern.
> I inserted some nops to locate the corresponding rule and found that most of the emitted “and”s were surrounded by nops, except for my pattern and a few others like this one:
>
> 0x0000007f984bf568: eor x1, x0, x1
> 0x0000007f984bf56c: and x1, x1, #0xffffffffffffff87
> 0x0000007f984bf570: cbz x1, 0x0000007f984bf664
> 0x0000007f984bf574: and xscratch1, x1, #0x7
> 0x0000007f984bf578: cbnz xscratch1, 0x0000007f984bf5f0
> 0x0000007f984bf57c: and xscratch1, x1, #0x300
> 0x0000007f984bf580: cbnz xscratch1, 0x0000007f984bf5b8
> 0x0000007f984bf584: mov xscratch1, #0x37f // #895
> 0x0000007f984bf588: and x0, x0, xscratch1
> 0x0000007f984bf58c: orr x1, x0, xthread
> 0x0000007f984bf590: ldaxr xscratch1, [x3]
> 0x0000007f984bf594: cmp xscratch1, x0
> 0x0000007f984bf598: b.ne 0x0000007f984bf5a8
>
>
> Usually I call the program like this:
>
> ————
> JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java
>
> $JAVA -fullversion
> $JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15 -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
> ————
>
>
> I tried to figure out whether this problem occurs only with c1, with c2 or in pure interpretation mode, and these are the results (calling java as usual, including the given arguments):
>
> * [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected, since nothing is compiled).
> * [-client -Xcomp -XX:-TieredCompilation] : Here the cmp against #0x0 occurs only about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
> * [-server -Xcomp -XX:-TieredCompilation] : Same as -client.
>
>
> My question now is how to find out why the rule does not match (or whether the rule is correct at all), and how to find the actual rule that emits the code for my desired pattern.
>
> Thanks in advance,
> Benedikt Wedenik, Theobroma-Systems.com