compiled leaf method vs. inlined method bounds check

Wed Dec 4 11:53:09 PST 2013

I think the HSAILPhase was copied from some early PTX backend, although I see there is still also a PTXPhase that does the same thing.
Does this just mark each object node in the graph as non-null, and thus eliminate null check code?

    private static class HSAILPhase extends Phase {
        @Override
        protected void run(StructuredGraph graph) {
            for (LocalNode local : graph.getNodes(LocalNode.class)) {
                if (local.stamp() instanceof ObjectStamp) {
                    local.setStamp(StampFactory.declaredNonNull(((ObjectStamp) local.stamp()).type()));
                }
            }
        }
    }

From: Gilles Duboscq [mailto:gilwooden at gmail.com]
Sent: Wednesday, December 04, 2013 1:22 PM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net
Subject: Re: compiled leaf method vs. inlined method bounds check

On Wed, Dec 4, 2013 at 7:07 PM, Tom Deneau <tom.deneau at amd.com> wrote:
Gilles or others --

Still trying to understand this topic…
I made a test case where I just compile a simple method that inlines another method, no profiling.

    // compile this one
    public int intFromArrayLonger(int[] ary, int idx) {
        return intFromArrayLongerInner(ary, idx + 7);
    }

    public int intFromArrayLongerInner(int[] ary, int idx) {
        return ary[idx + 3];
    }

I ran that compiling for the HSAIL backend and then in a separate run for the AMD64 backend.
In the HSAIL case, I see the Deopt reason NotCompiledExceptionHandler, whereas the AMD64 case has the Deopt reason BoundsCheckException.

I put the IGV graphs up at http://cr.openjdk.java.net/~tdeneau/graal-webrevs/inline-example.xml (the xml has the graphs from both backends).
Maybe someone can take a look.

A couple of questions from the early phases of these graphs:

• Why is the graph after bytecode parsing of the inner method so different in the two cases?
In HSAILCompilationResult.java line 174, you can see that the GraphBuilder used there is configured without any optimistic assumptions (OptimisticOptimizations.NONE). This means there will be explicit exception edges for everything.
This also explains the difference in the Deopt reason.

• What is the phase named HSAIL that runs after bytecode parsing?
It's com.oracle.graal.hotspot.hsail.HSAILCompilationResult.HSAILPhase. It's added in HSAILCompilationResult.java line 174.
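
To make the difference between the two graph shapes concrete, here is a source-level sketch in plain Java. It is not Graal code. `Deopt` is a hypothetical marker exception standing in for a DeoptimizeNode, and `new ArrayIndexOutOfBoundsException(idx)` stands in for the foreign call to createOutOfBoundsException mentioned elsewhere in this thread. Null checks are left out.

```java
// Source-level illustration only; class and method names are hypothetical.
class ExceptionEdgeSketch {
    // Hypothetical marker for "leave compiled code and continue in the interpreter".
    static final class Deopt extends RuntimeException {
        final String reason;

        Deopt(String reason, Throwable pending) {
            super(reason, pending);
            this.reason = reason;
        }
    }

    // Optimistic shape: a failing bounds check goes straight to a deopt.
    static int optimistic(int[] ary, int idx) {
        if (Integer.compareUnsigned(idx, ary.length) >= 0) {
            throw new Deopt("BoundsCheckException", null);
        }
        return ary[idx];
    }

    // Explicit exception edge: the exception object is created first, and
    // because no handler was compiled for it, the edge ends in a deopt.
    static int explicitEdge(int[] ary, int idx) {
        if (Integer.compareUnsigned(idx, ary.length) >= 0) {
            ArrayIndexOutOfBoundsException pending = new ArrayIndexOutOfBoundsException(idx);
            throw new Deopt("NotCompiledExceptionHandler", pending);
        }
        return ary[idx];
    }

    public static void main(String[] args) {
        int[] ary = {1, 2, 3};
        System.out.println(explicitEdge(ary, 1));
        try {
            explicitEdge(ary, 3);
        } catch (Deopt d) {
            System.out.println(d.reason);
        }
    }
}
```

In both shapes the in-bounds path is identical, which is consistent with the two listings in this thread differing only in the deoptimization reason.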

-- Tom

From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
Sent: Thursday, November 21, 2013 5:27 AM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net
Subject: Re: compiled leaf method vs. inlined method bounds check

Hello Tom,

Sorry for the delayed answer.

I ran the tests and I cannot reproduce the behaviour you are seeing if no exception is ever thrown.
However, I can easily reproduce it if the first method (the one containing the array accesses) has already thrown an exception while the second one (the one containing the call) has never seen an exception flow through the call.
In this case the second method assumes that no exception can ever flow through the call, but when it inlines the call, it sees that it actually needs to handle exceptions. That is when you get the NotCompiledExceptionHandler reason.

I used this to test:

public class ArrayTest extends JTTTest {
    static int[] array = {1, 2, 3};

    @Test
    public void test0() throws Throwable {
        run(new int[3], array, array, 0);
        runTest("callRun", new int[3], array, array, 0);
    }

    @Test
    public void test1() throws Throwable {
        run(new int[3], array, array, 3);
        runTest("callRun", new int[3], array, array, 0);
    }

    @Test
    public void test2() throws Throwable {
        callRun(new int[3], array, array, 3);
        runTest("callRun", new int[3], array, array, 0);
    }

    public static void run(int[] out, int[] ina, int[] inb, int gid) {
        out[gid] = ina[gid] + inb[gid];
    }

    public static void callRun(int[] out, int[] ina, int[] inb, int gid) {
        run(out, ina, inb, gid);
    }
}

In the first case (test0) I get the inlined call and the BoundsCheckException reason.
In the second case (test1) I get the inlined call and the NotCompiledExceptionHandler reason.
In the third case (test2) I get the full exception handling.
Note that you need to run these separately if you don't want profile pollution.
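
The three outcomes can be summarised as a small decision table. The sketch below is a toy model of what the tests show, not Graal's actual logic; the class, enum and parameter names are hypothetical. The combination "access never threw, but the call site saw an exception" is not covered by the tests, so its result here is an assumption.

```java
// Toy model of the observed outcomes; not taken from the Graal sources.
class DeoptOutcomeSketch {
    enum Outcome {
        BOUNDS_CHECK_DEOPT,                   // test0
        NOT_COMPILED_EXCEPTION_HANDLER_DEOPT, // test1
        FULL_EXCEPTION_HANDLING               // test2
    }

    // accessHasThrown: the profile of run() recorded an exception at the array access.
    // callHasSeenException: the profile of callRun() recorded an exception flowing
    // through the call to run().
    static Outcome outcome(boolean accessHasThrown, boolean callHasSeenException) {
        if (!accessHasThrown) {
            // No exception branch is compiled for the access (assumed to hold
            // regardless of the call-site profile).
            return Outcome.BOUNDS_CHECK_DEOPT;
        }
        if (!callHasSeenException) {
            // The access needs an exception branch, but the caller has no handler compiled.
            return Outcome.NOT_COMPILED_EXCEPTION_HANDLER_DEOPT;
        }
        return Outcome.FULL_EXCEPTION_HANDLING;
    }

    public static void main(String[] args) {
        System.out.println("test0: " + outcome(false, false));
        System.out.println("test1: " + outcome(true, false));
        System.out.println("test2: " + outcome(true, true));
    }
}
```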

-Gilles

On Wed, Nov 13, 2013 at 1:22 AM, Tom Deneau <tom.deneau at amd.com> wrote:
Gilles --

OK, we have a simple existing test in com.oracle.graal.compiler.hsail.test.IntAddTest which adds ints from two arrays, putting the result in a third output array:
    public static void run(int[] out, int[] ina, int[] inb, int gid) {
        out[gid] = ina[gid] + inb[gid];
    }

Here is the HSAIL code. (Note that we don't really handle the DeoptimizeNode but just print a comment based on the reason.)

version 0:95: $full : $large;
// static method HotSpotMethod<IntAddTest.run(int[], int[], int[], int)>
kernel &run (
                kernarg_u64 %_arg0,
                kernarg_u64 %_arg1,
                kernarg_u64 %_arg2
                ) {
                ld_kernarg_u64  $d0, [%_arg0];
                ld_kernarg_u64  $d1, [%_arg1];
                ld_kernarg_u64  $d2, [%_arg2];
                workitemabsid_u32 $s0, 0;

@L0:
                ld_global_s32 $s1, [$d0 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L1:
                ld_global_s32 $s1, [$d2 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L2:
                ld_global_s32 $s1, [$d1 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L3:
                cvt_s64_s32 $d3, $s0;
                mul_s64 $d3, $d3, 4;
                add_u64 $d1, $d1, $d3;
                ld_global_s32 $s1, [$d1 + 16];
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d2, $d2, $d1;
                ld_global_s32 $s2, [$d2 + 16];
                add_s32 $s2, $s2, $s1;
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d0, $d0, $d1;
                st_global_s32 $s2, [$d0 + 16];
                ret;
@L7:
                // Deoptimization for BoundsCheckException would occur here
                ret;
};
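
For readers less familiar with HSAIL, the sketch below restates one bounds check and one element address computation from the listing in plain Java. The offsets 12 and 16 and the 4-byte element size are read off the listing above, not taken from any specification, and the class and method names are hypothetical.

```java
// Plain-Java restatement of parts of the HSAIL listing; names are hypothetical.
class BoundsCheckSketch {
    static final long LENGTH_OFFSET = 12;   // ld_global_s32 $s1, [$d0 + 12]
    static final long ELEMENTS_OFFSET = 16; // ld_global_s32 $s1, [$d1 + 16]
    static final long INT_SIZE = 4;         // mul_s64 $d3, $d3, 4

    // cmp_ge_b1_u32: comparing index >= length as unsigned values catches
    // negative and too-large indices with a single branch.
    static boolean outOfBounds(int index, int length) {
        return Integer.compareUnsigned(index, length) >= 0;
    }

    // cvt_s64_s32, mul_s64, add_u64, then the +16 displacement on the load or store.
    static long elementAddress(long arrayBase, int index) {
        return arrayBase + (long) index * INT_SIZE + ELEMENTS_OFFSET;
    }

    public static void main(String[] args) {
        System.out.println(outOfBounds(-1, 3));
        System.out.println(outOfBounds(2, 3));
        System.out.println(elementAddress(0x1000L, 2));
    }
}
```

The unsigned comparison is why each array needs only one `cbr` to `@L7`: a negative `gid` looks like a very large unsigned value and fails the same test as an index past the end.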

Then I made a new test where the run method basically just calls the original IntAddTest.run:
    public static void run(int[] out, int[] ina, int[] inb, int gid) {
        IntAddTest.run(out, ina, inb, gid);
    }

We compile with InlineEverything set.  We got this almost identical hsail code except for the deoptimization reason.  (In this case, there is no call to createOutOfBoundsException but I have seen it in larger test cases).  Note that in either case, no exceptions would have occurred when profiling.

version 0:95: $full : $large;
// static method HotSpotMethod<IntAddInlineTest.run(int[], int[], int[], int)>
kernel &run (
                kernarg_u64 %_arg0,
                kernarg_u64 %_arg1,
                kernarg_u64 %_arg2
                ) {
                ld_kernarg_u64  $d0, [%_arg0];
                ld_kernarg_u64  $d1, [%_arg1];
                ld_kernarg_u64  $d2, [%_arg2];
                workitemabsid_u32 $s0, 0;

@L0:
                ld_global_s32 $s1, [$d0 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L1:
                ld_global_s32 $s1, [$d2 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L2:
                ld_global_s32 $s1, [$d1 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L3:
                cvt_s64_s32 $d3, $s0;
                mul_s64 $d3, $d3, 4;
                add_u64 $d1, $d1, $d3;
                ld_global_s32 $s1, [$d1 + 16];
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d2, $d2, $d1;
                ld_global_s32 $s2, [$d2 + 16];
                add_s32 $s2, $s2, $s1;
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d0, $d0, $d1;
                st_global_s32 $s2, [$d0 + 16];
                ret;
@L7:
                // Deoptimization for NotCompiledExceptionHandler would occur here
                ret;
};

From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
Sent: Tuesday, November 12, 2013 5:56 PM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net
Subject: Re: compiled leaf method vs. inlined method bounds check

Can you maybe show us the snippets you used and how you ran them?
The second scenario you describe usually happens if an ArrayIndexOutOfBoundsException has already been thrown at the array access you're looking at.
When compiling an array access, Graal will look at the profile, and if it shows that exceptions are thrown there, it will compile in the exception branch (in your case the exception branch ends up in another deopt for some reason). If profiling shows no exception has been thrown there, it will leave out the exception branch and will only place a guard which deoptimizes in case an exception needs to be thrown.

This should have nothing to do with inlining. When doing tests about that, be careful not to pollute the profile for the second test with the first one.
You can change the behaviour of Graal regarding these things using the UseExceptionProbabilityForOperations flag.
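
As an illustration of what profile pollution means here, the toy model below keeps one shared record of which sites have thrown. It is only a stand-in for the VM's real profiling data; the class, the `sitesThatThrew` set and the site name "run" are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a shared profile; not how HotSpot or Graal store profiles.
class ProfilePollutionSketch {
    // Sites that have thrown at least once during the lifetime of this "VM".
    private final Set<String> sitesThatThrew = new HashSet<>();

    // Same body as the test method, plus recording of a thrown exception.
    void run(int[] out, int[] ina, int[] inb, int gid) {
        try {
            out[gid] = ina[gid] + inb[gid];
        } catch (ArrayIndexOutOfBoundsException e) {
            sitesThatThrew.add("run");
        }
    }

    boolean exceptionSeen(String site) {
        return sitesThatThrew.contains(site);
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3};

        ProfilePollutionSketch clean = new ProfilePollutionSketch();
        clean.run(new int[3], array, array, 0);
        System.out.println("separate run: " + clean.exceptionSeen("run"));

        ProfilePollutionSketch shared = new ProfilePollutionSketch();
        shared.run(new int[3], array, array, 3); // an earlier test throws here
        shared.run(new int[3], array, array, 0); // the later test itself never throws
        System.out.println("shared run:   " + shared.exceptionSeen("run"));
    }
}
```

In the shared case the later test inherits the "exception seen" state from the earlier one, which is why the tests need to run in separate VMs to give independent results.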

-Gilles

On Tue, Nov 12, 2013 at 11:49 PM, Tom Deneau <tom.deneau at amd.com> wrote:
I've noticed that if the graph I am compiling is simply a leaf method which accesses an array, the target of a failing bounds check is a DeoptimizeNode with reason=BoundsCheckException, action=InvalidateReprofile.

But if the method that is accessing the array is being inlined into another method, the target of the failing bounds check is a ForeignCall to createOutOfBoundsException followed by a branch to a DeoptimizeNode with reason=NotCompiledExceptionHandler, action=InvalidateRecompile.

Can someone explain this difference?

-- Tom