compiled leaf method vs. inlined method bounds check

Tue Nov 12 16:22:21 PST 2013

Gilles --

OK, we have a simple existing test in com.oracle.graal.compiler.hsail.test.IntAddTest, which adds ints from two arrays and puts the result in a third output array:
    public static void run(int[] out, int[] ina, int[] inb, int gid) {
        out[gid] = ina[gid] + inb[gid];
    }

Here is the HSAIL code.  (Note that we don't really handle the DeoptimizeNode; we just print a comment based on the reason.)

version 0:95: $full : $large;
// static method HotSpotMethod<IntAddTest.run(int[], int[], int[], int)>
kernel &run (
                kernarg_u64 %_arg0,
                kernarg_u64 %_arg1,
                kernarg_u64 %_arg2
                ) {
                ld_kernarg_u64  $d0, [%_arg0];
                ld_kernarg_u64  $d1, [%_arg1];
                ld_kernarg_u64  $d2, [%_arg2];
                workitemabsid_u32 $s0, 0;

@L0:
                ld_global_s32 $s1, [$d0 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L1:
                ld_global_s32 $s1, [$d2 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L2:
                ld_global_s32 $s1, [$d1 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L3:
                cvt_s64_s32 $d3, $s0;
                mul_s64 $d3, $d3, 4;
                add_u64 $d1, $d1, $d3;
                ld_global_s32 $s1, [$d1 + 16];
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d2, $d2, $d1;
                ld_global_s32 $s2, [$d2 + 16];
                add_s32 $s2, $s2, $s1;
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d0, $d0, $d1;
                st_global_s32 $s2, [$d0 + 16];
                ret;
@L7:
                // Deoptimization for BoundsCheckException would occur here
                ret;
};
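
For readers less used to HSAIL, here is a rough Java sketch of what the kernel above does. It is only an illustration, not code from the test suite: the class and helper names (BoundsCheckSketch, outOfBounds, elementOffset) are made up, and returning false stands in for the branch to @L7. The point it shows is that the single unsigned compare (cmp_ge_b1_u32) rejects negative indices as well as indices past the end, and that the listing reads the array length at offset 12 and element gid at offset 16 + 4 * gid.

```java
final class BoundsCheckSketch {
    // Mirrors cmp_ge_b1_u32 $c0, index, length: one unsigned compare is
    // true for index < 0 as well as for index >= length.
    static boolean outOfBounds(int index, int length) {
        return Integer.compareUnsigned(index, length) >= 0;
    }

    // Byte offset of element gid as computed in the listing:
    // cvt_s64_s32, mul_s64 by 4, then a load/store at [base + 16].
    static long elementOffset(int gid) {
        return 16L + 4L * gid;
    }

    // Hypothetical stand-in for the kernel body. Returns false where the
    // kernel would branch to @L7 (the deoptimization comment).
    static boolean run(int[] out, int[] ina, int[] inb, int gid) {
        // Same order as the listing: out ($d0), inb ($d2), ina ($d1).
        if (outOfBounds(gid, out.length)
                || outOfBounds(gid, inb.length)
                || outOfBounds(gid, ina.length)) {
            return false;
        }
        out[gid] = ina[gid] + inb[gid];
        return true;
    }

    public static void main(String[] args) {
        int[] out = new int[4];
        int[] ina = {1, 2, 3, 4};
        int[] inb = {10, 20, 30, 40};
        System.out.println(run(out, ina, inb, 2) + " " + out[2]); // true 33
        System.out.println(run(out, ina, inb, -1));               // false
        System.out.println(run(out, ina, inb, 4));                // false
    }
}
```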

Then I made a new test whose run method simply calls the original IntAddTest.run:
    public static void run(int[] out, int[] ina, int[] inb, int gid) {
        IntAddTest.run(out, ina, inb, gid);
    }

We compiled with InlineEverything set and got almost identical HSAIL code; only the deoptimization reason differs.  (In this case, there is no call to createOutOfBoundsException, but I have seen it in larger test cases.)  Note that in either case, no exceptions would have occurred when profiling.

version 0:95: $full : $large;
// static method HotSpotMethod<IntAddInlineTest.run(int[], int[], int[], int)>
kernel &run (
                kernarg_u64 %_arg0,
                kernarg_u64 %_arg1,
                kernarg_u64 %_arg2
                ) {
                ld_kernarg_u64  $d0, [%_arg0];
                ld_kernarg_u64  $d1, [%_arg1];
                ld_kernarg_u64  $d2, [%_arg2];
                workitemabsid_u32 $s0, 0;

@L0:
                ld_global_s32 $s1, [$d0 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L1:
                ld_global_s32 $s1, [$d2 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L2:
                ld_global_s32 $s1, [$d1 + 12];
                cmp_ge_b1_u32 $c0, $s0, $s1;
                cbr $c0, @L7;
@L3:
                cvt_s64_s32 $d3, $s0;
                mul_s64 $d3, $d3, 4;
                add_u64 $d1, $d1, $d3;
                ld_global_s32 $s1, [$d1 + 16];
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d2, $d2, $d1;
                ld_global_s32 $s2, [$d2 + 16];
                add_s32 $s2, $s2, $s1;
                cvt_s64_s32 $d1, $s0;
                mul_s64 $d1, $d1, 4;
                add_u64 $d0, $d0, $d1;
                st_global_s32 $s2, [$d0 + 16];
                ret;
@L7:
                // Deoptimization for NotCompiledExceptionHandler would occur here
                ret;
};

From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
Sent: Tuesday, November 12, 2013 5:56 PM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net
Subject: Re: compiled leaf method vs. inlined method bounds check

Can you maybe show us the snippets you used and how you ran them?
The second scenario you describe usually happens if an ArrayIndexOutOfBoundsException has already been thrown at the array access you're looking at.
When compiling an array access, Graal will look at the profile, and if it shows that exceptions are thrown there, it will compile in the exception branch (in your case the exception branch ends up in another deopt for some reason). If profiling shows no exception has been thrown there, it will leave out the exception branch and will only place a guard which deoptimizes in case an exception needs to be thrown.

This should have nothing to do with inlining. When doing tests about that, be careful not to pollute the profile for the second test with the first one.
You can change the behaviour of Graal regarding these things using the UseExceptionProbabilityForOperations flag.
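
As a hedged illustration of the kind of run that could pollute a profile: the sketch below calls a local copy of the run method with an out-of-range gid, which throws ArrayIndexOutOfBoundsException at the array access. The class name ProfilePollutionSketch and the helper throwsOutOfBounds are hypothetical. Whether and how such a throw ends up in the profile depends on the VM; the sketch only shows the Java-level behaviour that, per the description above, would lead to the exception branch being compiled in if it happened in the same VM before compilation.

```java
final class ProfilePollutionSketch {
    // Local copy of the method under test (same body as IntAddTest.run).
    static void run(int[] out, int[] ina, int[] inb, int gid) {
        out[gid] = ina[gid] + inb[gid];
    }

    // Hypothetical helper: reports whether run throws for this gid when
    // all three arrays have the given length.
    static boolean throwsOutOfBounds(int length, int gid) {
        int[] out = new int[length];
        int[] ina = new int[length];
        int[] inb = new int[length];
        try {
            run(out, ina, inb, gid);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // An exception thrown here is the kind of event a profile
            // could record for this array access.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOutOfBounds(4, 0)); // false
        System.out.println(throwsOutOfBounds(4, 5)); // true
    }
}
```

Keeping every gid in range in an earlier test, or running the two tests in separate VM instances, would avoid this kind of interference.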

-Gilles

On Tue, Nov 12, 2013 at 11:49 PM, Tom Deneau <tom.deneau at amd.com> wrote:
I've noticed that if the graph I am compiling is simply a leaf method which accesses an array, the target of a failing bounds check is a DeoptimizeNode with reason=BoundsCheckException, action=InvalidateReprofile.

But if the method that is accessing the array is being inlined into another method, the target of the failing bounds check is a ForeignCall to createOutOfBoundsException followed by a branch to a DeoptimizeNode with reason=NotCompiledExceptionHandler, action=InvalidateRecompile.

Can someone explain this difference?

-- Tom