handling deoptimization nodes
Venkatachalam, Vasanth
Vasanth.Venkatachalam at amd.com
Wed May 8 08:36:11 PDT 2013
Hi,
I posted this on the Graal developers list, but am cross-posting it here as it raises more general Sumatra questions.
I've coded up an HSAIL backend for the Graal JIT compiler. When I run the test case below (Mandelbrot) with --vm server, the JVM generates some Deoptimization nodes for code paths it thinks are less frequently taken. As I understand it, the x86 backend handles these nodes by falling back to the interpreter. This would be fine in a single-ISA mode (where we're just generating x86 code), but a different strategy is needed in a dual-ISA mode, since "falling back to the interpreter" doesn't make sense when we're offloading code to the GPU.
Have people thought about how we can handle these Deoptimization nodes when we're generating code just for the GPU?
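[As a minimal illustration (ours, not from the original message) of the situation being discussed: a branch that profiling sees as rarely or never taken is a candidate for an uncommon trap, so the JIT may emit a Deoptimization node instead of machine code for that path. The class and method names below are hypothetical.]

```java
// Sketch: a cold branch of the kind a profiling JIT may replace with a
// deoptimization point. In this workload, v > max never fires, so the
// compiler can treat that path as never taken and compile only the hot path.
public class ColdPathExample {
    static int clamp(int v, int max) {
        if (v > max) {   // cold in this workload: a candidate for an
            return max;  // uncommon trap / Deoptimization node
        }
        return v;        // hot path: the only one that needs machine code
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += clamp(i % 64, 1000); // v > max never fires here
        }
        System.out.println(sum);
    }
}
```

On a CPU, taking the cold branch after compilation simply transfers control back to the interpreter; the question in this thread is what the analogous recovery path should be when the compiled code is running on a GPU.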
Vasanth
-----Original Message-----
From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of Venkatachalam, Vasanth
Sent: Tuesday, May 07, 2013 9:29 PM
To: graal-dev at openjdk.java.net
Subject: deoptimization nodes
Hi,
When running the test case below with --vm server, HotSpot passes Graal a Deoptimization node to handle the case where the while loop of testMandelSimple exits because count becomes >= maxIterations.
I suspect it's doing this because it considers this execution path less frequently taken. The AMD64 backend handles the Deoptimization node by invoking a runtime stub, which I suspect falls back to the interpreter. (Can someone confirm whether this is the case?)
For our HSAIL backend, we don't want to handle the Deoptimization node in the same way (by falling back to the interpreter).
Is there a way to prevent HotSpot from generating Deoptimization nodes for code paths it thinks are less frequently taken, and instead force it to generate the complete set of nodes that would normally be generated for these paths?
Would running without the --vm server option do the trick?
We found that when we run without --vm server, the complete set of nodes (for the while-loop exit) is generated, and Deoptimization nodes only get generated by Graal for array bounds checks. This is the behavior we would like to see.
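[A side note from us, not from the post: the bounds-check Deoptimization nodes are unlikely to disappear entirely, because Java semantics require every array access to be guarded. A sketch, with hypothetical names, of the guard that remains even without branch profiling:]

```java
// Sketch: every Java array access carries an implicit bounds check.
// The failing side of that check is exactly the kind of rare path a
// compiler models as a deoptimization/trap rather than an inline throw.
public class BoundsCheckExample {
    static int read(int[] data, int gid) {
        return data[gid]; // guarded access: an out-of-range gid must throw
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(read(data, 2)); // in range: the hot path
        try {
            read(data, 3);                 // failing guard: the rare path
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("bounds check fired");
        }
    }
}
```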
Vasanth
The following test case can be run on the AMD64 backend. (We ran it in BasicAMD64Test.java.)
void setupPalette(int[] in) {
    for (int i = 0; i < in.length; i++) {
        in[i] = i;
    }
}
@Test
public void testMandel() {
    final int WIDTH = 768;
    final int HEIGHT = WIDTH;
    final int maxIterations = 64;
    int loopiterations = 1;
    int iter = 0;
    final int RANGE = WIDTH * HEIGHT;
    int[] rgb = new int[RANGE];
    int[] palette = new int[RANGE]; // [maxIterations];
    setupPalette(palette);
    while (iter < loopiterations) {
        for (int gid = 0; gid < RANGE; gid++) {
            testMandelSimple(rgb, palette, -1.0f, 0.0f, 3f, gid);
        }
        iter++;
    }
    test("testMandelSimple");
}
public static void testMandelSimple(int[] rgb, int[] palette, float x_offset, float y_offset, float scale, int gid) {
    final int width = 768;
    final int height = 768;
    final int maxIterations = 64;
    float lx = (((gid % width * scale) - ((scale / 2) * width)) / width) + x_offset;
    float ly = (((gid / width * scale) - ((scale / 2) * height)) / height) + y_offset;
    int count = 0;
    float zx = lx;
    float zy = ly;
    float new_zx = 0f;
    // Iterate until the algorithm converges or until maxIterations is reached.
    while (count < maxIterations && zx * zx + zy * zy < 8) {
        new_zx = zx * zx - zy * zy + lx;
        zy = 2 * zx * zy + ly;
        zx = new_zx;
        count++;
    }
    rgb[gid] = palette[count];
}
More information about the sumatra-dev mailing list