Execution of a simple Lambda Method on GPU

Fri Jan 17 15:58:51 PST 2014

Bharadwaj,

Nice work.

It is great to see the Graal GPU backend and the Sumatra JVM enhancements required to dispatch to the GPU progressing like this.

We noted the recent checkins and knew something good was coming! ;)

Gary 

-----Original Message-----
From: sumatra-dev-bounces at openjdk.java.net [mailto:sumatra-dev-bounces at openjdk.java.net] On Behalf Of S. Bharadwaj Yadavalli
Sent: Friday, January 17, 2014 4:26 PM
To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
Subject: Execution of a simple Lambda Method on GPU

A quick update on reaching the next milestone in executing Lambda methods on GPUs as a step towards the goals of Project Sumatra.

The JVM in the Graal repo now has the ability to recognize a simple Lambda method defined in a main class method, schedule compilation of such a method to target the PTX backend, offload execution of the generated PTX code to supported nVidia GPU hardware, and get the result back to the VM.

This was made possible by recent enhancements and code reorganization in the Graal compiler to improve support for GPU backends.

The current implementation is at an experimental stage. For example, executing the Java class [1] results in [2]. Additional refinements to choose candidate Java methods for offloading to GPU are planned. We look forward to any open source community involvement that will help us move faster towards the next milestones of Project Sumatra.

Regards,

Bharadwaj

[1]
interface BinaryOperation {
     int apply(int a, int b);
}

class FindSumL {
    public static void main(String args[]) {
        BinaryOperation add = (x, y) -> x + y;
        int result = add.apply(8, 12);
        System.out.println("Sum is " + result);
    }
}
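
The trace in [2] names the compiled method FindSumL::lambda$main$0, which is not declared anywhere in [1]. As a rough illustration of where that name comes from: javac typically lowers a lambda body into a private synthetic method on the enclosing class. The sketch below lists such methods by reflection. The class LambdaBodyFinder and the helper syntheticLambdaMethods are hypothetical names introduced here, and the "lambda$" naming is a compiler convention rather than something the language guarantees.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class LambdaBodyFinder {
    interface BinaryOperation {
        int apply(int a, int b);
    }

    // Non-capturing lambda with the same shape as the one in FindSumL.
    static final BinaryOperation ADD = (x, y) -> x + y;

    // Collects compiler-generated methods whose names start with "lambda$".
    // Assumption: the compiler follows the usual javac lowering of lambdas.
    static List<String> syntheticLambdaMethods(Class<?> owner) {
        List<String> found = new ArrayList<>();
        for (Method m : owner.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                found.add(m.getName() + " (" + m.getParameterCount() + " parameters)");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("Sum is " + ADD.apply(8, 12));
        for (String description : syntheticLambdaMethods(LambdaBodyFinder.class)) {
            System.out.println(description);
        }
    }
}
```

A method found this way has the (II)I shape seen in the trace, which is what makes it a natural unit to hand to a GPU backend.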

[2]
$ ./mx.sh vmg -XX:+TraceGPUInteraction -XX:-BootstrapGraal -G:Threads=1 FindSumL
Found supported nVidia GPU device vendor : 0x10de device 0x06dd
gpu_linux::probe_gpu(): 1
[CUDA] Success: library linkage
CUDA driver initialization: Success
[CUDA] Number of compute-capable devices found: 2
[CUDA] Got the handle of first compute-device
[CUDA] Unified addressing support on device 0: 1
[CUDA] Using GeForce GTX 780
Compiling Lambda method FindSumL::lambda$main$0 to PTX
[CUDA] Success: Created context for device: 0
[CUDA] Success: Set current context for device: 0
[CUDA] PTX Kernel
     .version 3.0
     .target sm_30
.entry lambda$main$0 (
     .param .s32 param1,
     .param .s32 param2,
     .param .u64 param0
) {
     .reg .s32 %r3;
     .reg .s32 %r4;
     .reg .s32 %r5;
     .reg .u64 %r6;
L0:
     ld.param.s32 %r3, [param1 + 0];
     ld.param.s32 %r4, [param2 + 0];
     add.s32 %r5, %r4, %r3;
     ld.param.u64 %r6, [param0 + 0];
     st.global.s32 [%r6 + 0], %r5;
     ret;
}

[CUDA] Function name : lambda$main$0
[CUDA] Loaded data for PTX Kernel
[CUDA] Got function handle for lambda$main$0 kernel address 0x7fba686bfd80
[CUDA] Generated kernel External method:FindSumL.lambda$main$0(II)I
installCode0: ExternalCompilationResult
[CUDA] launching kernel
[CUDA] Success: Kernel Launch: X: 1 Y: 1 Z: 1
[CUDA] Success: Synchronized launch kernel
[CUDA] Success: Freed device memory of return value
[CUDA] Success: Destroy context
Sum is 20
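
For readers less familiar with PTX, the kernel in [2] can be read roughly as follows: it loads the two int arguments (param1, param2), adds them, loads a 64-bit address (param0), and stores the sum at that address. The Java sketch below mirrors those steps one for one. It is only an emulation for illustration: KernelSketch, lambdaKernel and launch are hypothetical names, and a one-element int array stands in for the device memory that holds the return value.

```java
public class KernelSketch {
    // Mirrors the PTX body step by step; returnSlot plays the role of the
    // global-memory address passed in param0 (an assumption for illustration).
    static void lambdaKernel(int param1, int param2, int[] returnSlot) {
        int r3 = param1;     // ld.param.s32 %r3, [param1 + 0]
        int r4 = param2;     // ld.param.s32 %r4, [param2 + 0]
        int r5 = r4 + r3;    // add.s32 %r5, %r4, %r3
        returnSlot[0] = r5;  // st.global.s32 [%r6 + 0], %r5
    }

    // Stand-in for the host side: allocate the return slot, run the kernel
    // once (the trace shows a 1 x 1 x 1 launch), and read the result back.
    static int launch(int a, int b) {
        int[] returnSlot = new int[1];
        lambdaKernel(a, b, returnSlot);
        return returnSlot[0];
    }

    public static void main(String[] args) {
        System.out.println("Sum is " + launch(8, 12));
    }
}
```

Passing the result through a separate slot, rather than returning it directly, matches the trace line about freeing the device memory of the return value.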