RFR(M) GRAAL-234 - PTX code loader

Morris Meyer morris.meyer at oracle.com
Mon Apr 29 07:44:33 PDT 2013


Folks,

I have extended the Graal PTX back-end with a lightweight, tightly
integrated linkage to NVIDIA GPUs.  The PTX code that Graal generates
from Java methods now compiles correctly with the CUDA 5.0 drivers on
my GTX 660 graphics card.
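
For readers unfamiliar with this kind of linkage, here is a minimal
sketch of probing for the CUDA driver at run time instead of linking
against it.  It is not the code in the webrev: the helper name
probe_cuda_driver and the idea of passing the library path as a
parameter are assumptions for illustration.  Only cuInit is a real CUDA
driver API symbol.

```cpp
#include <dlfcn.h>

// Hypothetical helper (not from the webrev): returns true when the shared
// library at `path` can be opened and exports the CUDA driver entry point
// cuInit. The caller chooses `path`; typical install locations differ per OS.
bool probe_cuda_driver(const char* path) {
  void* handle = dlopen(path, RTLD_LAZY);
  if (handle == nullptr) {
    return false;  // library missing or not loadable
  }
  // Look the symbol up by name; nothing is called here.
  bool found = dlsym(handle, "cuInit") != nullptr;
  dlclose(handle);
  return found;
}
```

A caller would pass whatever path the local driver installs under (for
example libcuda.so on Linux, which is itself an assumption about the
system) and fall back to CPU-only execution when the probe returns
false.  On older glibc versions the program may need to be linked with
-ldl.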

If you are wondering about the "probe for warp capability" comment added
in src/share/vm/runtime/thread.cpp:

     // Initialize the os module before using TLS
     os::init();

+   // probe for warp capability
+   gpu::init();
+

Per the NVIDIA Parallel Thread Execution ISA, Version 3.1
(http://docs.nvidia.com/cuda/pdf/ptx_isa_3.1.pdf):

"A warp is a maximal subset of threads from a single cooperative thread 
array (CTA), such that the threads execute the same instructions at the 
same time. ...Each grid of CTAs has a 1D, 2D or 3D shape"
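
As a concrete reading of that definition, the sketch below counts the
warps in one CTA of a given shape.  It assumes a warp size of 32
threads, which is the value NVIDIA hardware uses (PTX exposes it as
WARP_SZ).  The CtaShape struct and the warps_per_cta function are
illustrative names and are not part of the patch.

```cpp
#include <cstdint>

// Illustrative type: extent of a CTA in each dimension.
// Unused dimensions are set to 1.
struct CtaShape {
  std::uint32_t x;
  std::uint32_t y;
  std::uint32_t z;
};

// Assumption: 32 threads per warp.
constexpr std::uint32_t kWarpSize = 32;

// Number of warps needed to hold every thread of the CTA. A thread count
// that is not a multiple of the warp size still occupies a final,
// partially filled warp, hence the rounding up.
constexpr std::uint32_t warps_per_cta(CtaShape s) {
  return (s.x * s.y * s.z + kWarpSize - 1) / kWarpSize;
}
```

For example, a 16 x 16 x 1 CTA has 256 threads and therefore 8 warps,
while a 33-thread CTA needs 2.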

I am currently assigned JDK-8013168, which is to extend the set of code
pointers from Method to support multiple architectures.  This will be
necessary both to adapt the loaded GPU kernel to HotSpot's internal
method invocation and to support architectures that have a
heterogeneous CPU+APU configuration.

WEBREV - http://cr.openjdk.java.net/~morris/GRAAL-234.01
JIRA - https://lafo.ssw.uni-linz.ac.at/jira/browse/GRAAL-234

         --morris

"to boldly go"

__________________________________________________________________

Output:

vendor: 0x000010DE
device: 0x000011C0
model: NVIDIA GeForce GTX 660
gpu_bsd::probe_gpu(APPLE): 1
gpu::Ptx::probe_linkage
gpu_ptx::probe_linkage(APPLE): 1
gpu::initialize_gpu
gpu_ptx::_cuda_cu_init: 0
gpu_ptx::_cuda_cu_device_get_count(1): 0
gpu_ptx::_cuda_cu_device_get(0): 0
gpu_ptx::_cuda_cu_device_compute_capability(major 3, minor 0): 0
gpu_ptx::_cuda_cu_device_get_name(GeForce GTX 660): 0
gpu_ptx::_cuda_cu_ctx_create(3b013800): 0
gpu_ptx::initialize_gpu(): 1
gpu::generate_kernel
gpu::Ptx::generate_kernel
gpu_ptx::_cuda_cu_module_load_data_ex(39b575a0): 0
gpu_ptx::jit_log_buffer

gpu_ptx::_cuda_cu_module_get_function(testAddConst1I):3b119600 0
testAddConst1I:
     .version 1.4
     .target sm_10
.entry testAddConst1I (
     .param .u32 param0
) {
       .reg .pred %p,%q;
       .reg .u32 %r<16>;
L121:
     add.s32 %r2, %r2, 1;
     mov.s32 %r0, %r2;
     exit;
}
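
The log above reports compute capability 3.0 for the GTX 660, while the
emitted kernel declares .target sm_10.  PTX target names are formed as
sm_ followed by the major and minor capability digits, so a loader
could derive the target from the probed values.  A minimal sketch,
where ptx_target_for is a hypothetical name that does not appear in the
webrev:

```cpp
#include <string>

// Hypothetical helper: builds the PTX .target name for a compute
// capability by concatenating the major and minor digits after "sm_".
std::string ptx_target_for(int major, int minor) {
  return "sm_" + std::to_string(major) + std::to_string(minor);
}
```

With the values from the log (major 3, minor 0) this yields sm_30.
Whether the back-end should emit that target instead of sm_10 is a
separate question this sketch does not answer.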

