RFR(M) GRAAL-234 - PTX code loader

Wed May 1 09:24:38 PDT 2013

Thanks for the patch! This looks great. - thomas

On Apr 29, 2013, at 4:44 PM, Morris Meyer <morris.meyer at oracle.com> wrote:

> Folks,
> 
> I have extended the Graal PTX back-end with a lightweight linkage to NVIDIA GPUs.  The PTX code that Graal generates from Java methods now compiles properly with the CUDA 5.0 drivers on my GTX 660 graphics card.
> 
> If you are wondering about this comment in src/share/vm/runtime/thread.cpp:
> 
>    // Initialize the os module before using TLS
>    os::init();
> 
> +   // probe for warp capability
> +   gpu::init();
> +
> 
> The term "warp" comes from NVIDIA's Parallel Thread Execution ISA Version 3.1 (http://docs.nvidia.com/cuda/pdf/ptx_isa_3.1.pdf):
> 
> "A warp is a maximal subset of threads from a single cooperative thread array (CTA), such that the threads execute the same instructions at the same time. ...Each grid of CTAs has a 1D, 2D or 3D shape"
> 
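To make the quoted definition concrete, the following C++ sketch counts how many warps a CTA of a given 1D, 2D or 3D shape occupies. It assumes a warp size of 32 threads, which is what NVIDIA hardware uses today (PTX exposes it as WARP_SZ). CtaShape, threads_per_cta and warps_per_cta are illustrative names and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32 threads per warp, as on NVIDIA GPUs (WARP_SZ in PTX).
constexpr std::uint32_t kWarpSize = 32;

// Hypothetical helper type: the extent of a CTA in each dimension.
// A 1D or 2D CTA leaves the unused dimensions at 1.
struct CtaShape {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;

    constexpr CtaShape(std::uint32_t x_, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Total number of threads in the CTA.
constexpr std::uint64_t threads_per_cta(const CtaShape& s) {
    return static_cast<std::uint64_t>(s.x) * s.y * s.z;
}

// Number of warps needed to hold all threads of the CTA.
// The last warp may be only partially filled, hence the round-up.
constexpr std::uint64_t warps_per_cta(const CtaShape& s) {
    return (threads_per_cta(s) + kWarpSize - 1) / kWarpSize;
}
```

For example, a 16x16 CTA holds 256 threads and therefore 8 warps, while a 1D CTA of 33 threads needs 2 warps, the second of which carries a single thread.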
> I am currently assigned JDK-8013168, which extends the set of code pointers in Method to support multiple architectures.  This will be necessary to adapt the loaded GPU kernel to internal HotSpot method invocation, and to support architectures that have a heterogeneous CPU+APU configuration.
> 
> WEBREV - http://cr.openjdk.java.net/~morris/GRAAL-234.01
> JIRA - https://lafo.ssw.uni-linz.ac.at/jira/browse/GRAAL-234
> 
>        --morris
> 
> "to boldly go"
> 
> __________________________________________________________________
> 
> Output:
> 
> vendor: 0x000010DE
> device: 0x000011C0
> model: NVIDIA GeForce GTX 660
> gpu_bsd::probe_gpu(APPLE): 1
> gpu::Ptx::probe_linkage
> gpu_ptx::probe_linkage(APPLE): 1
> gpu::initialize_gpu
> gpu_ptx::_cuda_cu_init: 0
> gpu_ptx::_cuda_cu_device_get_count(1): 0
> gpu_ptx::_cuda_cu_device_get(0): 0
> gpu_ptx::_cuda_cu_device_compute_capability(major 3, minor 0): 0
> gpu_ptx::_cuda_cu_device_get_name(GeForce GTX 660): 0
> gpu_ptx::_cuda_cu_ctx_create(3b013800): 0
> gpu_ptx::initialize_gpu(): 1
> gpu::generate_kernel
> gpu::Ptx::generate_kernel
> gpu_ptx::_cuda_cu_module_load_data_ex(39b575a0): 0
> gpu_ptx::jit_log_buffer
> 
> gpu_ptx::_cuda_cu_module_get_function(testAddConst1I):3b119600 0
> testAddConst1I:
>    .version 1.4
>    .target sm_10
> .entry testAddConst1I (
>    .param .u32 param0
> ) {
>      .reg .pred %p,%q;
>      .reg .u32 %r<16>;
> L121:
>    add.s32 %r2, %r2, 1;
>    mov.s32 %r0, %r2;
>    exit;
> }
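
The log above shows an initialization sequence in which every step reports a status of 0, which is CUDA_SUCCESS in the CUDA driver API. The C++ sketch below illustrates that check-each-step pattern without linking against CUDA. The steps are stand-in callables that only simulate status codes. The labels are the driver API entry points that the log names appear to correspond to, which is an assumption since the patch source is not reproduced here. Step, run_steps and example_sequence are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// 0 is CUDA_SUCCESS in the CUDA driver API; any other value is an error.
constexpr int kSuccess = 0;

// Hypothetical: one initialization step, with a label for logging and a
// callable that returns a driver-style status code.
struct Step {
    std::string label;
    std::function<int()> call;
};

// Runs the steps in order and stops at the first failure.
// Returns the index of the failing step, or -1 if every step succeeded.
int run_steps(const std::vector<Step>& steps) {
    for (std::size_t i = 0; i < steps.size(); ++i) {
        if (steps[i].call() != kSuccess) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Stand-in sequence mirroring the order of calls in the log. The lambdas
// do not touch the GPU; a real loader would invoke the driver here.
// module_load_status lets a caller simulate a failing PTX module load.
std::vector<Step> example_sequence(int module_load_status) {
    return {
        {"cuInit", [] { return kSuccess; }},
        {"cuDeviceGetCount", [] { return kSuccess; }},
        {"cuDeviceGet", [] { return kSuccess; }},
        {"cuDeviceComputeCapability", [] { return kSuccess; }},
        {"cuDeviceGetName", [] { return kSuccess; }},
        {"cuCtxCreate", [] { return kSuccess; }},
        {"cuModuleLoadDataEx", [module_load_status] { return module_load_status; }},
        {"cuModuleGetFunction", [] { return kSuccess; }},
    };
}
```

Calling run_steps(example_sequence(0)) returns -1 because every simulated step succeeds, as in the log. Passing a nonzero status makes it return the index of the module load step, so the function lookup that follows is never attempted.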