Experimental support for CUDA

Fri Sep 15 04:28:48 UTC 2017

Hi all,

I would like to share experimental support for CUDA. It is based on the expression API after a major rework (following feedback plus some other findings) and a simple transformation of a lambda to CUDA via NVRTC (right now it's just... "gluing strings"). With the above I've been able to execute the following kernel on the GPU:

Runnable fillArrayKernel = () -> {
    // Trick to cast to int[], as we don't want to operate via dataRegion
    int[] data = (int[]) (Object) dataRegion;

    // The starting index of the data chunk
    int dataChunkBegin = blockIdx.x * elementsPerBlock + threadIdx.x;

    // Set values in a loop
    for (int i = 0; i < loopsPerThread; i++) {
        data[dataChunkBegin + threadsCount * i] = valueToSet;
    }
};

(There is one cast, which I allowed for fun.)
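
For readers who want to check the indexing without a GPU, here is a minimal CPU-only sketch in plain Java. It is not part of the patch. It assumes that threadsCount is the number of threads per block and that elementsPerBlock equals threadsCount * loopsPerThread; the mail does not state either, so treat both as guesses. The class name and the fill helper are made up for illustration, and the two outer loops stand in for blockIdx.x and threadIdx.x.

```java
import java.util.Arrays;

// CPU-only sketch of the indexing used by the kernel above; not part of the patch.
class FillArrayIndexingSketch {

    // Assumption: threadsCount is the number of threads per block and
    // elementsPerBlock == threadsCount * loopsPerThread.
    static int[] fill(int blocks, int threadsCount, int loopsPerThread, int valueToSet) {
        int elementsPerBlock = threadsCount * loopsPerThread;
        int[] data = new int[blocks * elementsPerBlock];

        // The two outer loops stand in for blockIdx.x and threadIdx.x on the GPU.
        for (int blockIdxX = 0; blockIdxX < blocks; blockIdxX++) {
            for (int threadIdxX = 0; threadIdxX < threadsCount; threadIdxX++) {
                // Same arithmetic as in the kernel body.
                int dataChunkBegin = blockIdxX * elementsPerBlock + threadIdxX;
                for (int i = 0; i < loopsPerThread; i++) {
                    data[dataChunkBegin + threadsCount * i] = valueToSet;
                }
            }
        }
        return data;
    }

    public static void main(String[] args) {
        // 2 blocks, 4 threads per block, 3 loop iterations per thread.
        int[] data = fill(2, 4, 3, 7);
        System.out.println(Arrays.toString(data));
    }
}
```

Under these assumptions neighbouring threads of a block write to neighbouring elements in each loop iteration, and the blocks together cover the whole array.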

It may not be production grade, but it shows one possible approach, and later, I think, NVRTC can be replaced by direct PTX generation.
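
To give an idea of what "gluing strings" could look like, here is a hypothetical sketch that assembles CUDA C source for the kernel as a plain Java string. It is not the code from the branch: the class, the method names and the parameter list are invented for illustration, and the real transformation works from the expression tree of the lambda rather than from hard-coded text. A string like this is what one would hand to NVRTC (nvrtcCreateProgram, nvrtcCompileProgram, nvrtcGetPTX) to obtain PTX; the sketch itself makes no NVRTC calls.

```java
// Hypothetical sketch of "gluing strings"; names and layout are illustrative only.
class CudaSourceGlueSketch {

    // Wraps a kernel body in a CUDA C function definition.
    // extern "C" avoids C++ name mangling, so the kernel keeps its plain name.
    static String kernelSource(String name, String params, String body) {
        return "extern \"C\" __global__ void " + name + "(" + params + ") {\n"
                + body
                + "}\n";
    }

    // Assumed parameter list: the captured variables of the lambda passed as arguments.
    static String fillArrayKernelSource() {
        String body =
                "    int dataChunkBegin = blockIdx.x * elementsPerBlock + threadIdx.x;\n"
              + "    for (int i = 0; i < loopsPerThread; i++) {\n"
              + "        data[dataChunkBegin + threadsCount * i] = valueToSet;\n"
              + "    }\n";
        return kernelSource(
                "fillArrayKernel",
                "int* data, int elementsPerBlock, int threadsCount, "
                        + "int loopsPerThread, int valueToSet",
                body);
    }

    public static void main(String[] args) {
        // Prints the generated CUDA C source.
        System.out.print(fillArrayKernelSource());
    }
}
```

Generating PTX directly would remove this textual step and the runtime dependency on NVRTC.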

The changes are for OS X, and the samples have to be run from the NVIDIA/lib dir, until someone knows how to configure LD_LIBRARY_PATH on OS X...

A sample kernel can be found here:
- https://github.com/rsmogura/cuda-java-samples (CUDA sample)

And the changes:
- https://bitbucket.org/radoslaw_smogura/panama-jdk/branch/cuda (jdk changes)
- https://bitbucket.org/radoslaw_smogura/panama-langtools/branch/javax.expression-v2 (expression api)
- https://bitbucket.org/radoslaw_smogura/panama-build/branch/javax.expression-v2 (top-level tree required by expression api)

Best regards,
Radek