Experimental support for CUDA
Radosław Smogura
mail at smogura.eu
Fri Sep 15 04:28:48 UTC 2017
Hi all,
I would like to share experimental support for CUDA. It is based on the expression API after a major rework (following feedback plus some other findings) and a simple transformation of a lambda to CUDA via NVRTC (right now it's just... "gluing strings"). With the above I've been able to execute the following kernel on the GPU:
Runnable fillArrayKernel = () -> {
    // Trick to cast to int[], as we don't want to operate via dataRegion
    int[] data = (int[]) (Object) dataRegion;
    // The starting index of the data chunk
    int dataChunkBegin = blockIdx.x * elementsPerBlock + threadIdx.x;
    // Set values in a loop
    for (int i = 0; i < loopsPerThread; i++) {
        data[dataChunkBegin + threadsCount * i] = valueToSet;
    }
};
(There is one cast, which I allowed for fun.)
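To make the index arithmetic in the kernel above easier to follow, here is a minimal CPU-only sketch that runs the same loop body once for every (block, thread) pair. It does not touch the GPU or the expression API. The class name FillArrayEmulation and the relation elementsPerBlock = threadsCount * loopsPerThread (with threadsCount taken as the number of threads per block) are assumptions made for illustration; the post itself does not state them.

```java
// CPU-only emulation of the index arithmetic in fillArrayKernel.
// Assumptions (not stated in the post): threadsCount is the number of
// threads per block, and elementsPerBlock == threadsCount * loopsPerThread.
final class FillArrayEmulation {

    /** Runs the kernel body sequentially for every (block, thread) pair. */
    static int[] fill(int blocks, int threadsCount, int loopsPerThread, int valueToSet) {
        int elementsPerBlock = threadsCount * loopsPerThread;
        int[] data = new int[blocks * elementsPerBlock];
        for (int blockIdxX = 0; blockIdxX < blocks; blockIdxX++) {
            for (int threadIdxX = 0; threadIdxX < threadsCount; threadIdxX++) {
                // Same arithmetic as in the lambda
                int dataChunkBegin = blockIdxX * elementsPerBlock + threadIdxX;
                for (int i = 0; i < loopsPerThread; i++) {
                    data[dataChunkBegin + threadsCount * i] = valueToSet;
                }
            }
        }
        return data;
    }

    public static void main(String[] args) {
        int[] data = fill(4, 8, 3, 42);
        int written = 0;
        for (int v : data) {
            if (v == 42) {
                written++;
            }
        }
        System.out.println(written + " of " + data.length + " elements set");
    }
}
```

Under these assumptions, neighbouring threads of a block write to neighbouring elements in each loop iteration, and every element of the array is written by exactly one thread.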
It's maybe not production grade, but it shows one possible approach, and later, I think, NVRTC can be replaced by direct PTX generation.
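As a rough illustration of what "gluing strings" could look like, the sketch below builds CUDA C source text for the fill kernel by plain concatenation. This is not the code from the linked branches: the class KernelSourceGlue, the method fillArraySource and the kernel parameter list are all hypothetical. The resulting text is the kind of input that NVRTC compiles at run time (through nvrtcCreateProgram and nvrtcCompileProgram in its C API); the compile and launch steps are left out here.

```java
// Hypothetical sketch: none of these names come from the linked branches.
final class KernelSourceGlue {

    /** Builds CUDA C source for a fill kernel by concatenating strings. */
    static String fillArraySource(String kernelName) {
        StringBuilder src = new StringBuilder();
        // extern "C" keeps the kernel name unmangled so it can be looked up by name
        src.append("extern \"C\" __global__ void ").append(kernelName)
           .append("(int *data, int elementsPerBlock, int threadsCount,\n")
           .append("        int loopsPerThread, int valueToSet) {\n")
           .append("    int dataChunkBegin = blockIdx.x * elementsPerBlock + threadIdx.x;\n")
           .append("    for (int i = 0; i < loopsPerThread; i++) {\n")
           .append("        data[dataChunkBegin + threadsCount * i] = valueToSet;\n")
           .append("    }\n")
           .append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Prints the generated CUDA C source for inspection
        System.out.println(fillArraySource("fillArrayKernel"));
    }
}
```

In this sketch the values captured by the lambda are passed as kernel parameters; that is only one possible choice, and generating PTX directly would remove the intermediate source text altogether.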
The changes are for OS X, and the samples have to be run from the NVIDIA/lib dir, until someone figures out how to configure LD_LIBRARY_PATH on OS X...
A sample kernel can be found here:
- https://github.com/rsmogura/cuda-java-samples (CUDA sample)
And the changes:
- https://bitbucket.org/radoslaw_smogura/panama-jdk/branch/cuda (jdk changes)
- https://bitbucket.org/radoslaw_smogura/panama-langtools/branch/javax.expression-v2 (expression api)
- https://bitbucket.org/radoslaw_smogura/panama-build/branch/javax.expression-v2 (top-level tree required by expression api)
Best regards,
Radek