Project Proposal: GPU support

Phil Pratt-Szeliga pcpratts at chirrup.org
Fri Aug 17 14:02:04 UTC 2012


Hi John,

Thanks for the slides. I am not often in touch with experts on Java! :-)

I'll have to think about this a little. The way I made Rootbeer, I
made it work for any JVM, which required a certain design. If we are
inside the JVM, there are lots of new tradeoffs to study about how
best to do things. One tough thing about making Rootbeer is that once
you send your binary to the GPU and launch it, it is pretty much a
black box; I couldn't see inside the GPU very well.

Some minor things that come to mind:
1. On NVIDIA GPUs, if you don't align, say, ints to a 4-byte
boundary, the GPU will silently align the pointer and read from
wherever that is (there is a small sketch of this after the list).
2. You can't make the CUDA code for the GPU include all of the Java
code; it would be too big for the GPU and compilation would take too
long. Rootbeer finds which methods are reachable from the entry point
(gpuMethod in the Rootbeer Kernel interface) and, for example, only
includes the fields accessible from those methods.
3. The native code in classes like AtomicLong or Random, which people
wrote for performance reasons, forced me to remap those classes to
pure Java versions I made myself. Right now native code cannot be put
on the GPU with Rootbeer. Research topic?
4. When I made Rootbeer I cross-compiled Java bytecode to CUDA and
then compiled the CUDA before anything runs on the GPU. If this is
happening inside the JVM, we might want to translate straight to PTX
(for NVIDIA devices).
5. Keeping track of the root objects for the garbage collector could
be tricky on the GPU, because there is no runtime monitor in my work,
just plain CUDA code.
6. Serialization and memory transfer are a huge bottleneck when the
GPU is talking over a PCI Express bus (see the second sketch after
the list).
7. Properly using the shared memory (an NVIDIA term) on the GPU is a
harder research problem. I did not get that to work yet with
Rootbeer, but shared memory gives much better performance (the third
sketch after the list shows the basic pattern).
8. For even more optimization, people have researched converting
unaligned GPU memory reads into aligned GPU memory reads (the same
third sketch below does its global loads that way).
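
To make point 1 concrete, here is a rough stand-alone CUDA sketch.
It is just an illustration I put together for this email, not code
Rootbeer generates, and whether the read traps or is silently aligned
depends on the hardware generation and the checking tools you run:

#include <cstdio>
#include <cuda_runtime.h>

// buf + 1 is not 4-byte aligned; if the hardware silently aligns the
// pointer (point 1 above), the int at bytes 0..3 comes back instead
// of the bytes 1..4 you asked for.
__global__ void misaligned_read(const char *buf, int *out) {
    const int *p = reinterpret_cast<const int *>(buf + 1);
    *out = *p;
}

int main() {
    char host[8] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x08};
    char *dbuf = NULL;
    int *dout = NULL;
    int result = 0;

    cudaMalloc((void **)&dbuf, sizeof(host));
    cudaMalloc((void **)&dout, sizeof(int));
    cudaMemcpy(dbuf, host, sizeof(host), cudaMemcpyHostToDevice);

    misaligned_read<<<1, 1>>>(dbuf, dout);
    cudaMemcpy(&result, dout, sizeof(int), cudaMemcpyDeviceToHost);

    // With a silently aligned pointer this prints 0x44332211 (bytes 0..3),
    // not 0x55443322 (bytes 1..4). Other hardware or checking tools may
    // instead report it as a misaligned access.
    printf("read 0x%08x\n", result);

    cudaFree(dbuf);
    cudaFree(dout);
    return 0;
}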
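
For point 6, a similar rough sketch that just times one host-to-device
copy over PCI Express with CUDA events. The 256 MB buffer is only a
stand-in for a serialized object graph:

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u * 1024u * 1024u;  // stand-in for serialized data
    char *host = (char *)malloc(bytes);
    char *dev = NULL;

    memset(host, 0, bytes);
    cudaMalloc((void **)&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // host -> GPU over PCI Express
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("copied %zu MB in %.1f ms (%.2f GB/s)\n",
           bytes >> 20, ms, (bytes / 1.0e9) / (ms / 1.0e3));

    cudaFree(dev);
    free(host);
    return 0;
}

Pinned host memory (cudaMallocHost) usually improves the bandwidth,
but the copy still tends to dominate compared to the kernel launch
itself.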
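
And for points 7 and 8, a toy reduction kernel (again my own
illustration, not Rootbeer output) where each block first does
aligned, consecutive loads from global memory into shared memory and
then works only on the shared copy:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each block stages its slice of the input in shared memory with
// aligned, consecutive (coalesced) loads, then reduces it there.
__global__ void block_sum(const int *in, int *block_out, int n) {
    __shared__ int tile[256];            // one element per thread in the block

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Coalesced load: thread k of a warp reads element base + k, so the
    // warp touches one contiguous, aligned segment of global memory.
    tile[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        block_out[blockIdx.x] = tile[0];
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;             // must match the tile[] size above
    const int blocks = (n + threads - 1) / threads;

    int *h_in = (int *)malloc(n * sizeof(int));
    int *h_out = (int *)malloc(blocks * sizeof(int));
    for (int i = 0; i < n; i++) h_in[i] = 1;

    int *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    long long total = 0;
    for (int b = 0; b < blocks; b++) total += h_out[b];
    printf("sum = %lld (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return 0;
}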

These are just a few things I am thinking of right now.

Another person that people should maybe talk to is Michael Wolfe at PGI [1].

Phil Pratt-Szeliga
Syracuse University
http://chirrup.org/

[1] http://www.pgroup.com/index.htm


