Moving GPU offload policy into Java sources

Mon Mar 10 15:58:04 UTC 2014

[opening up for broader discussion]

Bharadwaj,

I think this is really a discussion on whether or not it makes sense for a whole bytecode method to be offloaded to a GPU. I cannot imagine a scenario in which this would achieve faster execution than host CPU execution. In my understanding, for GPU execution to be a win, there needs to be parallel execution. This requires either some compiler-like analysis or some library API that implies parallel execution. The latter justifies the Sumatra approach. For the former, an ideal place to do the analysis is as part of compilation. The compiler may recognize a loop whose body can be offloaded to the GPU. So, in that sense, you could say that offload is part of compilation. However, I still think this is different from being part of *compilation policy*. For offload to be part of compilation policy, all the analysis has to be done as part of the process of deciding whether or not to compile something. Not only would this be very expensive, it would almost certainly require an analysis framework very similar to what the compiler already offers.
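
To make the two cases concrete, here is a minimal sketch in plain Java 8. The class and method names are hypothetical and nothing in it is Sumatra or Graal code; it only illustrates the difference between parallelism a compiler has to discover and parallelism a library API states up front.

```java
import java.util.stream.IntStream;

// Hypothetical illustration only; not taken from Sumatra or Graal.
public class OffloadCandidates {

    // Case 1: a plain loop. Nothing in the source says the iterations are
    // independent; a compiler would have to prove that (no loop-carried
    // dependence) before the body could be treated as a parallel kernel.
    static float[] addLoop(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Case 2: a library API that implies parallel execution. The call to
    // parallel() tells the runtime that the lambda may run for each index
    // in any order, so no dependence analysis is needed to find the kernel.
    static float[] addParallelStream(float[] a, float[] b) {
        float[] out = new float[a.length];
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(i -> out[i] = a[i] + b[i]);
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        System.out.println(java.util.Arrays.toString(addLoop(a, b)));
        System.out.println(java.util.Arrays.toString(addParallelStream(a, b)));
    }
}
```

Both methods compute the same result on the host. The point is only where the information about parallelism comes from: in the first it must come from analysis done during compilation, in the second it is carried by the API call itself.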

So, I think we agree on the worthy goal of automatic GPU offload. I just think this is best done within a compilation. Assuming you still think the required analysis is best done outside of compilation, can you describe how it can be done (efficiently) and what mechanisms it would use?

-Doug

On Mar 10, 2014, at 4:09 PM, S. Bharadwaj Yadavalli <bharadwaj.yadavalli at oracle.com> wrote:

> 
> On 03/08/2014 03:50 AM, Doug Simon wrote:
>> Why? All the context needed for the decision can be accessed from Java code. In any case, it needs to be removed from the normal compilation policy mechanism.
> 
> In my opinion, deciding which non-host target to compile Java methods for and execute them on _is_ part of compilation policy - just like the current compilation policy decides which methods to compile and which to interpret. Enhancing the present policy to offload execution of appropriate portions of Java for better performance _transparently_ is what gives the ability to run Java applications on heterogeneous systems. Adding GPU-specific changes to the JDK (similar to what the AMD guys did for Streams) is at best an intermediate step. Taking that approach will require implementations of data structures such as Streams to be specialized for GPUs as well as for other heterogeneous architectures like Intel's Phi. We will then have to specialize implementations of other data structures.
> 
> I believe that non-host offload should be decided by the VM based on the structure of the code in a compilation unit and the nature of the data that unit manipulates. Any specialization/annotation in the library code should only serve to provide hints to the offload policy.
> 
> Bharadwaj
>