questions about decoupling GPU target runtimes from host runtime

Thu Oct 3 07:25:17 PDT 2013

On Oct 2, 2013, at 10:48 PM, "Venkatachalam, Vasanth" <Vasanth.Venkatachalam at amd.com> wrote:

> Doug-
>  
> We’re scoping out the changes needed to isolate the GPU target runtimes (HSAIL, PTX) from the host runtime, so that the host runtime (e.g., AMD64HotSpotRuntime) does not get instantiated when we generate code for HSAIL or PTX. This is in part intended to address the comments you sent about http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-r15changes/webrev/ (forwarded below). We had some questions below.
>  
> 1)      All of the platform-specific runtime classes defined so far (PTXHotSpotRuntime, AMD64HotSpotRuntime) are subclassing HotSpotRuntime, which implements the CodeCacheProvider interface. If we understood you correctly, the GPU target runtimes (PTX, HSAIL) should not subclass HotSpotRuntime, because in doing so they would be inheriting code that is HotSpot specific and not relevant to the GPU target. They should instead be independent classes which extend the DelegatingCodeCacheProvider class and delegate to HotSpotRuntime any functionality that is not GPU target specific. Is this what you had in mind?

Yes.
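
A minimal sketch of the delegation arrangement described in question 1. The interface and classes below are hypothetical, much-reduced stand-ins written for illustration only; they are not the actual Graal CodeCacheProvider, DelegatingCodeCacheProvider or HotSpotRuntime APIs, and the two methods are invented for the example. The point is only that the GPU runtime wraps the host runtime instead of subclassing it, overriding what is target specific and forwarding the rest.

```java
final class DelegationSketch {

    /** Hypothetical, much-reduced stand-in for a code cache provider interface. */
    interface CodeCacheProvider {
        String getTargetName();
        String lookupForeignCall(String name);
    }

    /** Stand-in for the host runtime (the role HotSpotRuntime plays in this thread). */
    static final class HostRuntime implements CodeCacheProvider {
        @Override public String getTargetName() { return "AMD64"; }
        @Override public String lookupForeignCall(String name) { return "host:" + name; }
    }

    /** Forwards every call to a wrapped provider; subclasses override selectively. */
    static class DelegatingProvider implements CodeCacheProvider {
        private final CodeCacheProvider delegate;

        DelegatingProvider(CodeCacheProvider delegate) { this.delegate = delegate; }

        @Override public String getTargetName() { return delegate.getTargetName(); }
        @Override public String lookupForeignCall(String name) { return delegate.lookupForeignCall(name); }
    }

    /** GPU runtime: no inheritance from the host runtime, only delegation to it. */
    static final class HsailRuntime extends DelegatingProvider {
        HsailRuntime(CodeCacheProvider host) { super(host); }

        // Target-specific behaviour is overridden here.
        @Override public String getTargetName() { return "HSAIL"; }
        // lookupForeignCall is not overridden, so it is forwarded to the host.
    }

    public static void main(String[] args) {
        CodeCacheProvider gpu = new HsailRuntime(new HostRuntime());
        System.out.println(gpu.getTargetName());                // HSAIL
        System.out.println(gpu.lookupForeignCall("arraycopy")); // host:arraycopy
    }
}
```

With this shape, nothing HotSpot specific is inherited by accident: the GPU runtime only sees the host through the narrow interface it delegates to.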

> 2)      The routine graalRuntime( ) in graalVMToCompiler.cpp is checking to see if the UseGPU flag is enabled and if it is (and if the GPU target is PTX) it is instantiating PTXHotSpotGraalRuntime. It sounds like we need to add some logic to check if the GPU target is HSAIL and if so to instantiate the HSAIL runtime class. Am I on the right track?

I'm not so sure that this GPU detection and initialization code is general enough. I can envisage a system with two or more different GPU types available. I've commented on this while reviewing a proposal from Bharadwaj on better support for GPU backends in Graal. I've attached the details of this review (in our internal Crucible instance).

> a.      Can you send us the command line to use to check that the UseGPU functionality is working? When I run a PTX test case in NetBeans with -XX:+UseGPU the JVM crashes even before it gets to generate any code.

I've never run with UseGPU enabled. Hopefully Bharadwaj, Morris or Christian (Thalinger) can address this question.

> 3)      PTXHotSpotGraalRuntime is creating a PTXHotSpotBackend which looks similar to PTXBackend. What is the purpose of having both a PTXHotSpotBackend and a PTXBackend? It looks like these two classes are duplicating some functionality. Do we need to have an HSAILHotSpotBackend in addition to HSAILBackend? I have a similar question for PTXHotSpotRegisterConfiguration vs. PTXRegisterConfiguration. How should we deal with these pairs of classes in the refactoring you’re recommending in 1)? It seems to me like these PTXHotSpot* classes should go away if we’re truly decoupling the GPU target runtimes from HotSpot.

Yes, they should go away. Hopefully the attached review shows a sketch of how we are thinking this could be achieved.

> 4)      Is there a reason why we need PTXHotSpotGraalRuntime in addition to PTXHotSpotRuntime, instead of just having a single class that combines the work of both?

We want to have an API where there is exactly one HotSpotGraalRuntime instance that manages multiple backends. 
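
As a rough illustration of that shape (one runtime object, several backends, possibly more than one GPU type at once), here is a sketch under stated assumptions. All names are hypothetical and are not the Graal API: backends are keyed by an architecture string, and compile() just returns a label instead of generating code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MultiBackendSketch {

    /** Hypothetical backend abstraction; the real Graal Backend class is far richer. */
    interface Backend {
        String architecture();
        String compile(String method);
    }

    /** Trivial backend that only labels its output with the architecture name. */
    static final class SimpleBackend implements Backend {
        private final String arch;

        SimpleBackend(String arch) { this.arch = arch; }

        @Override public String architecture() { return arch; }
        @Override public String compile(String method) { return arch + " code for " + method; }
    }

    /** The single runtime object: owns the host backend plus any GPU backends. */
    static final class RuntimeSketch {
        private final Backend host;
        private final Map<String, Backend> backends = new LinkedHashMap<>();

        RuntimeSketch(Backend host) {
            this.host = host;
            backends.put(host.architecture(), host);
        }

        /** Registers an extra backend; at most one backend per architecture. */
        void registerBackend(Backend backend) {
            if (backends.containsKey(backend.architecture())) {
                throw new IllegalStateException("duplicate backend: " + backend.architecture());
            }
            backends.put(backend.architecture(), backend);
        }

        Backend getHostBackend() { return host; }

        /** Returns the backend for the architecture, or null if none is registered. */
        Backend getBackend(String architecture) { return backends.get(architecture); }
    }

    public static void main(String[] args) {
        RuntimeSketch runtime = new RuntimeSketch(new SimpleBackend("AMD64"));
        runtime.registerBackend(new SimpleBackend("PTX"));
        runtime.registerBackend(new SimpleBackend("HSAIL"));
        System.out.println(runtime.getBackend("HSAIL").compile("foo")); // HSAIL code for foo
    }
}
```

A single registry like this avoids a per-target runtime class (the PTXHotSpotGraalRuntime in question 4) and does not assume that only one GPU type is present.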

-Doug

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: CR-1587.txt
Url: http://mail.openjdk.java.net/pipermail/graal-dev/attachments/20131003/f822795c/CR-1587.txt 
-------------- next part --------------

> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com] 
> Sent: Wednesday, September 11, 2013 9:54 AM
> To: Venkatachalam, Vasanth
> Cc: graal-dev at openjdk.java.net
> Subject: Re: webrev: workaround for threadRegister handling
>  
> Hi Vasanth,
>  
> I think you are tackling this problem at the wrong level. All the logic that uses threadRegister and stackPointerRegister is in snippets. The point of snippets is they are the interface the compiler uses to do runtime-specific lowering. I somehow very much doubt that the snippets using these registers will make any sense in the context of the GPU. Even if you plan on implementing new/newarray/monitorenter/monitorexit etc on the GPU, the code will be quite different than that for HotSpot's "host" runtime. After all, they are very specific to HotSpot data structures such as thread local allocation buffers, G1 barriers, etc.
>  
> In my opinion, you need to have a GraalCodeCacheProvider implementation that does all the GPU specific lowering. To ensure you've got this separation right/complete, your GraalCodeCacheProvider subclass probably shouldn't even subclass HotSpotRuntime.
>  
> -Doug
>