questions about decoupling GPU target runtimes from host runtime

Thu Oct 3 15:10:55 PDT 2013

Doug-

To remind you of the context behind our questions: the host runtime (AMD64HotSpotRuntime) is being instantiated and treated as the "default" runtime when we execute an HSAIL test case.
As a result, the code ends up in an inconsistent state where the HSAIL register configuration is being used, but the underlying runtime is still AMD64.

This problem affects more than one place in the code.

For example, CFGPrinterObserver (line 146) looks up the runtime and gets AMD64HotSpotRuntime. As a result, CompilationPrinter.debugInfoToString (line 124) throws an ArrayIndexOutOfBoundsException when it tries to look up an HSAIL register number in the array of AMD64 registers, which has fewer entries than the HSAIL register set. This causes test cases that exercise many HSAIL registers to fail when we run them with the flags that dump a data file for the C1Visualizer.
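A toy model of this failure mode may help. RegisterTable and RegisterNames below are hypothetical stand-ins, not Graal's Register/RegisterConfig classes, and the register lists are deliberately truncated. The point is only that decoding a register number with the host's table breaks as soon as the target has more registers than the host, whereas decoding with the table of the target that produced the number does not.

```java
// Hypothetical, simplified register table; not Graal's Register/RegisterConfig API.
final class RegisterTable {
    private final String arch;
    private final String[] names;

    RegisterTable(String arch, String[] names) {
        this.arch = arch;
        this.names = names;
    }

    String arch() {
        return arch;
    }

    // Plain array indexing: a number past the end throws ArrayIndexOutOfBoundsException.
    String name(int regNum) {
        return names[regNum];
    }
}

final class RegisterNames {
    // Illustrative, truncated register lists; the real AMD64 and HSAIL sets are larger.
    static final RegisterTable AMD64 = new RegisterTable("AMD64", new String[] {"rax", "rcx", "rdx", "rbx"});
    static final RegisterTable HSAIL = new RegisterTable("HSAIL", new String[] {"$s0", "$s1", "$s2", "$s3", "$s4", "$s5"});

    // Problem pattern: always decode with the host table, whatever target produced the number.
    static String describeWithHostTable(int regNum) {
        return AMD64.name(regNum);
    }

    // Fix pattern: decode with the table of the target that actually generated the code.
    static String describe(RegisterTable target, int regNum) {
        return target.arch() + ":" + target.name(regNum);
    }
}
```

Here describeWithHostTable(5) throws because the host array has only four entries, while describe(RegisterNames.HSAIL, 5) returns "HSAIL:$s5".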

The other example is that test cases involving method calls cause some exception-handling snippets to be invoked. The AMD64 definition of these snippets gets loaded and looks for a threadRegister, which we haven't specified for HSAIL, and this leads to an assertion error.

Are you recommending that we should postpone tackling the underlying problem until the improved support you mention below is in place? Is there an interim solution that we could submit as a patch?

Vasanth

-----Original Message-----
From: Doug Simon [mailto:doug.simon at oracle.com] 
Sent: Thursday, October 03, 2013 1:55 PM
To: Venkatachalam, Vasanth
Cc: graal-dev at openjdk.java.net
Subject: Re: questions about decoupling GPU target runtimes from host runtime

Hi Vasanth,

I will be visiting the team in Linz early next week and we will spend some time improving the support in Graal for multiple backends. If possible, I recommend deferring any further work on the HSAIL backend until we've come up with something.

-Doug

On Oct 3, 2013, at 8:39 PM, "Venkatachalam, Vasanth" <Vasanth.Venkatachalam at amd.com> wrote:

> Doug-
>  
> Thanks for your reply. Some follow-up questions.
>  
> 1)      Based on this discussion thread and your attached code review, it sounds like the details for multiple backend support haven't been fully worked out yet.
> Given this, does it still make sense for us to submit a self-contained patch that just refactors PTXHotSpotRuntime and HSAILRuntime to use the delegating interface instead of subclassing HotSpotRuntime? I'm trying to understand whether there's value in such a patch before the rest of the refactoring changes are in place to introduce multiple backends and get rid of the duplicate classes (PTXHotSpotBackend ...), and if so, how you would like this done in a self-contained manner while the architecture is in flux.
>  
> 2)      You mention,
> "We want to have an API where there is exactly one HotSpotGraalRuntime instance that manages multiple backends."
>  
> Does this mean that PTXHotSpotGraalRuntime and AMD64HotSpotGraalRuntime should also go away?
>  
> Vasanth
>  
> From: Doug Simon [mailto:doug.simon at oracle.com] 
> Sent: Thursday, October 03, 2013 9:25 AM
> To: Venkatachalam, Vasanth
> Cc: graal-dev at openjdk.java.net; Bharadwaj Yadavalli
> Subject: Re: questions about decoupling GPU target runtimes from host runtime
>  
> 
> On Oct 2, 2013, at 10:48 PM, "Venkatachalam, Vasanth" <Vasanth.Venkatachalam at amd.com> wrote:
> 
> > Doug-
> >  
> > We're scoping out the changes needed to isolate the GPU target runtimes (HSAIL, PTX) from the host runtime, so that the host runtime (e.g., AMD64HotSpotRuntime) does not get instantiated when we generate code for HSAIL or PTX. This is in part intended to address the comments you sent about http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-r15changes/webrev/  (forwarded below). We have some questions below.
> >  
> > 1)      All of the platform-specific runtime classes defined so far (PTXHotSpotRuntime, AMD64HotSpotRuntime) subclass HotSpotRuntime, which implements the CodeCacheProvider interface. If we understood you correctly, the GPU target runtimes (PTX, HSAIL) should not subclass HotSpotRuntime, because in doing so they would inherit code that is HotSpot specific and not relevant to the GPU target. They should instead be independent classes that extend the DelegatingCodeCacheProvider class and delegate to HotSpotRuntime any functionality that is not GPU-target specific. Is this what you had in mind?
> 
> Yes.
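A minimal sketch of the delegation shape described in question 1, assuming a much-simplified provider interface. CodeProvider, HostCodeProvider, DelegatingCodeProvider and HsailCodeProvider are hypothetical names chosen for illustration; they are not Graal's CodeCacheProvider or DelegatingCodeCacheProvider, whose APIs are larger and different. The GPU class overrides only what is target specific and forwards the rest, without being a subclass of the host runtime.

```java
// Hypothetical, much-simplified provider interface; Graal's real CodeCacheProvider is different and larger.
interface CodeProvider {
    String targetName();
    String describeCode(byte[] code);
}

// Stand-in for the host runtime (the role HotSpotRuntime plays today).
class HostCodeProvider implements CodeProvider {
    @Override
    public String targetName() {
        return "AMD64";
    }

    @Override
    public String describeCode(byte[] code) {
        return code.length + " bytes";
    }
}

// Forwards every call to a delegate; subclasses override only what is target specific.
class DelegatingCodeProvider implements CodeProvider {
    private final CodeProvider delegate;

    DelegatingCodeProvider(CodeProvider delegate) {
        this.delegate = delegate;
    }

    @Override
    public String targetName() {
        return delegate.targetName();
    }

    @Override
    public String describeCode(byte[] code) {
        return delegate.describeCode(code);
    }
}

// GPU runtime: a delegating provider, not a subclass of the host runtime.
class HsailCodeProvider extends DelegatingCodeProvider {
    HsailCodeProvider(CodeProvider host) {
        super(host);
    }

    @Override
    public String targetName() {
        return "HSAIL";
    }
}
```

The design choice is composition over inheritance: anything the GPU runtime does not explicitly override is visibly forwarded, so host-specific behaviour cannot leak in through an inherited method.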
> 
> > 2)      The routine graalRuntime() in graalVMToCompiler.cpp checks whether the UseGPU flag is enabled and, if it is (and the GPU target is PTX), instantiates PTXHotSpotGraalRuntime. It sounds like we need to add some logic to check whether the GPU target is HSAIL and, if so, instantiate the HSAIL runtime class. Am I on the right track?
> 
> I'm not so sure that this GPU detection and initialization code is general enough. I can envisage a system with two or more different GPU types available. I've commented on this while reviewing a proposal from Bharadwaj on better support for GPU backends in Graal. I've attached the details of this review (in our internal Crucible instance).
> 
> > a.      Can you send us the command line to use to check that the UseGPU functionality is working? When I run a PTX test case in NetBeans with -XX:+UseGPU the JVM crashes even before it gets to generate any code.
> 
> I've never run with UseGPU enabled. Hopefully Bharadwaj, Morris or Christian (Thalinger) can address this question.
> 
> > 3)      PTXHotSpotGraalRuntime creates a PTXHotSpotBackend, which looks similar to PTXBackend. What is the purpose of having both PTXHotSpotBackend and PTXBackend? These two classes appear to duplicate some functionality. Do we need an HSAILHotSpotBackend in addition to HSAILBackend? I have a similar question about PTXHotSpotRegisterConfiguration vs. PTXRegisterConfiguration. How should we deal with these pairs of classes in the refactoring you're recommending in 1)? It seems to me that these PTXHotSpot* classes should go away if we're truly decoupling the GPU target runtimes from HotSpot.
> 
> Yes, they should go away. Hopefully the attached review shows a sketch of how we are thinking this could be achieved.
> 
> > 4)      Is there a reason why we need PTXHotSpotGraalRuntime in addition to PTXHotSpotRuntime, instead of a single class that combines the work of both?
> 
> We want to have an API where there is exactly one HotSpotGraalRuntime instance that manages multiple backends. 
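One possible shape for "exactly one runtime instance that manages multiple backends" is sketched below, purely as an illustration. Backend, SimpleBackend and GraalRuntimeSketch are hypothetical names, not Graal classes, and how backends get detected at startup is left out. The sketch keys backends by architecture name, so a system with two or more GPU types simply registers more entries rather than relying on a single UseGPU-style switch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical backend abstraction: one instance per target architecture.
interface Backend {
    String arch();
}

final class SimpleBackend implements Backend {
    private final String arch;

    SimpleBackend(String arch) {
        this.arch = arch;
    }

    public String arch() {
        return arch;
    }
}

// Hypothetical stand-in for a single runtime object that owns the host backend and any GPU backends.
final class GraalRuntimeSketch {
    private final Backend host;
    private final Map<String, Backend> backends = new LinkedHashMap<>();

    GraalRuntimeSketch(Backend host, Backend... gpus) {
        this.host = host;
        backends.put(host.arch(), host);
        for (Backend gpu : gpus) {
            backends.put(gpu.arch(), gpu);
        }
    }

    Backend hostBackend() {
        return host;
    }

    // Returns the backend for arch, or throws if no such backend was registered.
    Backend backend(String arch) {
        Backend b = backends.get(arch);
        if (b == null) {
            throw new IllegalArgumentException("no backend for " + arch);
        }
        return b;
    }

    // Read-only view of every registered architecture, host first.
    Set<String> architectures() {
        return Collections.unmodifiableSet(backends.keySet());
    }
}
```

For example, new GraalRuntimeSketch(new SimpleBackend("AMD64"), new SimpleBackend("PTX"), new SimpleBackend("HSAIL")) yields one runtime whose backend("HSAIL") is the HSAIL backend while hostBackend() remains AMD64.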
> 
> -Doug
>  
> 
> 
> > -----Original Message-----
> > From: Doug Simon [mailto:doug.simon at oracle.com] 
> > Sent: Wednesday, September 11, 2013 9:54 AM
> > To: Venkatachalam, Vasanth
> > Cc: graal-dev at openjdk.java.net
> > Subject: Re: webrev: workaround for threadRegister handling
> >  
> > Hi Vasanth,
> >  
> > I think you are tackling this problem at the wrong level. All the logic that uses threadRegister and stackPointerRegister is in snippets. The point of snippets is that they are the interface the compiler uses to do runtime-specific lowering. I very much doubt that the snippets using these registers will make any sense in the context of the GPU. Even if you plan on implementing new/newarray/monitorenter/monitorexit etc. on the GPU, the code will be quite different from that for HotSpot's "host" runtime. After all, these snippets are very specific to HotSpot data structures such as thread-local allocation buffers, G1 barriers, etc.
> >  
> > In my opinion, you need to have a GraalCodeCacheProvider implementation that does all the GPU specific lowering. To ensure you've got this separation right/complete, your GraalCodeCacheProvider subclass probably shouldn't even subclass HotSpotRuntime.
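To illustrate the separation being suggested, here is a hedged sketch of a standalone GPU lowering component. NodeKind, Lowerer, HostLowerer and GpuLowerer are hypothetical names, not Graal's lowering API, and the strings stand in for real lowered code. Because GpuLowerer does not extend the host implementation, an operation it does not handle fails with an explicit UnsupportedOperationException instead of silently falling through to a host snippet that expects a thread register.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical node kinds that need runtime-specific lowering.
enum NodeKind { NEW_INSTANCE, NEW_ARRAY, MONITOR_ENTER, MONITOR_EXIT, ARRAY_LOAD }

// Hypothetical lowering interface; not Graal's actual API.
interface Lowerer {
    String lower(NodeKind kind);
}

// Stand-in for host lowering; the real host snippets depend on HotSpot structures such as TLABs and the thread register.
final class HostLowerer implements Lowerer {
    public String lower(NodeKind kind) {
        return "host snippet for " + kind + " (needs thread register)";
    }
}

// Standalone GPU lowering: handles only what the GPU target supports and rejects the rest explicitly.
final class GpuLowerer implements Lowerer {
    private static final Set<NodeKind> SUPPORTED = EnumSet.of(NodeKind.ARRAY_LOAD);

    public String lower(NodeKind kind) {
        if (!SUPPORTED.contains(kind)) {
            throw new UnsupportedOperationException(kind + " is not supported on this GPU target");
        }
        return "GPU lowering for " + kind;
    }
}
```

With this split, new GpuLowerer().lower(NodeKind.ARRAY_LOAD) returns "GPU lowering for ARRAY_LOAD", while lowering NodeKind.NEW_INSTANCE reports clearly that the GPU target does not support it yet.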

> >  
> > -Doug
> > 
> 
> <CR-1587.txt>