enhancements we'd like to see out of the graal redesign

Wed Oct 16 08:38:23 PDT 2013

Hi Doug,

You mentioned that Graal is being redesigned to support multiple GPU targets. I'd like to participate in the redesign discussion if possible. As you suggested, I've made an initial list of things we'd like to see come out of this redesign.  I know we talked about some of this already.

1.       We'd like to see a clear separation between the host runtime and the runtime of the target that we're generating code for, and the ability for the target runtime (or backend) to reuse data structures from the host if needed. Currently, AMD64HotSpotRuntime is being treated as both the host and the target runtime when we are generating code for HSAIL. This puts execution in an inconsistent state in which HSAIL registers are being used but the target runtime is still being treated as AMD64. When generating code for a target other than the host, the only place where we should have to rely on the host runtime is when we are reusing data structures defined in it.

2.       Related to 1), we need the ability to specify in a central location what the target runtime is (e.g., HSAIL) and have this change percolate automatically to all parts of the code that refer to the target runtime.

3.       As a result of the problem mentioned in 1), we're seeing errors and exceptions when we run HSAIL test cases. Two examples below.

a.       CFGPrinterObserver line 146 looks up the runtime and gets AMD64HotSpotRuntime. As a result, CompilationPrinter.debugInfoToString (line 124) throws an ArrayIndexOutOfBoundsException when it tries to look up an HSAIL register number in an array of AMD64 registers (which has fewer registers than HSAIL). This causes test cases that exercise a lot of HSAIL registers to fail when we run them with the flags that dump a data file for the C1Visualizer.

b.      The other example is that test cases involving method calls cause some exception handling snippets to be invoked. The AMD64 definition of these snippets gets loaded and looks for a thread register (which we haven't specified in HSAIL), and this leads to an assertion error.

As mentioned earlier, we would like to be able to fix all such problems by making changes in a central location, as opposed to having to code around them in several places. For example, you mentioned that one possible solution would be the ability to pass an HSAIL-specific CodeCacheProvider to GraalCompiler.compileGraph(), which in turn would cause the right target runtime to be percolated to all of these code regions. We'd like the redesign to make such a centralized solution possible.

4.       Currently, Graal builds up a superset of intrinsics for the host runtime (x86) and lets the different backends filter that list. To support GPU targets, we'd like a way for each backend to define its own intrinsics, which may not be among the x86 intrinsics.

5.       We need to be able to declare our own snippets without affecting the AMD64 snippets.

a.       Similarly, our own Replacements.

6.       We may need to define our own new nodes in the HSAIL backend, for example nodes used by our own snippets. It would be preferable if we could do this without having to define NYI node handlers for those nodes in all the other backends.

7.       Since there are multiple GPU targets (e.g., HSAIL, PTX), we need to make sure the infrastructure supports multiple ISA targets, not just one.
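
To make points 1), 2), 3a) and 4) a bit more concrete, here is a rough sketch of the kind of arrangement we have in mind. It is only an illustration under stated assumptions: BackendSketch, SimpleBackend, BackendRegistry and BackendRegistryDemo are hypothetical names, not existing Graal classes, and the register counts and intrinsic lists are made up. The idea is that each backend declares its own register table and its own intrinsics, and a single registry decides which backend is the target, so that a debug printer or an intrinsic lookup never falls back to the host by accident.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// All names in this sketch are hypothetical; none of them are existing Graal classes.

/** What a backend would declare for itself instead of inheriting it from the host. */
interface BackendSketch {
    String name();
    String[] registerNames(); // this backend's own register table
    Set<String> intrinsics(); // this backend's own intrinsics, not a filtered host list
}

final class SimpleBackend implements BackendSketch {
    private final String name;
    private final String[] registerNames;
    private final Set<String> intrinsics;

    SimpleBackend(String name, int registerCount, String prefix, String... intrinsics) {
        this.name = name;
        this.registerNames = new String[registerCount];
        for (int i = 0; i < registerCount; i++) {
            this.registerNames[i] = prefix + i;
        }
        this.intrinsics = new HashSet<>(Arrays.asList(intrinsics));
    }

    public String name() { return name; }
    public String[] registerNames() { return registerNames; }
    public Set<String> intrinsics() { return intrinsics; }
}

/** The one central place that knows which backend is the host and which is the target. */
final class BackendRegistry {
    private final Map<String, BackendSketch> backends = new HashMap<>();
    private final BackendSketch host;
    private BackendSketch target;

    BackendRegistry(BackendSketch host) {
        this.host = host;
        this.target = host; // by default we compile for the host
        backends.put(host.name(), host);
    }

    void register(BackendSketch backend) {
        backends.put(backend.name(), backend);
    }

    /** Changing the target here is the only change callers need to make. */
    void selectTarget(String name) {
        BackendSketch backend = backends.get(name);
        if (backend == null) {
            throw new IllegalArgumentException("unknown backend: " + name);
        }
        target = backend;
    }

    BackendSketch host() { return host; }
    BackendSketch target() { return target; }

    /** Register numbers are resolved against the target's table, never the host's. */
    String registerName(int number) {
        return target.registerNames()[number];
    }

    /** Intrinsics are looked up in the target's own list. */
    boolean isIntrinsic(String method) {
        return target.intrinsics().contains(method);
    }
}

final class BackendRegistryDemo {
    static BackendRegistry demoRegistry() {
        // Register counts, prefixes and intrinsic names are invented for illustration.
        BackendSketch amd64 = new SimpleBackend("AMD64", 32, "r", "Math.sqrt", "Integer.bitCount");
        BackendSketch hsail = new SimpleBackend("HSAIL", 128, "$s", "Math.sqrt", "Math.ceil");
        BackendRegistry registry = new BackendRegistry(amd64);
        registry.register(hsail);
        return registry;
    }

    public static void main(String[] args) {
        BackendRegistry registry = demoRegistry();
        registry.selectTarget("HSAIL");
        // Register 100 exists in the HSAIL table but would overflow the 32-entry host table.
        System.out.println(registry.registerName(100));
        System.out.println(registry.isIntrinsic("Math.ceil"));
    }
}
```

With something along these lines, the lookup in 3a) would index into the HSAIL register table once HSAIL has been selected, and an intrinsic that only the GPU backend supports would not need to appear in the x86 list at all. How this maps onto the actual provider interfaces is of course up to the redesign.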

Vasanth

-----Original Message-----
From: Doug Simon [mailto:doug.simon at oracle.com]
Sent: Tuesday, October 08, 2013 1:08 PM
To: Venkatachalam, Vasanth
Cc: graal-dev at openjdk.java.net; dl.Runtimes
Subject: Re: handling of Math intrinsics for multiple GPU targets

This is obviously something else that needs to be vectored to each backend, allowing each to make their own decision as you say. It will be factored into the redesign currently going on. Please let us know of other abstractions like this that need to be broadened or exposed to each backend.

On Oct 8, 2013, at 6:11 PM, "Venkatachalam, Vasanth" <Vasanth.Venkatachalam at amd.com> wrote:

> Hi,
>
> I noticed that Graal is building a superset of math intrinsics for the host runtime (x86) and then filtering out some of these methods from being intrinsified based on the value of a config parameter (e.g., config.usePopCountInstruction, config.useAESIntrinsics, etc.).
>
> In more detail, when the VM first starts up in VMToCompilerImpl.start(), it gets the host runtime (which is x86) and builds a superset of intrinsics for that runtime by calling GraalMethodSubstitutions.registerReplacements(). This in turn processes a class file, MathSubstitutionsx86.class, to get a list of math routines to be intrinsified, filters out some of these routines (via a call to HotSpotReplacementsImpl.registerMethodSubstitution()), and adds the remaining ones to a HashMap called registeredMethodSubstitutions.
>
> For the case of supporting multiple GPU targets, it sounds like this logic is the reverse of what we need. Instead of building a superset of intrinsics for x86 and filtering them for the target runtime, we need a way for each target runtime (e.g., HSAIL) to specify its own list of supported intrinsics. Has anyone thought about how this should be handled?
>
> Vasanth