enhancements we'd like to see out of the graal redesign

Thu Oct 17 11:15:05 PDT 2013

On 10/17/2013 08:12 PM, Deneau, Tom wrote:
> Doug --
>
> I don't know whether there will be snippets we can use exactly as is including all the @Fold methods they call.
>
> But I do envision us writing our own snippets which might use some of the existing @Fold methods and also might define some of our own @Fold methods.  Would that be possible?

Writing your own snippets is not only possible, but recommended ;-) You can also use any @Fold methods that don't access host backend information (or any information that is not valid when executing on the GPU).

-Doug
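
A minimal sketch of the distinction drawn in this thread, written in plain Java rather than against Graal's snippet API. The class, the helper names and the HSAIL register name "$d20" are all hypothetical and serve only as illustration. A helper that reads host backend state ties every caller to the host, while a value passed in as an argument (the role @ConstantParameter plays in a real snippet) lets the same code serve any backend:

```java
final class SnippetBindingSketch {
    // Stand-in for host backend state that a host-specific @Fold helper would read.
    // "r15" is the thread register HotSpot uses on AMD64.
    static final String HOST_THREAD_REGISTER = "r15";

    /** Host-bound helper: always answers for the host, whatever backend is being compiled. */
    static String threadRegister() {
        return HOST_THREAD_REGISTER;
    }

    /** Models a snippet that reaches the host through the helper above. */
    static String allocateHostBound(int size) {
        return describe(threadRegister(), size);
    }

    /** Models the parameterized snippet: the caller supplies the register as a constant. */
    static String allocate(int size, String threadReg) {
        return describe(threadReg, size);
    }

    private static String describe(String reg, int size) {
        return "read TLAB top via " + reg + ", bump by " + size;
    }

    public static void main(String[] args) {
        String hsailThreadReg = "$d20"; // hypothetical register name, for illustration only
        System.out.println(allocateHostBound(16));
        System.out.println(allocate(16, hsailThreadReg));
    }
}
```

The host-bound variant cannot be redirected without editing the helper, which is why helpers of that kind are the ones to avoid or replace when writing snippets for a GPU backend.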

> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Thursday, October 17, 2013 12:00 PM
> To: Deneau, Tom; Venkatachalam, Vasanth
> Cc: dl.Runtimes; graal-dev at openjdk.java.net
> Subject: Re: enhancements we'd like to see out of the graal redesign
>
> Consider the snippet below (in NewObjectSnippets.java) for object allocation:
>
>       @Snippet
>       public static Word allocate(int size) {
>           Word thread = thread();
>           Word top = readTlabTop(thread);
>           Word end = readTlabEnd(thread);
>           Word newTop = top.add(size);
>           /*
>            * this check might lead to problems if the TLAB is within 16GB of the address space end
>            * (checked in c++ code)
>            */
>           if (probability(FAST_PATH_PROBABILITY, newTop.belowOrEqual(end))) {
>               writeTlabTop(thread, newTop);
>               return top;
>           }
>           return Word.zero();
>       }
>
> These are the support methods in HotSpotReplacementsUtil called by the first line of the snippet:
>
>       public static Word thread() {
>           return registerAsWord(threadRegister(), true, false);
>       }
>
>       @Fold
>       public static Register threadRegister() {
>           return runtime().getHostProviders().getRegisters().getThreadRegister();
>       }
>
> That last @Fold annotated method effectively binds the snippet to the host backend.
>
> We could fix this as follows:
>
>       @Snippet
>       public static Word allocate(int size, @ConstantParameter Register threadReg) {
>           Word thread = registerAsWord(threadReg, true, false);
>           ...
>       }
>
> However, before making this change, I'd like to be sure that this snippet and others like it are actually usable when compiling for the HSAIL backend.
> To ascertain that, I'm suggesting you make a copy of the snippet code (and its associated infrastructure) and modify it to use HSAIL-specific @Fold methods.
> The readTlabTop/readTlabEnd/writeTlabTop methods also use @Fold helpers to get the various TLAB offsets. Are they going to be reusable as is for HSAIL?
>
> -Doug
>
> On 10/17/2013 06:13 PM, Deneau, Tom wrote:
>> Doug --
>>
>> Not sure I understand this statement
>>
>>      To prevent us from holding you up, you could simply copy the existing snippets you think are reusable in the HSAIL backend.
>>      Of course, you'd need to redirect them to alternative @Fold utility methods.
>>
>> I'm not sure of the semantics of the @Fold annotation but would we not be able to reuse some of the existing @Fold utility methods if appropriate?
>>
>> -- Tom
>>
>>
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com]
>> Sent: Thursday, October 17, 2013 9:46 AM
>> To: Venkatachalam, Vasanth
>> Cc: dl.Runtimes; graal-dev at openjdk.java.net
>> Subject: Re: enhancements we'd like to see out of the graal redesign
>>
>> On 10/17/2013 01:57 AM, Doug Simon wrote:
>>> Hi Vasanth,
>>>
>>> I spent the last few days on this and am pushing a changeset now that
>>> gets us a lot closer to where we want to be. The general idea (and
>>> implementation) is that there is a single HotSpotGraalRuntime that
>>> supports a host backend as well as any number of extra backends.
>>>
>>> More inline below:
>>>
>>> On 10/16/2013 05:38 PM, Venkatachalam, Vasanth wrote:
>>>>
>>>> Hi Doug,
>>>>
>>>> You mentioned that Graal is being redesigned to support multiple GPU
>>>> targets. I'd like to participate in the redesign discussion if
>>>> possible. As you suggested, I've made an initial list of things we'd
>>>> like to see come out of this redesign.  I know we talked about some
>>>> of this already.
>>>>
>>>> 1. We'd like to see a clear separation between the host runtime and
>>>> the runtime of the target that we're generating code for, and the
>>>> ability for the target runtime (or backend) to reuse data structures
>>>> from the host if needed. Currently, AMD64HotSpotRuntime is being
>>>> treated as the host and target runtime when we are generating code
>>>> for HSAIL. This puts the execution in an inconsistent state where
>>>> HSAIL registers are being used, but the target runtime is still being treated as AMD64.
>>>> When generating code for a target other than the host runtime, the
>>>> only place where we should have to rely on the host runtime is when
>>>> we are reusing data structures defined in the host runtime.
>>>>
>>>> 2. Related to 1), we need the ability to specify in a central
>>>> location what the target runtime is (e.g., HSAIL) and have this
>>>> change be automatically percolated to all parts of the code that are
>>>> referring to the target runtime.
>>>>
>>>
>>> This is partly done. I have yet to work out how to make snippets
>>> reusable across different backends but have some ideas I will
>>> investigate further tomorrow.
>>
>> The only way to do this is to pass all relevant configuration information into the snippets as @ConstantParameter arguments. This means every snippet that directly or indirectly uses a @Fold method (e.g., in HotSpotReplacementsUtil) needs to be modified.
>>
>> Before making this investment, it would be good to know which of the existing snippets you think could be used by the HSAIL backend. The same analysis would apply for PTX.
>>
>>>> 3. As a result of the problem mentioned in 1), we're seeing errors
>>>> and exceptions when we run HSAIL test cases. Two examples below.
>>>>
>>>> a. CFGPrinterObserver line 146 looks up the runtime to be
>>>> AMD64HotSpotRuntime, and as a result
>>>> CompilationPrinter.debugInfoToString (line 124) gets an
>>>> ArrayIndexOutOfBoundsException when it tries to look up an HSAIL
>>>> register number in an array of AMD64 registers (which has fewer
>>>> registers than HSAIL). This is causing test cases that exercise a
>>>> lot of HSAIL registers to fail when we run them with flags to dump
>>>> a data file for the C1Visualizer.
>>
>> This should now be fixed.
>>
>>>> b. The other example is that test cases involving method calls are
>>>> causing some exception handling snippets to be invoked. The AMD64
>>>> definition of these snippets gets loaded and looks for a
>>>> thread register (which we haven't specified in HSAIL) and this leads
>>>> to an assertion error.
>>>>
>>>> As mentioned earlier, we would like the ability to
>>>> fix all such problems by making changes to a central location, as
>>>> opposed to having to code around them in several places. For
>>>> example, you mentioned one possible solution would be the ability to
>>>> pass in an HSAIL-specific CodeCacheProvider to
>>>> GraalCompiler.compileGraph(), which in turn causes the right target
>>>> runtime to be percolated to all of these code regions. We'd like
>>>> the redesign to make such a centralized solution possible.
>>>>
>>>
>>> You'll see in the recent changes that we are moving in this direction.
>>>>
>>>> 4. Currently, Graal is building up a superset of intrinsics for the
>>>> host runtime (x86) and allowing the different backends to filter off
>>>> of that list.  For the case of supporting GPU targets, we'd like a
>>>> way for each backend to define its own intrinsics that may not be
>>>> part of the x86 intrinsics.
>>
>> That should now be solved with the new HSAILHotSpotReplacementsImpl and HSAILHotSpotLoweringProvider classes.
>>
>>>> 5. We need to be able to declare our own snippets without affecting
>>>> the AMD64 snippets.
>>
>> To prevent us from holding you up, you could simply copy the existing snippets you think are reusable in the HSAIL backend. Of course, you'd need to redirect them to alternative @Fold utility methods.
>>
>>>> a. Similarly our own Replacements
>>
>> That should now be supported as described above.
>>
>>>> 6. We may have a need to define our own new nodes in the HSAIL
>>>> backend, for example used by our own snippets.  It would be
>>>> preferable if we can do this in a way without having to define NYI
>>>> node handlers for that node in all the other backends.
>>>>
>>> As long as the nodes are defined and used only in HSAIL projects,
>>> there will be no need for NYI place holder code anywhere.
>>>>
>>>> 7. Since there are multiple GPU targets (e.g., HSAIL, PTX), we need
>>>> to make sure the infrastructure supports multiple ISA targets, not just one.
>>>>
>>> Done.
>>
>> -Doug
>>
>>>> -----Original Message-----
>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>> Sent: Tuesday, October 08, 2013 1:08 PM
>>>> To: Venkatachalam, Vasanth
>>>> Cc: graal-dev at openjdk.java.net; dl.Runtimes
>>>> Subject: Re: handling of Math intrinsics for multiple GPU targets
>>>>
>>>> This is obviously something else that needs to be vectored to each
>>>> backend, allowing each to make their own decision as you say. It
>>>> will be factored into the redesign currently going on. Please let us
>>>> know of other abstractions like this that need to be broadened or
>>>> exposed to each backend.
>>>>
>>>> On Oct 8, 2013, at 6:11 PM, "Venkatachalam, Vasanth"
>>>> <Vasanth.Venkatachalam at amd.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed that Graal is building a superset of math intrinsics for
>>>>> the host runtime (x86) and then filtering out some of these methods
>>>>> from being intrinsified based on the value of a config parameter
>>>>> (e.g., config.usePopCountInstruction, config.useAESIntrinsics, etc.).
>>>>>
>>>>> In more detail, when the VM first starts up in
>>>>> VMToCompilerImpl.start() it gets the host runtime (which is x86) and
>>>>> builds a superset of intrinsics for that runtime by calling
>>>>> GraalMethodSubstitutions.registerReplacements(). This in turn
>>>>> processes a class file MathSubstitutionsx86.class to get a list of
>>>>> math routines to be intrinsified, filters out some of these routines
>>>>> (via a call to HotSpotReplacementsImpl.registerMethodSubstitution())
>>>>> and adds the remaining ones to a HashMap called
>>>>> registeredMethodSubstitutions.
>>>>>
>>>>> For the case of supporting multiple GPU targets, it sounds like this
>>>>> logic is the reverse of what we need. Instead of building a superset
>>>>> of intrinsics for x86 and filtering them for the target runtime, we
>>>>> need a way for each target runtime (e.g., HSAIL) to specify its own
>>>>> list of supported intrinsics. Has anyone thought about how this
>>>>> should be handled?
>>>>>
>>>>> Vasanth
>>>
>>
>>
>
>
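
The request in the oldest message of this thread (each backend declaring its own intrinsics instead of filtering a superset built for the host) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class, its methods and the registered method names are hypothetical and are not Graal's Replacements API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IntrinsicRegistrySketch {
    // One independent set of intrinsified method names per backend.
    private final Map<String, Set<String>> intrinsicsByBackend = new HashMap<>();

    /** Records that the given backend intrinsifies the given method. */
    void register(String backend, String method) {
        intrinsicsByBackend.computeIfAbsent(backend, k -> new TreeSet<>()).add(method);
    }

    /** Answers from the backend's own set only; no host superset is consulted. */
    boolean isIntrinsic(String backend, String method) {
        return intrinsicsByBackend
                .getOrDefault(backend, Collections.<String>emptySet())
                .contains(method);
    }

    public static void main(String[] args) {
        IntrinsicRegistrySketch registry = new IntrinsicRegistrySketch();
        // Hypothetical registrations, chosen only to show that the sets are independent.
        registry.register("AMD64", "java.lang.Math.sqrt");
        registry.register("HSAIL", "java.lang.Math.sqrt");
        registry.register("HSAIL", "java.lang.Math.rint");
        System.out.println(registry.isIntrinsic("HSAIL", "java.lang.Math.rint"));
        System.out.println(registry.isIntrinsic("AMD64", "java.lang.Math.rint"));
    }
}
```

Keying the lookup by backend means a GPU target can intrinsify a method the host does not, which a filter over a host-built superset cannot express.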