Webrev for throwing some exceptions from HSAIL

Wed Dec 18 06:56:46 PST 2013

As Chris mentioned, let's try to come up with a workitem model that 
relates to the deoptimizations, etc.

GPU devices process elements in wavefronts (or warps), typically 32 or 
64 workitems at a time. Devices can usually execute several wavefronts 
simultaneously.

All the workitems in each wavefront are processed in lock-step unless 
there is divergence, in which case each side of a branch will be 
predicated off in turn until the branches reunite.

So at any given time, there can be some wavefronts that have definitely 
completed, some that have not started yet, and some that are in flight.

Since the total job size being offloaded could be enormous, I think we 
should keep track of only the workitems/wavefronts in flight at any 
given moment, so that we can handle them if we need to deoptimize or 
safepoint etc. This will normally be hundreds or thousands of workitems 
at once for current hardware. We know the current workitem id in the job 
for any given GPU core, and we know how many total GPU cores are on a 
device, so from that we can work out the state of the partially 
completed work if there is a deopt etc.
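
To make the bookkeeping concrete, here is a minimal sketch in Java of how the state of any workitem could be derived from the per-core current workitem id. It is only a model under stated assumptions: the class InFlightTracker, its methods, and the strided assignment of workitems to cores (core s handles ids s, s + numSlots, s + 2*numSlots, ...) are all hypothetical and are not part of Graal or any HSA runtime. Real hardware may dispatch work differently.

```java
import java.util.Arrays;

/**
 * Hypothetical model, not Graal or HSA runtime API.
 * Assumes slot s processes workitem ids s, s + numSlots, s + 2*numSlots, ...
 * in increasing order.
 */
class InFlightTracker {
    enum State { COMPLETED, IN_FLIGHT, NOT_STARTED }

    // Per hardware slot: the workitem id it last picked up, or -1 if none yet.
    private final int[] currentId;

    InFlightTracker(int numSlots) {
        currentId = new int[numSlots];
        Arrays.fill(currentId, -1);
    }

    /** Records that a slot has picked up a workitem. */
    void begin(int slot, int workitemId) {
        if (workitemId % currentId.length != slot) {
            throw new IllegalArgumentException("workitem does not belong to slot");
        }
        currentId[slot] = workitemId;
    }

    /**
     * Classifies a workitem from the id its slot last picked up.
     * The id a slot last picked up is conservatively reported as IN_FLIGHT,
     * even if the slot has in fact just finished it.
     */
    State stateOf(int workitemId) {
        int cur = currentId[workitemId % currentId.length];
        if (cur < 0 || workitemId > cur) {
            return State.NOT_STARTED;
        }
        return workitemId == cur ? State.IN_FLIGHT : State.COMPLETED;
    }
}
```

With this model the host only needs one integer per GPU core, regardless of the total job size, to decide which workitems need attention after a deopt.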

We can prevent future wavefronts from starting by explicit checks. 
Workitems that already completed have, at least in the case of HSA for 
the use cases we have working, already stored their results back in the 
heap, so they should not require any more attention.
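
The explicit check could look roughly like the following Java sketch, which simulates a kernel on the host. Everything here is an assumption made for illustration: the GuardedKernel class, the stopRequested flag, the per-workitem status array, and the placeholder kernel body (out[id] = id * 2) are hypothetical and do not correspond to generated HSAIL or to any existing Graal code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical host-side simulation of a kernel with an entry check. */
class GuardedKernel {
    static final int NOT_STARTED = 0;
    static final int DONE = 1;

    // Set by the host (or by a deoptimizing workitem) to stop new work.
    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    // One status entry per workitem; entries start as NOT_STARTED (0).
    final AtomicIntegerArray status;
    // Stand-in for results stored back in the heap.
    final int[] out;

    GuardedKernel(int jobSize) {
        status = new AtomicIntegerArray(jobSize);
        out = new int[jobSize];
    }

    void requestStop() {
        stopRequested.set(true);
    }

    /** Body executed once per workitem id. */
    void runWorkitem(int id) {
        // Explicit check at entry: a workitem that has not started yet
        // returns immediately and stays NOT_STARTED.
        if (stopRequested.get()) {
            return;
        }
        out[id] = id * 2;      // placeholder for the real kernel body
        status.set(id, DONE);  // result is in the heap; nothing more to do
    }
}
```

After the kernel returns, the entries still marked NOT_STARTED identify the workitems that would have to be run some other way, for example in the interpreter on the host.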

I am starting to experiment with this on our hardware. Will this idea 
work on PTX? It seems like it will work for HSA.

Let me know your ideas.
Thanks,
Eric

On 12/10/2013 02:27 PM, Christian Thalinger wrote:
>
> On Dec 10, 2013, at 9:31 AM, Gilles Duboscq <duboscq at ssw.jku.at> wrote:
>
>> On Tue, Dec 10, 2013 at 5:41 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>
>>> Gilles --
>>>
>>> Some comments below
>>>
>>> -- Tom
>>>
>>>> -----Original Message-----
>>>> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-
>>>> bounces at openjdk.java.net] On Behalf Of Gilles Duboscq
>>>> Sent: Tuesday, December 10, 2013 9:05 AM
>>>> To: Doug Simon
>>>> Cc: Caspole, Eric; graal-dev at openjdk.java.net
>>>> Subject: Re: Webrev for throwing some exceptions from HSAIL
>>>>
>>>> Hi Eric,
>>>>
>>>> The deoptimization mechanism and the exception mechanism are two
>>>> completely different things.
>>>> When a deoptimization happens, your only choice is to restart execution
>>>> in the interpreter.
>>>>
>>>> To do so, the deoptimization points are decorated with LIRFrameState in
>>>> the backend. This LIRFrameState gives you information that allows you to
>>>> rebuild the interpreter frames. For each frame you have a method, a bci,
>>>> the stack and local values and the owned monitors.
>>>>
>>>
>>> Yes, this was indeed a baby step.
>>>
>>> We realized that graal's policy of providing the framestate back at the
>>> last side-effecting bytecode means we don't even get the exact bci on
>>> deoptimizations.
>>>
>>> Our plan for the next phase is to provide enough information and
>>> actually build up the correct interpreter frames back and restart
>>> execution in the interpreter back on the host side.
>>>
>>
>> OK, but I wonder what the semantics are there since the data has already
>> been partially modified by the other kernel workitems which didn't
>> deoptimize.
>> In general when we think about parallelization we consider that it's only
>> safe to parallelize sections where there is no deoptimization since this
>> means we are guaranteed not to deopt to a state of the data that is invalid
>> for the interpreter.
>> I suppose in the case of streams you are relying on the relaxed semantics
>> of parallel streams.
>> How are you going to ensure that all elements are processed in the case of
>> deoptimization?
>
> Correct.  This is something we have talked about already in the Sumatra scope.  Before we can throw exceptions in GPUs we have to have some kind of work item logic.  I think trying to throw exceptions without having properly set up work items and a way to undo or record which items have already been processed is premature and not something we want.
>
> We need to resume talking about work items and start to implement it.
>
>>
>>