actions -- Rebuilding the Interpreter Frames on the GPU

Gilles Duboscq duboscq at ssw.jku.at
Tue Jan 28 10:29:09 PST 2014


Tom,

After further thinking, discussing and hacking into HotSpot, I think
we've finally arrived at a reasonable battle plan. We have turned the
problem around: the plan is to use a combination of something that
looks like OSR and deoptimization:
- Around the end of the compilation (just before going to LIR), I
create a new graph based on the current graph:
  - It takes 2 arguments: a long (actually a pointer) and an int.
  - Each deopt in the original graph gets a unique int; the first
thing this new graph does is switch on this int.
  - After this switch, it reads all the values necessary for the
deopt's framestates from this long pointer (which probably simply
points to the HSAILFrame).
  - It then directly deopts from there.
- When a deopt happens on the GPU, we do a JavaCall using something
like JavaCalls::call_helper (javaCalls.cpp) with an additional
argument for the entry point.
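
To make the shape of that second graph concrete, here is a minimal C++
sketch of what its compiled entry point would do. It is only an
illustration under stated assumptions: HsailFrameSketch,
FrameStateValues and host_deopt_entry are hypothetical names, the
registers chosen for each deopt id are made up, and the real code would
be generated by the host backend and would deoptimize instead of
returning.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the buffer the GPU fills at a deopt point.
// The 32 $s / 16 $d register counts come from the thread; the field
// names are illustrative and are NOT the real HSAILFrame layout.
struct HsailFrameSketch {
    std::int32_t s_regs[32];
    std::int64_t d_regs[16];
};

// Values one framestate needs, collected before "deopting".
struct FrameStateValues {
    int deopt_id;
    std::vector<std::int64_t> values;
};

// Stand-in for the second graph: (long pointer, int) -> deopt.
FrameStateValues host_deopt_entry(std::int64_t frame_ptr, int deopt_id) {
    const auto* frame = reinterpret_cast<const HsailFrameSketch*>(
        static_cast<std::uintptr_t>(frame_ptr));
    assert(frame != nullptr);

    FrameStateValues fs{deopt_id, {}};
    // First thing the graph does: switch on the unique deopt id.
    switch (deopt_id) {
    case 0:  // hypothetical deopt site 0: framestate needs $s0 and $d1
        fs.values = {frame->s_regs[0], frame->d_regs[1]};
        break;
    case 1:  // hypothetical deopt site 1: framestate needs only $s3
        fs.values = {frame->s_regs[3]};
        break;
    default:
        throw std::invalid_argument("unknown deopt id");
    }
    // Real generated code would deoptimize here rather than return.
    return fs;
}
```

A caller would pass the address of the saved frame as the long and the
id recorded at the deopt site as the int, for example
host_deopt_entry(frame_address, 1).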

I think doing deopt this way will spare us a lot of problems because:
- we don't need to modify any of HotSpot's deopt code
- the frames and nmethods involved look perfectly normal to HotSpot

My plan is:
- make it possible for ExternalCompilationResult to contain both the
External part (HSAIL things) and the host part (the code coming from
this second graph)
- Hook somewhere in the HSAIL backend to generate this second graph,
compile it using the Host backend and combine the HSAIL and host
results in the ExternalCompilationResult
- Install this ExternalCompilationResult correctly in the code cache
- Implement the final call to JavaCalls::call_helper in gpu_hsail.cpp
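
The last item could look roughly like the loop below. This is a sketch
under assumptions, not the actual gpu_hsail.cpp code: WorkItemState,
HostEntry and dispatch_deopts are hypothetical names, the convention
that a negative id means "no deopt" is invented for the example, and
the callback merely stands in for the JavaCall that would land on the
host entry point.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-workitem record written by the kernel.
struct WorkItemState {
    int deopt_id;            // assumption: negative means "finished normally"
    std::int64_t frame_ptr;  // address of the saved HSAIL frame
};

// Stand-in for the host entry point that a JavaCall would reach.
using HostEntry = std::function<void(std::int64_t frame_ptr, int deopt_id)>;

// Hands every deopted workitem to the host entry point and returns
// how many workitems were handled that way.
int dispatch_deopts(const std::vector<WorkItemState>& items,
                    const HostEntry& entry) {
    assert(entry);  // an empty std::function cannot be called
    int handled = 0;
    for (const WorkItemState& item : items) {
        if (item.deopt_id < 0) {
            continue;  // this workitem did not deopt
        }
        entry(item.frame_ptr, item.deopt_id);
        ++handled;
    }
    return handled;
}
```

Keeping the entry point behind a callback lets the scan over workitems
be exercised on its own, without a VM or a GPU.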

-Gilles

On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq <duboscq at ssw.jku.at> wrote:
> On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>> Gilles --
>>
>> I took a look at your diff file and it seems we are mostly headed in
>> the right direction.
>>
>> Regarding this paragraph
>>> Right now I'm trying to see how I can modify fetch_unroll_info_helper to
>>> minimise its reliance on frames. This needs quite a bit of refactoring.
>>> Part of this also requires figuring out exactly what the frame layout
>>> will be when we call it. I suppose that to avoid too many changes we
>>> can call a stub similar to the deopt/uncommon_trap stub from
>>> sharedRuntime_x86_64.cpp.
>>>
>>
>> I was assuming the frame layout would be what the HSAILFrame structure shows.
>> For now there will only be one level of HSAILFrame and we will always have 32 saved
>> $s registers, 16 saved $d registers, even if some are not necessary, but the HSAILFrame
>> has provisions for saving fewer.
>
> Yes but in the deoptimization code HotSpot expects frame values
> (frame.hpp), and frame is a platform specific class (see frame_x86.hpp
> and friends). I'm not sure we really win something by making the HSAIL
> frames look the same as the host architecture: that would require some
> changes and there are still assumptions that these frames are on the
> stack.
>
>>
>> If there are other layouts for HSAILFrame that make this easier, let me know.
>>
>> Also, I'm not sure what you mean by "call a stub similar to the deopt/uncommon_trap
>> stub from sharedRuntime_x86_64.cpp".
>
> Deoptimization::fetch_unroll_info_helper makes some assumptions about
> the layout of the frames leading to it. For example, it expects to be
> called from a stub: either the deopt_blob
> (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
> (SharedRuntime::generate_uncommon_trap_blob).
> I was talking about this with Tom Rodriguez and what we probably want
> is to do a standard JavaCall which would land on such a stub, this
> would make it easier to end up with a valid-looking/walk-able stack.
>
>>
>> -- Tom
>>
>>
>>> -----Original Message-----
>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
>>> Gilles Duboscq
>>> Sent: Friday, January 24, 2014 12:07 PM
>>> To: Deneau, Tom
>>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
>>>
>>> Hello Tom,
>>>
>>> I'm sending you my current diff, mostly for your information because it
>>> probably wouldn't compile or run.
>>>
>>> For the deopt process what we need to do is:
>>> - Get the UnrollBlock from Deoptimization::fetch_unroll_info_helper
>>> - Rebuild the "skeletal frames" (walkable and with PCs but no values)
>>>   using this UnrollBlock (see for example sharedRuntime_x86_64.cpp
>>>   starting around line 3530)
>>> - Run Deoptimization::unpack_frames which will fill the skeletal
>>>   frames with values using the UnrollBlock
>>>
>>> This work relies on vframes (here compiledVFrames) corresponding to the
>>> Java frames that are contained in the method that just deoptimized.
>>> Usually these vframes reference a particular frame (from frame.hpp,
>>> i.e. a physical frame from the host machine).
>>> Sub-classing frame is not really possible (I spent some time looking at
>>> that but it doesn't seem reasonable) but subclassing compiledVFrame
>>> should be easy; that's what I did in HsailCompiledVFrame.
>>> HsailCompiledVFrame references the HSAILFrame and uses it in
>>> HsailCompiledVFrame::create_stack_value which is what creates
>>> StackValues which are later used to retrieve the data.
>>>
>>> Right now I'm trying to see how I can modify fetch_unroll_info_helper to
>>> minimise its reliance on frames. This needs quite a bit of refactoring.
>>> Part of this also requires figuring out exactly what the frame layout
>>> will be when we call it. I suppose that to avoid too many changes we
>>> can call a stub similar to the deopt/uncommon_trap stub from
>>> sharedRuntime_x86_64.cpp.
>>>
>>> A few questions:
>>> Why would there be multiple HSAILFrames? Is there a stack and method
>>> calls in HSAIL? If that's not the case then HSAILFrame should be an
>>> HSAIL equivalent of frame: only one frame since there is only one
>>> physical frame.
>>> I'm not entirely sure why we need the HSAILLocation. It's useful now
>>> during development but I suppose it should not be needed any more once
>>> we go through the StackValues. Did you have a specific use in mind
>>> beyond development tests?
>>>
>>> -Gilles
>>>
>>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq <duboscq at ssw.jku.at> wrote:
>>> > Hello Tom,
>>> >
>>> > I've been working on this and by now i'm not really convinced i will
>>> > get something useful enough for tomorrow.
>>> > I'll share the state of my patch/findings with you tomorrow anyway but
>>> > I'll probably need more work.
>>> >
>>> > Sorry about that, I knew this deoptimization code is complicated but
>>> > using a non-physical frame (i.e. not a frame from the platform's native
>>> > ABI) is more complicated than I thought.
>>> >
>>> > -Gilles
>>> >
>>> > On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>> >> Thanks, Gilles.
>>> >>
>>> >>> -----Original Message-----
>>> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
>>> >>> Gilles Duboscq
>>> >>> Sent: Monday, January 20, 2014 12:29 PM
>>> >>> To: Deneau, Tom
>>> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
>>> >>>
>>> >>> Hello Tom,
>>> >>>
>>> >>> Yes i've looked at your webrev.
>>> >>> Thank you.
>>> >>>
>>> >>> I also looked at the hotspot code and I have a rough idea of what is
>>> >>> needed.
>>> >>> Sorry for the late answer, I have a lot of things on my stack right now.
>>> >>>
>>> >>> I intend to look at it this week and i hope to have at least
>>> >>> something that you can experiment with on friday.
>>> >>>
>>> >>> -Gilles
>>> >>>
>>> >>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>> >>> > Hi Gilles --
>>> >>> >
>>> >>> > I assume you saw the notice of the webrev I uploaded that can be inspected
>>> >>> > (and also can be built, although we are not proposing it for check-in).
>>> >>> >
>>> >>> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
>>> >>> >
>>> >>> >
>>> >>> > To help with our internal planning, can you give us a rough estimate of how far
>>> >>> > away the frame rebuilding interface might be?
>>> >>> >
>>> >>> > -- Tom
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >> -----Original Message-----
>>> >>> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf
>>> >>> >> Of Gilles Duboscq
>>> >>> >> Sent: Wednesday, January 15, 2014 4:38 AM
>>> >>> >> To: Deneau, Tom
>>> >>> >> Cc: Doug Simon; graal-dev at openjdk.java.net
>>> >>> >> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
>>> >>> >>
>>> >>> >> Hello Tom,
>>> >>> >>
>>> >>> >> It's on my list, i already had a closer look at the frame
>>> >>> >> rebuilding code.
>>> >>> >> I would be interested to have a look at the code of your CodeInstaller
>>> >>> >> subclass and the code you use to retrieve the runtime values so that I
>>> >>> >> can experiment with it.
>>> >>> >>
>>> >>> >> -Gilles
>>> >>> >>
>>> >>> >> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>> >>> >> > Gilles, Doug --
>>> >>> >> >
>>> >>> >> > A status update on our end...
>>> >>> >> >
>>> >>> >> >    * We now generate HSAIL code to save the register state at deopt points
>>> >>> >> >
>>> >>> >> >    * We have an HSAIL-specific CodeInstaller class based on the changes
>>> >>> >> >      Doug added and we use this at compile time (code-install time) to
>>> >>> >> >      build the ScopeDescs.  (This avoids the host-register specific code
>>> >>> >> >      in the base CodeInstaller class).
>>> >>> >> >
>>> >>> >> >    * At runtime, if we detect that a workitem deopted, we map the saved "HSAIL pc"
>>> >>> >> >      to the relevant ScopeDesc and use each Location item in the ScopeDesc
>>> >>> >> >      to retrieve the relevant HSAIL register from the HSAIL frame (where the
>>> >>> >> >      registers were saved).
>>> >>> >> >
>>> >>> >> > Right now we just print out the live locals or expression stack values
>>> >>> >> > for the deopted workitem and they look correct.  The next step would be
>>> >>> >> > to rebuild the interpreter frames.
>>> >>> >> >
>>> >>> >> > Can I get an update on the "C++ changes needed to easily rebuild the
>>> >>> >> > interpreter frames from a raw buffer provided by the GPU".
>>> >>> >> >
>>> >>> >> > -- Tom
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >> -----Original Message-----
>>> >>> >> >> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of Gilles Duboscq
>>> >>> >> >> Sent: Friday, December 20, 2013 4:31 AM
>>> >>> >> >> To: Doug Simon
>>> >>> >> >> Cc: graal-dev at openjdk.java.net
>>> >>> >> >> Subject: Re: actions
>>> >>> >> >>
>>> >>> >> >> As for me, I'll look into the C++ changes needed to easily rebuild the
>>> >>> >> >> interpreter frames from a raw buffer provided by the GPU
>>> >>> >> >> during deoptimization.
>>> >>> >> >>
>>> >>> >> >> -Gilles
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon <doug.simon at oracle.com> wrote:
>>> >>> >> >>
>>> >>> >> >> > As a result of the Sumatra Skype meeting today on the topic of how to
>>> >>> >> >> > handle deopt for HSAIL & PTX, I’ve signed up to investigate changes in
>>> >>> >> >> > the C++ layer of Graal to accommodate installing code whose debug info
>>> >>> >> >> > is not in terms of host machine state (e.g. uses a different register
>>> >>> >> >> > set than the host register set).
>>> >>> >> >> >
>>> >>> >> >> > -Doug
>>> >>> >> >> >
>>> >>> >> >> > On Dec 19, 2013, at 11:02 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>> >>> >> >> >
>>> >>> >> >> > > Gilles, Doug --
>>> >>> >> >> > >
>>> >>> >> >> > > Could you post to the graal-dev list what the two action items you
>>> >>> >> >> > > took were?
>>> >>> >> >> > >
>>> >>> >> >> > > -- Tom
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >
>>> >>> >
>>> >>

