actions -- Rebuilding the Interpreter Frames on the GPU

Deneau, Tom tom.deneau at amd.com
Wed Jan 29 10:21:36 PST 2014


Gilles --

I pushed an updated version of the webrev to
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v2/webrev/

As with the previous one, I am not proposing that this gets checked in,
but it should provide a basis for your experiments.

There haven't been any big structural changes since the first one.
This one has merged with the latest default on Jan 29, which includes
Doug Simon's patch to get rid of HSAILCompilationResult and use
backend.CompileKernel instead.

The junits, including the new ones based on bounds checks, etc., should pass
when run with the HSAIL simulator.

Let me know if you run into any problems with this.

-- Tom


> -----Original Message-----
> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> Gilles Duboscq
> Sent: Wednesday, January 29, 2014 6:36 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net
> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> 
> Tom,
> 
> Do you have an updated version of the webrev I based my work on so far?
> Since I'm changing direction, it would probably be better if I base off
> a recent version.
> I think Doug is going to push some changes regarding multi-gpu support
> later this afternoon (CET), so it would probably be better if it can be
> based on something after that.
> 
> -Gilles
> 
> On Wed, Jan 29, 2014 at 12:07 AM, Gilles Duboscq <gilwooden at gmail.com>
> wrote:
> > Yes, it's all correct.
> > This host code basically only contains code to handle the GPU code's
> > deopts, which it handles by using ... deopt again, but since we are on
> > the host now, deopt there is very simple.
> >
> > On 28 Jan 2014 19:59, "Tom Deneau" <tom.deneau at amd.com> wrote:
> >>
> >> Gilles --
> >>
> >> I'm not sure I understand this 100% (and I can't say I understand how
> >> OSR works) but this sounds like a good goal to avoid modifying the
> >> hotspot deopt code, etc.
> >>
> >> So is the following correct?
> >>    * this second graph compiles to some funny host code which
> >>      gets invoked at runtime via javaCall when the gpu de-opts?
> >>      This host code is like a special compilation of the original
> >>      kernel method.
> >>
> >>    * When the gpu sees a deopt and makes the javacall, it just
> >>      needs to pass the unique de-opt location (int)
> >>      and the set of saved gpu register/stack values.
> >>
> >>    * And the funny host code will set up all the locals, expressions,
> >>      etc. and then do a normal host deopt...
> >>
> >> If so, it sounds very clever... :)
> >>
> >> -- Tom
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> >> > Gilles Duboscq
> >> > Sent: Tuesday, January 28, 2014 12:29 PM
> >> > To: Deneau, Tom
> >> > Cc: graal-dev at openjdk.java.net
> >> > Subject: Re: actions -- Rebuilding the Interpreter Frames on the
> >> > GPU
> >> >
> >> > Tom,
> >> >
> >> > After further thinking, discussing and hacking into HotSpot, I
> >> > think we've finally arrived at a reasonable battle plan. We have
> >> > turned the problem around and the plan is to use a combination of
> >> > something that looks like OSR and deoptimization:
> >> > - Around the end of the compilation (just before going to LIR), I
> >> > create a new graph based on the current graph:
> >> >   - It gets 2 arguments: a long (a pointer actually) and an int.
> >> >   - For each deopt in the original graph there is a unique int; the
> >> > first thing this new graph does is a switch on this int.
> >> >   - After this switch, it reads all the values necessary for the
> >> > deopt's framestates from this long pointer (which probably simply
> >> > points to the HSAILFrame).
> >> >   - It then directly deopts from there.
> >> > - When a deopt happens on the GPU, we do a JavaCall using something
> >> > like JavaCalls::call_helper (javaCalls.cpp) with an additional
> >> > argument for the entry point
> >> >
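The shape of that second graph can be modelled in plain Java. This is only a rough sketch under stated assumptions: the class name, the byte offsets and the two deopt ids are hypothetical, a ByteBuffer stands in for the long (pointer) argument, and the method returns the values it read instead of actually deoptimizing.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: none of these names come from Graal or HotSpot.
final class HostDeoptEntrySketch {

    // Stands in for the second graph. savedFrame plays the role of the long
    // (pointer) argument; deoptId is the unique int of the deopt point.
    static List<Long> enter(ByteBuffer savedFrame, int deoptId) {
        List<Long> frameState = new ArrayList<>();
        switch (deoptId) {
            case 0: // assumed deopt point needing one 32-bit value at byte offset 12
                frameState.add((long) savedFrame.getInt(12));
                break;
            case 1: // assumed deopt point needing a 32-bit and a 64-bit value
                frameState.add((long) savedFrame.getInt(0));
                frameState.add(savedFrame.getLong(144));
                break;
            default:
                throw new IllegalArgumentException("unknown deopt id " + deoptId);
        }
        // The real host code would deoptimize at this point using these values;
        // this model simply returns them.
        return frameState;
    }
}
```

The point of the switch is that each deopt location knows statically which saved slots it needs, so the GPU side only has to hand over the id and the raw save area.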
> >> > I think doing deopt this way will save us a lot of problems because:
> >> > - we don't need to modify any of HotSpot's deopt code
> >> > - the frames and nmethods involved look perfectly normal to HotSpot
> >> >
> >> > My plan is:
> >> > - make it possible for ExternalCompilationResult to contain both
> >> > the External part (HSAIL things) and the host part (the code coming
> >> > from this second graph)
> >> > - Hook somewhere in the HSAIL backend to generate this second
> >> > graph, compile it using the Host backend and combine the HSAIL and
> >> > host results in the ExternalCompilationResult
> >> > - Install this ExternalCompilationResult correctly in the code
> >> > cache
> >> > - Implement the final calling to JavaCalls::call_helper in
> >> > gpu_hsail.cpp
> >> >
> >> > -Gilles
> >> >
> >> > On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq
> >> > <duboscq at ssw.jku.at>
> >> > wrote:
> >> > > On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com>
> >> > wrote:
> >> > >> Gilles --
> >> > >>
> >> > >> I took a look at your diff file and it seems we are mostly
> >> > >> headed in the right direction.
> >> > >>
> >> > >> Regarding this paragraph
> >> > >>> Right now I'm trying to see how I can modify
> >> > >>> fetch_unroll_info_helper to minimise its reliance on frames.
> >> > >>> This needs quite a bit of refactoring.
> >> > >>> Part of this also requires figuring out exactly what will be
> >> > >>> the frame layout when we will call it. I suppose that to avoid
> >> > >>> too many changes we can call a stub similar to the
> >> > >>> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp.
> >> > >>>
> >> > >>
> >> > >> I was assuming the frame layout would be what the HSAILFrame
> >> > >> structure shows.
> >> > >> For now there will only be one level of HSAILFrame and we will
> >> > >> always have 32 saved $s registers, 16 saved $d registers, even
> >> > >> if some are not necessary, but the HSAILFrame has provisions for
> >> > >> saving fewer.
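A save area with those register counts could be laid out roughly as follows. This is a minimal sketch, not the actual HSAILFrame: it assumes 4-byte $s slots followed directly by 8-byte $d slots with no header, and every name in it is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical layout model; the real HSAILFrame structure may differ.
final class HsailFrameLayoutSketch {
    static final int MAX_S_REGS = 32; // counts mentioned in the thread
    static final int MAX_D_REGS = 16;

    private final int numSRegs;
    private final int numDRegs;
    private final ByteBuffer slots;

    // Fewer registers than the maximum may be saved, which shrinks the frame.
    HsailFrameLayoutSketch(int numSRegs, int numDRegs) {
        if (numSRegs < 0 || numSRegs > MAX_S_REGS || numDRegs < 0 || numDRegs > MAX_D_REGS) {
            throw new IllegalArgumentException("register count out of range");
        }
        this.numSRegs = numSRegs;
        this.numDRegs = numDRegs;
        this.slots = ByteBuffer.allocate(sizeInBytes());
    }

    int sizeInBytes() {
        return numSRegs * Integer.BYTES + numDRegs * Long.BYTES;
    }

    // Byte offset of saved $s<reg>: the 32-bit slots come first.
    int sOffset(int reg) {
        if (reg < 0 || reg >= numSRegs) throw new IndexOutOfBoundsException("$s" + reg);
        return reg * Integer.BYTES;
    }

    // Byte offset of saved $d<reg>: the 64-bit slots follow all $s slots.
    int dOffset(int reg) {
        if (reg < 0 || reg >= numDRegs) throw new IndexOutOfBoundsException("$d" + reg);
        return numSRegs * Integer.BYTES + reg * Long.BYTES;
    }

    void saveS(int reg, int value) { slots.putInt(sOffset(reg), value); }
    int loadS(int reg) { return slots.getInt(sOffset(reg)); }
    void saveD(int reg, long value) { slots.putLong(dOffset(reg), value); }
    long loadD(int reg) { return slots.getLong(dOffset(reg)); }
}
```

Under these assumptions the full 32/16 frame occupies 256 bytes per workitem, and a frame that saves fewer registers only changes the two counts.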
> >> > >
> >> > > Yes but in the deoptimization code HotSpot expects frame values
> >> > > (frame.hpp), and frame is a platform specific class (see
> >> > > frame_x86.hpp and friends). I'm not sure we really win something
> >> > > by making the HSAIL frames look the same as the host
> >> > > architecture: that would require some changes and there are still
> >> > > assumptions that these frames are on the stack.
> >> > >
> >> > >>
> >> > >> If there are other layouts for HSAILFrame that make this easier,
> >> > >> let me know.
> >> > >>
> >> > >> Also, I'm not sure what you mean by "call a stub similar to the
> >> > >> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp".
> >> > >
> >> > > Deoptimization::fetch_unroll_info_helper makes some assumptions
> >> > > on the layout of the frames leading to it. For example, it expects
> >> > > to be called from a stub: either the deopt_blob
> >> > > (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
> >> > > (SharedRuntime::generate_uncommon_trap_blob).
> >> > > I was talking about this with Tom Rodriguez and what we probably
> >> > > want is to do a standard JavaCall which would land on such a
> >> > > stub; this would make it easier to end up with a
> >> > > valid-looking/walkable stack.
> >> > >
> >> > >>
> >> > >> -- Tom
> >> > >>
> >> > >>
> >> > >>> -----Original Message-----
> >> > >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> >> > >>> Behalf Of Gilles Duboscq
> >> > >>> Sent: Friday, January 24, 2014 12:07 PM
> >> > >>> To: Deneau, Tom
> >> > >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on
> >> > >>> the GPU
> >> > >>>
> >> > >>> Hello Tom,
> >> > >>>
> >> > >>> I'm sending you my current diff, mostly for your information
> >> > >>> because it probably wouldn't compile or run.
> >> > >>>
> >> > >>> For the deopt process what we need to do is:
> >> > >>> -Get the UnrollBlock from
> >> > >>> Deoptimization::fetch_unroll_info_helper
> >> > >>> -Rebuild the "skeletal frames" (walkable and with PCs but no
> >> > >>> values) using this UnrollBlock (see for example
> >> > >>> sharedRuntime_x86_64.cpp starting around line 3530)
> >> > >>> -Run Deoptimization::unpack_frames which will fill the skeletal
> >> > >>> frames with values using the UnrollBlock
> >> > >>>
> >> > >>> This work relies on vframes (here compiledVFrames)
> >> > >>> corresponding to the java frames that are contained in the
> >> > >>> method that just deoptimized.
> >> > >>> Usually these vframes reference a particular frame (from
> >> > >>> frame.hpp, i.e. a physical frame from the host machine).
> >> > >>> Sub-classing frame is not really possible (I spent some time
> >> > >>> looking at that but that doesn't seem reasonable) but
> >> > >>> subclassing compiledVFrame should be easy, that's what I did in
> >> > >>> HsailCompiledVFrame.
> >> > >>> HsailCompiledVFrame references the HSAILFrame and uses it in
> >> > >>> HsailCompiledVFrame::create_stack_value which is what creates
> >> > >>> StackValues which are later used to retrieve the data.
> >> > >>>
> >> > >>> Right now I'm trying to see how I can modify
> >> > >>> fetch_unroll_info_helper to minimise its reliance on frames.
> >> > >>> This needs quite a bit of refactoring.
> >> > >>> Part of this also requires figuring out exactly what will be
> >> > >>> the frame layout when we will call it. I suppose that to avoid
> >> > >>> too many changes we can call a stub similar to the
> >> > >>> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp.
> >> > >>>
> >> > >>> A few questions:
> >> > >>> Why would there be multiple HSAILFrames? Is there a stack and
> >> > >>> method calls in HSAIL? If that's not the case then HSAILFrame
> >> > >>> should be an HSAIL equivalent of frame: only one frame since
> >> > >>> there is only one physical frame.
> >> > >>> I'm not entirely sure why we need the HSAILLocation. It's
> >> > >>> useful now during development but I suppose it should not be
> >> > >>> needed any more once we go through the StackValues. Did you
> >> > >>> have a specific use in mind beyond development tests?
> >> > >>>
> >> > >>> -Gilles
> >> > >>>
> >> > >>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq
> >> > >>> <duboscq at ssw.jku.at>
> >> > >>> wrote:
> >> > >>> > Hello Tom,
> >> > >>> >
> >> > >>> > I've been working on this and by now I'm not really convinced
> >> > >>> > I will get something useful enough for tomorrow.
> >> > >>> > I'll share the state of my patch/findings with you tomorrow
> >> > >>> > anyway but I'll probably need more work.
> >> > >>> >
> >> > >>> > Sorry about that, I knew this deoptimization code is
> >> > >>> > complicated but using a non-physical frame (i.e. not a frame
> >> > >>> > from the platform's native ABI) is more complicated than I
> >> > >>> > thought.
> >> > >>> >
> >> > >>> > -Gilles
> >> > >>> >
> >> > >>> > On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau
> >> > >>> > <tom.deneau at amd.com>
> >> > >>> wrote:
> >> > >>> >> Thanks, Gilles.
> >> > >>> >>
> >> > >>> >>> -----Original Message-----
> >> > >>> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> >> > >>> >>> Behalf Of Gilles Duboscq
> >> > >>> >>> Sent: Monday, January 20, 2014 12:29 PM
> >> > >>> >>> To: Deneau, Tom
> >> > >>> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames
> >> > >>> >>> on the GPU
> >> > >>> >>>
> >> > >>> >>> Hello Tom,
> >> > >>> >>>
> >> > >>> >>> Yes i've looked at your webrev.
> >> > >>> >>> Thank you.
> >> > >>> >>>
> >> > >>> >>> I also looked at the hotspot code and I have a rough idea
> >> > >>> >>> of what is needed.
> >> > >>> >>> Sorry for the late answer, I have a lot of things on my
> >> > >>> >>> stack right now.
> >> > >>> >>>
> >> > >>> >>> I intend to look at it this week and I hope to have at
> >> > >>> >>> least something that you can experiment with on Friday.
> >> > >>> >>>
> >> > >>> >>> -Gilles
> >> > >>> >>>
> >> > >>> >>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau
> >> > >>> >>> <tom.deneau at amd.com>
> >> > >>> wrote:
> >> > >>> >>> > Hi Gilles --
> >> > >>> >>> >
> >> > >>> >>> > I assume you saw the notice of the webrev I uploaded that
> >> > >>> >>> > can be inspected (and also can be built, although we are
> >> > >>> >>> > not proposing it for check-in).
> >> > >>> >>> >
> >> > >>> >>> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
> >> > >>> >>> >
> >> > >>> >>> >
> >> > >>> >>> > To help with our internal planning, can you give us a
> >> > >>> >>> > rough estimate of how far away the frame rebuilding
> >> > >>> >>> > interface might be?
> >> > >>> >>> >
> >> > >>> >>> > -- Tom
> >> > >>> >>> >
> >> > >>> >>> >
> >> > >>> >>> >
> >> > >>> >>> >> -----Original Message-----
> >> > >>> >>> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com]
> >> > >>> >>> >> On Behalf Of Gilles Duboscq
> >> > >>> >>> >> Sent: Wednesday, January 15, 2014 4:38 AM
> >> > >>> >>> >> To: Deneau, Tom
> >> > >>> >>> >> Cc: Doug Simon; graal-dev at openjdk.java.net
> >> > >>> >>> >> Subject: Re: actions -- Rebuilding the Interpreter
> >> > >>> >>> >> Frames on the GPU
> >> > >>> >>> >>
> >> > >>> >>> >> Hello Tom,
> >> > >>> >>> >>
> >> > >>> >>> >> It's on my list, I already had a closer look at the
> >> > >>> >>> >> frame rebuilding code.
> >> > >>> >>> >> I would be interested to have a look at the code of your
> >> > >>> >>> >> CodeInstaller subclass and the code you use to retrieve
> >> > >>> >>> >> the runtime values so that I can experiment with it.
> >> > >>> >>> >>
> >> > >>> >>> >> -Gilles
> >> > >>> >>> >>
> >> > >>> >>> >> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau
> >> > >>> >>> >> <tom.deneau at amd.com>
> >> > >>> >>> wrote:
> >> > >>> >>> >> > Gilles, Doug --
> >> > >>> >>> >> >
> >> > >>> >>> >> > A status update on our end...
> >> > >>> >>> >> >
> >> > >>> >>> >> >    * We now generate HSAIL code to save the register
> >> > >>> >>> >> >      state at deopt points
> >> > >>> >>> >> >
> >> > >>> >>> >> >    * We have an HSAIL-specific CodeInstaller class
> >> > >>> >>> >> >      based on the changes Doug added and we use this at
> >> > >>> >>> >> >      compile time (code-install time) to build the
> >> > >>> >>> >> >      ScopeDescs.  (This avoids the host-register
> >> > >>> >>> >> >      specific code in the base CodeInstaller class).
> >> > >>> >>> >> >
> >> > >>> >>> >> >    * At runtime, if we detect that a workitem deopted,
> >> > >>> >>> >> >      we map the saved "HSAIL pc" to the relevant
> >> > >>> >>> >> >      ScopeDesc and use each Location item in the
> >> > >>> >>> >> >      ScopeDesc to retrieve the relevant HSAIL register
> >> > >>> >>> >> >      from the HSAIL frame (where the registers were
> >> > >>> >>> >> >      saved).
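The runtime lookup in the last bullet might look something like the following Java model. It is a sketch under assumptions rather than the code in the webrev: the Location and RegKind types here are simplified stand-ins invented for illustration (not HotSpot's ScopeDesc/Location classes), and the saved registers are modelled as plain arrays.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "saved HSAIL pc -> scope -> register values".
final class ScopeLookupSketch {
    enum RegKind { S, D } // 32-bit $s or 64-bit $d register

    // Simplified stand-in for one Location entry of a scope.
    static final class Location {
        final String name;
        final RegKind kind;
        final int index;

        Location(String name, RegKind kind, int index) {
            this.name = name;
            this.kind = kind;
            this.index = index;
        }
    }

    // Finds the scope recorded for savedPc and reads each described value
    // out of the saved register arrays, keeping the scope's order.
    static Map<String, Long> liveValues(Map<Integer, List<Location>> scopesByPc,
                                        int savedPc, int[] sRegs, long[] dRegs) {
        List<Location> scope = scopesByPc.get(savedPc);
        if (scope == null) {
            throw new IllegalStateException("no debug info for HSAIL pc " + savedPc);
        }
        Map<String, Long> values = new LinkedHashMap<>();
        for (Location loc : scope) {
            long value = (loc.kind == RegKind.S) ? sRegs[loc.index] : dRegs[loc.index];
            values.put(loc.name, value);
        }
        return values;
    }
}
```

Printing the returned map for a deopted workitem corresponds to the "print out the live locals" check described in this status update.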
> >> > >>> >>> >> >
> >> > >>> >>> >> > Right now we just print out the live locals or
> >> > >>> >>> >> > expression stack values for the deopted workitem and
> >> > >>> >>> >> > they look correct.  The next step would be to rebuild
> >> > >>> >>> >> > the interpreter frames.
> >> > >>> >>> >> >
> >> > >>> >>> >> > Can I get an update on the "C++ changes needed to
> >> > >>> >>> >> > easily rebuild the interpreter frames from a raw buffer
> >> > >>> >>> >> > provided by the GPU"?
> >> > >>> >>> >> >
> >> > >>> >>> >> > -- Tom
> >> > >>> >>> >> >
> >> > >>> >>> >> >
> >> > >>> >>> >> >
> >> > >>> >>> >> >
> >> > >>> >>> >> >> -----Original Message-----
> >> > >>> >>> >> >> From: graal-dev-bounces at openjdk.java.net
> >> > >>> >>> >> >> [mailto:graal-dev-bounces at openjdk.java.net] On
> >> > >>> >>> >> >> Behalf Of Gilles Duboscq
> >> > >>> >>> >> >> Sent: Friday, December 20, 2013 4:31 AM
> >> > >>> >>> >> >> To: Doug Simon
> >> > >>> >>> >> >> Cc: graal-dev at openjdk.java.net
> >> > >>> >>> >> >> Subject: Re: actions
> >> > >>> >>> >> >>
> >> > >>> >>> >> >> As for me, I'll look into the C++ changes needed to
> >> > >>> >>> >> >> easily rebuild the interpreter frames from a raw
> >> > >>> >>> >> >> buffer provided by the GPU during deoptimization.
> >> > >>> >>> >> >>
> >> > >>> >>> >> >> -Gilles
> >> > >>> >>> >> >>
> >> > >>> >>> >> >>
> >> > >>> >>> >> >> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon
> >> > >>> >>> <doug.simon at oracle.com>
> >> > >>> >>> >> >> wrote:
> >> > >>> >>> >> >>
> >> > >>> >>> >> >> > As a result of the Sumatra Skype meeting today on
> >> > >>> >>> >> >> > the topic of how to handle deopt for HSAIL & PTX,
> >> > >>> >>> >> >> > I’ve signed up to investigate changes in the C++
> >> > >>> >>> >> >> > layer of Graal to accommodate installing code whose
> >> > >>> >>> >> >> > debug info is not in terms of host machine state
> >> > >>> >>> >> >> > (e.g. uses a different register set than the host
> >> > >>> >>> >> >> > register set).
> >> > >>> >>> >> >> >
> >> > >>> >>> >> >> > -Doug
> >> > >>> >>> >> >> >
> >> > >>> >>> >> >> > On Dec 19, 2013, at 11:02 PM, Deneau, Tom
> >> > >>> >>> >> >> > <tom.deneau at amd.com>
> >> > >>> >>> >> wrote:
> >> > >>> >>> >> >> >
> >> > >>> >>> >> >> > > Gilles, Doug --
> >> > >>> >>> >> >> > >
> >> > >>> >>> >> >> > > Could you post to the graal-dev list what the two
> >> > >>> >>> >> >> > > action items you took were?
> >> > >>> >>> >> >> > > -- Tom
> >> > >>> >>> >> >> >
> >> > >>> >>> >> >> >
> >> > >>> >>> >> >
> >> > >>> >>> >
> >> > >>> >>
> >>
> >


