actions -- Rebuilding the Interpreter Frames on the GPU
Deneau, Tom
tom.deneau at amd.com
Fri Jan 31 05:36:34 PST 2014
Gilles --
Yet another updated version of the webrev can be found at
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v3/webrev/
This one is merged with the Jan 31 trunk, which includes Doug's more extensive GPU changes.
The tests should all still pass on the simulator.
-- Tom
> -----Original Message-----
> From: Deneau, Tom
> Sent: Wednesday, January 29, 2014 12:22 PM
> To: 'Gilles Duboscq'
> Cc: graal-dev at openjdk.java.net
> Subject: RE: actions -- Rebuilding the Interpreter Frames on the GPU
>
> Gilles --
>
> I pushed an updated version of the webrev to
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v2/webrev/
>
> As with the previous one, I'm not proposing that this gets checked in,
> but it should provide a basis for your experiments.
>
> There haven't been any big structural changes since the first one.
> This one has merged with the latest default on Jan 29, which includes
> Doug Simon's patch to get rid of HSAILCompilationResult and use
> backend.CompileKernel instead.
>
> The JUnit tests, including the new ones based on bounds checks, etc.,
> should pass when run with the HSAIL simulator.
>
> Let me know if you run into any problems with this.
>
> -- Tom
>
>
> > -----Original Message-----
> > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> > Gilles Duboscq
> > Sent: Wednesday, January 29, 2014 6:36 AM
> > To: Deneau, Tom
> > Cc: graal-dev at openjdk.java.net
> > Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> >
> > Tom,
> >
> > Do you have an updated version of the webrev I based my work on so far?
> > Since I'm changing direction, it would probably be better if I base
> > off a recent version.
> > I think Doug is going to push some changes regarding multi-gpu support
> > later this afternoon (CET), so it would probably be better if it can
> > be based on something after that.
> >
> > -Gilles
> >
> > On Wed, Jan 29, 2014 at 12:07 AM, Gilles Duboscq <gilwooden at gmail.com>
> > wrote:
> > > Yes, it's all correct.
> > > This host code basically only contains code to handle the GPU code's
> > > deopts, which it handles by using ... deopt again, but since we are
> > > on the host now, deopt there is very simple.
> > >
> > > On 28 Jan 2014 19:59, "Tom Deneau" <tom.deneau at amd.com> wrote:
> > >>
> > >> Gilles --
> > >>
> > >> I'm not sure I understand this 100% (and I can't say I understand
> > >> how OSR works) but this sounds like a good goal to avoid modifying
> > >> the hotspot deopt code, etc.
> > >>
> > >> So is the following correct?
> > >> * this second graph compiles to some funny host code which
> > >> gets invoked at runtime via javaCall when the gpu de-opts?
> > >> This host code is like a special compilation of the original
> > >> kernel method.
> > >>
> > >> * When the gpu sees a deopt and makes the javacall, it just
> > >> needs to pass the unique de-opt location (int)
> > >> and the set of saved gpu register/stack values.
> > >>
> > >> * And the funny host code will set up all the locals, expressions,
> > >> etc., and then do a normal host deopt...
> > >>
> > >> If so, it sounds very clever... :)
> > >>
> > >> -- Tom
> > >>
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf
> > >> > Of Gilles Duboscq
> > >> > Sent: Tuesday, January 28, 2014 12:29 PM
> > >> > To: Deneau, Tom
> > >> > Cc: graal-dev at openjdk.java.net
> > >> > Subject: Re: actions -- Rebuilding the Interpreter Frames on the
> > >> > GPU
> > >> >
> > >> > Tom,
> > >> >
> > >> > After further thinking, discussing and hacking into HotSpot, I
> > >> > think we've finally arrived at a reasonable battle plan. We have
> > >> > turned the problem around and the plan is to use a combination of
> > >> > something that looks like OSR and deoptimization:
> > >> > - Around the end of the compilation (just before going to LIR), I
> > >> > create a new graph based on the current graph:
> > >> >   - It takes 2 arguments: a long (actually a pointer) and an int.
> > >> >   - Each deopt in the original graph is assigned a unique int;
> > >> >     the first thing this new graph does is a switch on this int.
> > >> >   - After this switch, it reads all the values necessary for the
> > >> >     deopt's framestates from the long pointer (which probably simply
> > >> >     points to the HSAILFrame).
> > >> >   - It then directly deopts from there.
> > >> > - When a deopt happens on the GPU, we do a JavaCall using
> > >> > something like JavaCalls::call_helper (javaCalls.cpp) with an
> > >> > additional argument for the entry point.
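The dispatch performed by this second graph can be pictured with a small C++ sketch. It illustrates the idea under stated assumptions and is not the planned implementation: `deoptEntry`, `RebuiltState`, the bytecode indices and the register slots assigned to each deopt id are all hypothetical. In the plan above, the entry would be host code compiled from the Graal graph and would finish with a normal host deopt instead of returning a struct.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: the state one interpreter frame needs at a given deopt point.
struct RebuiltState {
  int bci;                      // bytecode index to resume at
  std::vector<int64_t> locals;  // values of the live locals
};

// Hypothetical entry point. 'savedRegs' plays the role of the long argument
// (a pointer to where the GPU saved its registers) and 'deoptId' the unique
// int assigned to each deopt in the original graph.
RebuiltState deoptEntry(const int64_t* savedRegs, int deoptId) {
  switch (deoptId) {
    case 0:
      // Assumed: at deopt point 0 the two live locals sit in slots 0 and 1.
      return {12, {savedRegs[0], savedRegs[1]}};
    case 1:
      // Assumed: at deopt point 1 the only live local sits in slot 2.
      return {40, {savedRegs[2]}};
    default:
      assert(false && "unknown deopt id");
      return {-1, {}};
  }
}
```

In this sketch, each deopt id selects its own fixed sequence of reads. As a result, the host code needs no generic description of the GPU frame at run time, because where each value lives is decided when the entry is compiled.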
> > >> >
> > >> > I think doing deopt this way will spare us a lot of problems because:
> > >> > - we don't need to modify any of HotSpot's deopt code
> > >> > - the frames and nmethods involved look perfectly normal to
> > >> > HotSpot
> > >> >
> > >> > My plan is:
> > >> > - make it possible for ExternalCompilationResult to contain both
> > >> > the External part (HSAIL things) and the host part (the code
> > >> > coming from this second graph)
> > >> > - Hook somewhere in the HSAIL backend to generate this second
> > >> > graph, compile it using the Host backend and combine the HSAIL
> > >> > and host results in the ExternalCompilationResult
> > >> > - Install this ExternalCompilationResult correctly in the code
> > >> > cache
> > >> > - Implement the final calling to JavaCalls::call_helper in
> > >> > gpu_hsail.cpp
> > >> >
> > >> > -Gilles
> > >> >
> > >> > On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq
> > >> > <duboscq at ssw.jku.at>
> > >> > wrote:
> > >> > > On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> > >> > >> Gilles --
> > >> > >>
> > >> > >> I took a look at your diff file and it seems we are mostly
> > >> > >> headed in the right direction.
> > >> > >>
> > >> > >> Regarding this paragraph
> > >> > >>> Right now I'm trying to see how I can modify
> > >> > >>> fetch_unroll_info_helper to minimize its reliance on frames. This
> > >> > >>> needs quite a bit of refactoring.
> > >> > >>> Part of this also requires figuring out exactly what the frame
> > >> > >>> layout will be when we call it. I suppose that to avoid too many
> > >> > >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> > >> > >>> from sharedRuntime_x86_64.cpp.
> > >> > >>>
> > >> > >>
> > >> > >> I was assuming the frame layout would be what the HSAILFrame
> > >> > >> structure shows.
> > >> > >> For now there will only be one level of HSAILFrame, and we will
> > >> > >> always have 32 saved $s registers and 16 saved $d registers, even
> > >> > >> if some are not necessary, but the HSAILFrame has provisions for
> > >> > >> saving fewer.
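The following hypothetical C++ struct matches that description: one frame level, room for 32 $s and 16 $d registers, and counts so that fewer can be saved. The field names, the field order and the `pc_offset` field are assumptions for illustration, not the HSAILFrame definition from the webrev. HSAIL $s registers are 32 bits wide and $d registers are 64 bits wide, so the two arrays use different element types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a saved-register frame; not the webrev's HSAILFrame.
struct HSAILFrameSketch {
  int32_t pc_offset;    // assumed: identifies the deopt point that was hit
  uint8_t num_s_regs;   // number of 32-bit $s registers actually saved (<= 32)
  uint8_t num_d_regs;   // number of 64-bit $d registers actually saved (<= 16)
  uint32_t s_regs[32];  // room for every $s register
  uint64_t d_regs[16];  // room for every $d register

  // Accessors refuse to read registers that were not saved.
  uint32_t get_s(int i) const {
    assert(i >= 0 && i < num_s_regs);
    return s_regs[i];
  }
  uint64_t get_d(int i) const {
    assert(i >= 0 && i < num_d_regs);
    return d_regs[i];
  }
};
```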
> > >> > >
> > >> > > Yes, but in the deoptimization code HotSpot expects frame values
> > >> > > (frame.hpp), and frame is a platform-specific class (see
> > >> > > frame_x86.hpp and friends). I'm not sure we really gain anything
> > >> > > by making the HSAIL frames look the same as the host
> > >> > > architecture's: that would require some changes, and there are
> > >> > > still assumptions that these frames are on the stack.
> > >> > >
> > >> > >>
> > >> > >> If there are other layouts for HSAILFrame that make this
> > >> > >> easier, let me know.
> > >> > >>
> > >> > >> Also, I'm not sure what you mean by "call a stub similar to
> > >> > >> the deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp".
> > >> > >
> > >> > > Deoptimization::fetch_unroll_info_helper makes some assumptions
> > >> > > about the layout of the frames leading to it. For example, it
> > >> > > expects to be called from a stub: either the deopt_blob
> > >> > > (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
> > >> > > (SharedRuntime::generate_uncommon_trap_blob).
> > >> > > I was talking about this with Tom Rodriguez, and what we
> > >> > > probably want is to do a standard JavaCall which would land on
> > >> > > such a stub; this would make it easier to end up with a
> > >> > > valid-looking, walkable stack.
> > >> > >
> > >> > >>
> > >> > >> -- Tom
> > >> > >>
> > >> > >>
> > >> > >>> -----Original Message-----
> > >> > >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> > >> > >>> Behalf Of Gilles Duboscq
> > >> > >>> Sent: Friday, January 24, 2014 12:07 PM
> > >> > >>> To: Deneau, Tom
> > >> > >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on
> > >> > >>> the GPU
> > >> > >>>
> > >> > >>> Hello Tom,
> > >> > >>>
> > >> > >>> I'm sending you my current diff, mostly for your information,
> > >> > >>> because it probably wouldn't compile or run.
> > >> > >>>
> > >> > >>> For the deopt process what we need to do is:
> > >> > >>> - Get the UnrollBlock from
> > >> > >>>   Deoptimization::fetch_unroll_info_helper
> > >> > >>> - Rebuild the "skeletal frames" (walkable and with PCs but no
> > >> > >>>   values) using this UnrollBlock (see for example
> > >> > >>>   sharedRuntime_x86_64.cpp starting around line 3530)
> > >> > >>> - Run Deoptimization::unpack_frames, which will fill the skeletal
> > >> > >>>   frames with values using the UnrollBlock
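These three steps can be modelled with a deliberately simplified C++ toy. None of the types or functions below are HotSpot's. The real Deoptimization::fetch_unroll_info_helper, the skeletal-frame code in sharedRuntime_x86_64.cpp and Deoptimization::unpack_frames all operate on actual stack memory. The toy only shows the ordering: frames are first pushed with their sizes and resume points but no values, and are filled in afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for HotSpot's deopt data structures.
struct ToyVFrame {                        // state of one Java frame to rebuild
  int bci;
  std::vector<int64_t> locals;
};
struct ToyUnrollBlock {                   // what phase 1 computes
  std::vector<std::size_t> frame_sizes;   // slots per interpreter frame
  std::vector<int> resume_bcis;           // where each frame resumes
};
struct ToyFrame {                         // one "interpreter frame" on a toy stack
  int resume_bci;
  std::vector<int64_t> slots;
};

// Phase 1: work out how big each interpreter frame must be.
ToyUnrollBlock fetch_unroll_info(const std::vector<ToyVFrame>& vframes) {
  ToyUnrollBlock ub;
  for (const ToyVFrame& vf : vframes) {
    ub.frame_sizes.push_back(vf.locals.size());
    ub.resume_bcis.push_back(vf.bci);
  }
  return ub;
}

// Phase 2: push skeletal frames (walkable, resume points set, values zeroed).
std::vector<ToyFrame> push_skeletal_frames(const ToyUnrollBlock& ub) {
  std::vector<ToyFrame> stack;
  for (std::size_t i = 0; i < ub.frame_sizes.size(); ++i) {
    stack.push_back({ub.resume_bcis[i], std::vector<int64_t>(ub.frame_sizes[i], 0)});
  }
  return stack;
}

// Phase 3: fill the skeletal frames with the values held by the vframes.
void unpack_frames(const std::vector<ToyVFrame>& vframes, std::vector<ToyFrame>& stack) {
  assert(vframes.size() == stack.size());
  for (std::size_t i = 0; i < stack.size(); ++i) {
    stack[i].slots = vframes[i].locals;
  }
}
```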
> > >> > >>>
> > >> > >>> This work relies on vframes (here compiledVFrames)
> > >> > >>> corresponding to the Java frames that are contained in the
> > >> > >>> method that just deoptimized.
> > >> > >>> Usually these vframes reference a particular frame (from
> > >> > >>> frame.hpp, i.e. a physical frame from the host machine).
> > >> > >>> Subclassing frame is not really possible (I spent some time
> > >> > >>> looking at that, but it doesn't seem reasonable), but
> > >> > >>> subclassing compiledVFrame should be easy; that's what I did
> > >> > >>> in HsailCompiledVFrame.
> > >> > >>> HsailCompiledVFrame references the HSAILFrame and uses it in
> > >> > >>> HsailCompiledVFrame::create_stack_value, which is what creates
> > >> > >>> the StackValues that are later used to retrieve the data.
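The design choice described here is to subclass the virtual frame rather than the physical frame. It can be sketched with hypothetical stand-in classes. HotSpot's own compiledVFrame and StackValue are not used, and the location encoding and saved-register layout are assumptions. The only point is that the subclass answers "what value is at this location?" from the saved GPU registers instead of from a host stack frame.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; HotSpot's compiledVFrame and StackValue are not used.
struct ToyLocation {
  bool is_d_reg;  // true: 64-bit $d register, false: 32-bit $s register
  int index;      // register number
};
struct ToyStackValue {
  int64_t value;
};

// Base class: answers "what value is at this location?" for one Java frame.
class ToyCompiledVFrame {
 public:
  virtual ~ToyCompiledVFrame() = default;
  virtual ToyStackValue create_stack_value(const ToyLocation& loc) const = 0;
};

// Assumed layout of the registers the GPU saved at the deopt point.
struct ToySavedRegs {
  uint32_t s[32];
  uint64_t d[16];
};

// GPU flavour: reads from the saved registers, not from a physical host frame.
class ToyHsailCompiledVFrame : public ToyCompiledVFrame {
 public:
  explicit ToyHsailCompiledVFrame(const ToySavedRegs* regs) : regs_(regs) {}

  ToyStackValue create_stack_value(const ToyLocation& loc) const override {
    if (loc.is_d_reg) {
      assert(loc.index >= 0 && loc.index < 16);
      return {static_cast<int64_t>(regs_->d[loc.index])};
    }
    assert(loc.index >= 0 && loc.index < 32);
    // Reinterpret the 32-bit $s value as signed before widening it.
    return {static_cast<int64_t>(static_cast<int32_t>(regs_->s[loc.index]))};
  }

 private:
  const ToySavedRegs* regs_;
};
```

Callers that hold only a base-class reference never need to know the values came from a GPU. This mirrors why subclassing the vframe is easier than trying to make an HSAIL frame look like a host frame.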
> > >> > >>>
> > >> > >>> Right now I'm trying to see how I can modify
> > >> > >>> fetch_unroll_info_helper to minimize its reliance on frames. This
> > >> > >>> needs quite a bit of refactoring.
> > >> > >>> Part of this also requires figuring out exactly what the frame
> > >> > >>> layout will be when we call it. I suppose that to avoid too many
> > >> > >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> > >> > >>> from sharedRuntime_x86_64.cpp.
> > >> > >>>
> > >> > >>> A few questions:
> > >> > >>> Why would there be multiple HSAILFrames? Is there a stack, and are
> > >> > >>> there method calls, in HSAIL? If not, then HSAILFrame should be an
> > >> > >>> HSAIL equivalent of frame: only one frame, since there is only one
> > >> > >>> physical frame.
> > >> > >>> I'm not entirely sure why we need the HSAILLocation. It's
> > >> > >>> useful now during development, but I suppose it should not be
> > >> > >>> needed any more once we go through the StackValues. Did you
> > >> > >>> have a specific use in mind beyond development tests?
> > >> > >>>
> > >> > >>> -Gilles
> > >> > >>>
> > >> > >>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq
> > >> > >>> <duboscq at ssw.jku.at>
> > >> > >>> wrote:
> > >> > >>> > Hello Tom,
> > >> > >>> >
> > >> > >>> > I've been working on this, and by now I'm not really
> > >> > >>> > convinced I will get something useful enough for tomorrow.
> > >> > >>> > I'll share the state of my patch/findings with you tomorrow
> > >> > >>> > anyway, but it will probably need more work.
> > >> > >>> >
> > >> > >>> > Sorry about that. I knew this deoptimization code is
> > >> > >>> > complicated, but using a non-physical frame (i.e. not a frame
> > >> > >>> > from the platform's native ABI) is more complicated than I thought.
> > >> > >>> >
> > >> > >>> > -Gilles
> > >> > >>> >
> > >> > >>> > On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> > >> > >>> >> Thanks, Gilles.
> > >> > >>> >>
> > >> > >>> >>> -----Original Message-----
> > >> > >>> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> > >> > >>> >>> Behalf Of Gilles Duboscq
> > >> > >>> >>> Sent: Monday, January 20, 2014 12:29 PM
> > >> > >>> >>> To: Deneau, Tom
> > >> > >>> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames
> > >> > >>> >>> on the GPU
> > >> > >>> >>>
> > >> > >>> >>> Hello Tom,
> > >> > >>> >>>
> > >> > >>> >>> Yes, I've looked at your webrev.
> > >> > >>> >>> Thank you.
> > >> > >>> >>>
> > >> > >>> >>> I also looked at the HotSpot code, and I have a rough idea
> > >> > >>> >>> of what is needed.
> > >> > >>> >>> Sorry for the late answer; I have a lot of things on my
> > >> > >>> >>> stack right now.
> > >> > >>> >>>
> > >> > >>> >>> I intend to look at it this week, and I hope to have at
> > >> > >>> >>> least something that you can experiment with on Friday.
> > >> > >>> >>>
> > >> > >>> >>> -Gilles
> > >> > >>> >>>
> > >> > >>> >>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> > >> > >>> >>> > Hi Gilles --
> > >> > >>> >>> >
> > >> > >>> >>> > I assume you saw the notice of the webrev I uploaded that can be
> > >> > >>> >>> > inspected (and also can be built, although we are not proposing
> > >> > >>> >>> > it for check-in).
> > >> > >>> >>> >
> > >> > >>> >>> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
> > >> > >>> >>> >
> > >> > >>> >>> >
> > >> > >>> >>> > To help with our internal planning, can you give us a rough
> > >> > >>> >>> > estimate of how far away the frame rebuilding interface might be?
> > >> > >>> >>> >
> > >> > >>> >>> > -- Tom
> > >> > >>> >>> >
> > >> > >>> >>> >
> > >> > >>> >>> >
> > >> > >>> >>> >> -----Original Message-----
> > >> > >>> >>> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com]
> > >> > >>> >>> >> On Behalf Of Gilles Duboscq
> > >> > >>> >>> >> Sent: Wednesday, January 15, 2014 4:38 AM
> > >> > >>> >>> >> To: Deneau, Tom
> > >> > >>> >>> >> Cc: Doug Simon; graal-dev at openjdk.java.net
> > >> > >>> >>> >> Subject: Re: actions -- Rebuilding the Interpreter
> > >> > >>> >>> >> Frames on the GPU
> > >> > >>> >>> >>
> > >> > >>> >>> >> Hello Tom,
> > >> > >>> >>> >>
> > >> > >>> >>> >> It's on my list; I already had a closer look at the
> > >> > >>> >>> >> frame rebuilding code.
> > >> > >>> >>> >> I would be interested in having a look at the code of your
> > >> > >>> >>> >> CodeInstaller subclass and the code you use to retrieve the
> > >> > >>> >>> >> runtime values so that I can experiment with it.
> > >> > >>> >>> >>
> > >> > >>> >>> >> -Gilles
> > >> > >>> >>> >>
> > >> > >>> >>> >> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> > >> > >>> >>> >> > Gilles, Doug --
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > A status update on our end...
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > * We now generate HSAIL code to save the register state at deopt
> > >> > >>> >>> >> >   points.
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > * We have an HSAIL-specific CodeInstaller class based on the
> > >> > >>> >>> >> >   changes Doug added, and we use this at compile time
> > >> > >>> >>> >> >   (code-install time) to build the ScopeDescs. (This avoids the
> > >> > >>> >>> >> >   host-register-specific code in the base CodeInstaller class.)
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > * At runtime, if we detect that a workitem deopted, we map the
> > >> > >>> >>> >> >   saved "HSAIL pc" to the relevant ScopeDesc and use each
> > >> > >>> >>> >> >   Location item in the ScopeDesc to retrieve the relevant HSAIL
> > >> > >>> >>> >> >   register from the HSAIL frame (where the registers were saved).
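The runtime lookup in the third item can be sketched in C++ as a map from the saved HSAIL pc to a description of where each live value sits. `ToyScopeDesc`, `ToyRegLocation`, `ToySavedRegisters` and `live_values` are hypothetical names, not HotSpot's ScopeDesc or Location API, and the saved registers are modelled as two plain arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: where one live value was saved.
struct ToyRegLocation {
  bool is_d;  // true: 64-bit $d register, false: 32-bit $s register
  int index;  // register number
};

// Hypothetical: debug info recorded at code-install time for one deopt point.
struct ToyScopeDesc {
  int bci;
  std::vector<ToyRegLocation> values;  // locals and expression stack entries
};

// Assumed layout of the registers saved in the HSAIL frame.
struct ToySavedRegisters {
  uint32_t s[32];
  uint64_t d[16];
};

// Map the saved "HSAIL pc" to its scope and read every live value it names.
std::vector<int64_t> live_values(const std::map<int, ToyScopeDesc>& scopes,
                                 int hsail_pc, const ToySavedRegisters& regs) {
  std::vector<int64_t> out;
  auto it = scopes.find(hsail_pc);
  if (it == scopes.end()) return out;  // no debug info recorded for this pc
  for (const ToyRegLocation& loc : it->second.values) {
    assert(loc.index >= 0 && loc.index < (loc.is_d ? 16 : 32));
    // $s values are reinterpreted as signed 32-bit before widening.
    out.push_back(loc.is_d ? static_cast<int64_t>(regs.d[loc.index])
                           : static_cast<int64_t>(static_cast<int32_t>(regs.s[loc.index])));
  }
  return out;
}
```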
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > Right now we just print out the live locals or expression stack
> > >> > >>> >>> >> > values for the deopted workitem, and they look correct. The next
> > >> > >>> >>> >> > step would be to rebuild the interpreter frames.
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > Can I get an update on the "C++ changes needed to easily rebuild
> > >> > >>> >>> >> > the interpreter frames from a raw buffer provided by the GPU"?
> > >> > >>> >>> >> >
> > >> > >>> >>> >> > -- Tom
> > >> > >>> >>> >> >
> > >> > >>> >>> >> >
> > >> > >>> >>> >> >
> > >> > >>> >>> >> >
> > >> > >>> >>> >> >> -----Original Message-----
> > >> > >>> >>> >> >> From: graal-dev-bounces at openjdk.java.net
> > >> > >>> >>> >> >> [mailto:graal-dev-bounces at openjdk.java.net] On
> > >> > >>> >>> >> >> Behalf Of Gilles Duboscq
> > >> > >>> >>> >> >> Sent: Friday, December 20, 2013 4:31 AM
> > >> > >>> >>> >> >> To: Doug Simon
> > >> > >>> >>> >> >> Cc: graal-dev at openjdk.java.net
> > >> > >>> >>> >> >> Subject: Re: actions
> > >> > >>> >>> >> >>
> > >> > >>> >>> >> >> As for me, I'll look into the C++ changes needed to easily
> > >> > >>> >>> >> >> rebuild the interpreter frames from a raw buffer provided by
> > >> > >>> >>> >> >> the GPU during deoptimization.
> > >> > >>> >>> >> >>
> > >> > >>> >>> >> >> -Gilles
> > >> > >>> >>> >> >>
> > >> > >>> >>> >> >>
> > >> > >>> >>> >> >> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon <doug.simon at oracle.com> wrote:
> > >> > >>> >>> >> >>
> > >> > >>> >>> >> >> > As a result of the Sumatra Skype meeting today on the topic of
> > >> > >>> >>> >> >> > how to handle deopt for HSAIL & PTX, I’ve signed up to
> > >> > >>> >>> >> >> > investigate changes in the C++ layer of Graal to accommodate
> > >> > >>> >>> >> >> > installing code whose debug info is not in terms of host
> > >> > >>> >>> >> >> > machine state (e.g. uses a different register set than the
> > >> > >>> >>> >> >> > host register set).
> > >> > >>> >>> >> >> >
> > >> > >>> >>> >> >> > -Doug
> > >> > >>> >>> >> >> >
> > >> > >>> >>> >> >> > On Dec 19, 2013, at 11:02 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> > >> > >>> >>> >> >> >
> > >> > >>> >>> >> >> > > Gilles, Doug --
> > >> > >>> >>> >> >> > >
> > >> > >>> >>> >> >> > > Could you post to the graal-dev list what the two action
> > >> > >>> >>> >> >> > > items you took were?
> > >> > >>> >>> >> >> > >
> > >> > >>> >>> >> >> > > -- Tom
> > >> > >>> >>> >> >> >
> > >> > >>> >>> >> >> >
> > >> > >>> >>> >> >
> > >> > >>> >>> >
> > >> > >>> >>
> > >>
> > >