actions -- Rebuilding the Interpreter Frames on the GPU
Gilles Duboscq
gilwooden at gmail.com
Tue Jan 28 15:07:09 PST 2014
Yes, it's all correct.
This host code basically only contains code to handle the GPU code's deopts,
which it handles by using ... deopt again. But since we are on the host by
then, the deopt there is very simple.
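
To make the scheme concrete, here is a minimal Java sketch of what that host
code boils down to: switch on the deopt id, then read the values for that
deopt site out of the saved GPU state. Everything in it is an assumption for
illustration, not Graal or HotSpot code: the class and method names are
hypothetical, a ByteBuffer stands in for the long pointer, the layout
(32 four-byte $s slots followed by 16 eight-byte $d slots) is invented rather
than taken from the HSAILFrame definition, and the two deopt sites are made
up. The real host code would deopt at the end instead of returning a list.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the host-side "second graph": given a deopt id and
 * the saved GPU state, select the matching deopt site and gather the values
 * its frame state needs. Names and buffer layout are illustrative only.
 */
final class HostDeoptSketch {
    // Assumed layout of the saved-register buffer (not the real HSAILFrame):
    // 32 four-byte $s slots followed by 16 eight-byte $d slots.
    static final int NUM_S_REGS = 32;
    static final int NUM_D_REGS = 16;
    static final int D_REGS_BASE = NUM_S_REGS * Integer.BYTES;
    static final int FRAME_BYTES = D_REGS_BASE + NUM_D_REGS * Long.BYTES;

    /** Reads saved 32-bit register $s<reg> using an absolute offset. */
    static int readS(ByteBuffer frame, int reg) {
        return frame.getInt(reg * Integer.BYTES);
    }

    /** Reads saved 64-bit register $d<reg> using an absolute offset. */
    static long readD(ByteBuffer frame, int reg) {
        return frame.getLong(D_REGS_BASE + reg * Long.BYTES);
    }

    /**
     * The two arguments mirror the plan: the saved state (a long pointer in
     * the real thing, a ByteBuffer here) and the unique int of the deopt.
     */
    static List<Object> valuesForDeopt(int deoptId, ByteBuffer frame) {
        List<Object> values = new ArrayList<>();
        switch (deoptId) {
            case 0: // made-up site: one int local kept in $s3
                values.add(readS(frame, 3));
                break;
            case 1: // made-up site: a long local in $d2, an int in $s0
                values.add(readD(frame, 2));
                values.add(readS(frame, 0));
                break;
            default:
                throw new IllegalArgumentException("unknown deopt id " + deoptId);
        }
        // The real host code would now deopt with these values as its frame
        // state; this sketch just hands them back to the caller.
        return values;
    }
}
```

The point of the switch is that each deopt site knows statically which
registers hold its locals and expression stack, so the GPU side only has to
pass the id and the raw saved state.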
On 28 Jan 2014 19:59, "Tom Deneau" <tom.deneau at amd.com> wrote:
> Gilles --
>
> I'm not sure I understand this 100% (and I can't say I understand
> how OSR works) but this sounds like a good goal to
> avoid modifying the hotspot deopt code, etc.
>
> So is the following correct?
> * this second graph compiles to some funny host code which
> gets invoked at runtime via javaCall when the gpu de-opts?
> This host code is like a special compilation of the original kernel
> method.
>
> * When the gpu sees a deopt and makes the javacall, it just
> needs to pass the unique de-opt location (int)
> and the set of saved gpu register/stack values.
>
> * And the funny host code will set up all the locals, expressions, etc.
> and then do a normal host deopt...
>
> If so, it sounds very clever... :)
>
> -- Tom
>
>
>
> > -----Original Message-----
> > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> > Gilles Duboscq
> > Sent: Tuesday, January 28, 2014 12:29 PM
> > To: Deneau, Tom
> > Cc: graal-dev at openjdk.java.net
> > Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> >
> > Tom,
> >
> > After further thinking, discussing and hacking into HotSpot, I think
> > we've finally arrived at a reasonable battle plan. We have turned the
> > problem around and the plan is to use a combination of something that
> > looks like OSR and deoptimization:
> > - Around the end of the compilation (just before going to LIR), I create
> > a new graph based on the current graph:
> >   - It gets 2 arguments: a long (a pointer actually) and an int.
> >   - For each deopt in the original graph there is a unique int; the
> >     first thing this new graph does is a switch on this int.
> >   - After this switch, it reads all the values necessary for the deopt's
> >     framestates from this long pointer (which probably simply points to
> >     the HSAILFrame).
> >   - It then directly deopts from there.
> > - When a deopt happens on the GPU, we do a JavaCall using something like
> > JavaCalls::call_helper (javaCalls.cpp) with an additional argument for
> > the entry point.
> >
> > I think doing deopt this way will spare us a lot of problems because:
> > - we don't need to modify any of HotSpot's deopt code
> > - the frames and nmethods involved look perfectly normal to HotSpot
> >
> > My plan is:
> > - make it possible for ExternalCompilationResult to contain both the
> > External part (HSAIL things) and the host part (the code coming from
> > this second graph)
> > - Hook somewhere in the HSAIL backend to generate this second graph,
> > compile it using the Host backend and combine the HSAIL and host results
> > in the ExternalCompilationResult
> > - Install this ExternalCompilationResult correctly in the code cache
> > - Implement the final calling to JavaCalls::call_helper in gpu_hsail.cpp
> >
> > -Gilles
> >
> > On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq <duboscq at ssw.jku.at>
> > wrote:
> > > On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com>
> > wrote:
> > >> Gilles --
> > >>
> > >> I took a look at your diff file and it seems we are mostly headed in
> > >> the right direction.
> > >>
> > >> Regarding this paragraph
> > >>> Right now I'm trying to see how I can modify
> > >>> fetch_unroll_info_helper to minimise its reliance on frames. This
> > >>> needs quite a bit of refactoring.
> > >>> Part of this also requires figuring out exactly what the frame
> > >>> layout will be when we call it. I suppose that to avoid too many
> > >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> > >>> from sharedRuntime_x86_64.cpp.
> > >>>
> > >>
> > >> I was assuming the frame layout would be what the HSAILFrame
> > >> structure shows.
> > >> For now there will only be one level of HSAILFrame and we will always
> > >> have 32 saved $s registers, 16 saved $d registers, even if some are
> > >> not necessary, but the HSAILFrame has provisions for saving fewer.
> > >
> > > Yes, but in the deoptimization code HotSpot expects frame values
> > > (frame.hpp), and frame is a platform-specific class (see frame_x86.hpp
> > > and friends). I'm not sure we really gain anything by making the HSAIL
> > > frames look the same as the host architecture's: that would require
> > > some changes and there are still assumptions that these frames are on
> > > the stack.
> > >
> > >>
> > >> If there are other layouts for HSAILFrame that make this easier, let
> > >> me know.
> > >>
> > >> Also, I'm not sure what you mean by "call a stub similar to the
> > >> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp".
> > >
> > > Deoptimization::fetch_unroll_info_helper makes some assumptions about
> > > the layout of the frames leading to it. For example, it expects to be
> > > called from a stub: either the deopt_blob
> > > (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
> > > (SharedRuntime::generate_uncommon_trap_blob).
> > > I was talking about this with Tom Rodriguez and what we probably want
> > > is to do a standard JavaCall which would land on such a stub; this
> > > would make it easier to end up with a valid-looking/walk-able stack.
> > >
> > >>
> > >> -- Tom
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> > >>> Gilles Duboscq
> > >>> Sent: Friday, January 24, 2014 12:07 PM
> > >>> To: Deneau, Tom
> > >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> > >>>
> > >>> Hello Tom,
> > >>>
> > >>> I'm sending you my current diff, mostly for your information because
> > >>> it probably wouldn't compile or run.
> > >>>
> > >>> For the deopt process what we need to do is:
> > >>> - Get the UnrollBlock from Deoptimization::fetch_unroll_info_helper
> > >>> - Rebuild the "skeletal frames" (walkable and with PCs but no values)
> > >>>   using this UnrollBlock (see for example sharedRuntime_x86_64.cpp
> > >>>   starting around line 3530)
> > >>> - Run Deoptimization::unpack_frames which will fill the skeletal
> > >>>   frames with values using the UnrollBlock
> > >>>
> > >>> This work relies on vframes (here compiledVFrames) corresponding to
> > >>> the java frames that are contained in the method that just
> > >>> deoptimized.
> > >>> Usually these vframes reference a particular frame (from frame.hpp,
> > >>> i.e. a physical frame from the host machine).
> > >>> Sub-classing frame is not really possible (I spent some time looking
> > >>> at that but it doesn't seem reasonable) but subclassing
> > >>> compiledVFrame should be easy; that's what I did in
> > >>> HsailCompiledVFrame.
> > >>> HsailCompiledVFrame references the HSAILFrame and uses it in
> > >>> HsailCompiledVFrame::create_stack_value which is what creates
> > >>> StackValues which are later used to retrieve the data.
> > >>>
> > >>> Right now I'm trying to see how I can modify
> > >>> fetch_unroll_info_helper to minimise its reliance on frames. This
> > >>> needs quite a bit of refactoring.
> > >>> Part of this also requires figuring out exactly what the frame
> > >>> layout will be when we call it. I suppose that to avoid too many
> > >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> > >>> from sharedRuntime_x86_64.cpp.
> > >>>
> > >>> A few questions:
> > >>> Why would there be multiple HSAILFrames? Is there a stack and method
> > >>> calls in HSAIL? If that's not the case then HSAILFrame should be an
> > >>> HSAIL equivalent of frame: only one frame since there is only one
> > >>> physical frame.
> > >>> I'm not entirely sure why we need the HSAILLocation. It's useful now
> > >>> during development but I suppose it should not be needed any more
> > >>> once we go through the StackValues. Did you have a specific use in
> > >>> mind beyond development tests?
> > >>>
> > >>> -Gilles
> > >>>
> > >>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq
> > >>> <duboscq at ssw.jku.at>
> > >>> wrote:
> > >>> > Hello Tom,
> > >>> >
> > >>> > I've been working on this and by now I'm not really convinced I
> > >>> > will get something useful enough for tomorrow.
> > >>> > I'll share the state of my patch/findings with you tomorrow anyway
> > >>> > but I'll probably need more work.
> > >>> >
> > >>> > Sorry about that, I knew this deoptimization code is complicated
> > >>> > but using a non-physical frame (i.e. not a frame from the
> > >>> > platform's native ABI) is more complicated than I thought.
> > >>> >
> > >>> > -Gilles
> > >>> >
> > >>> > On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau <tom.deneau at amd.com>
> > >>> wrote:
> > >>> >> Thanks, Gilles.
> > >>> >>
> > >>> >>> -----Original Message-----
> > >>> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf
> > >>> >>> Of Gilles Duboscq
> > >>> >>> Sent: Monday, January 20, 2014 12:29 PM
> > >>> >>> To: Deneau, Tom
> > >>> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the
> > >>> >>> GPU
> > >>> >>>
> > >>> >>> Hello Tom,
> > >>> >>>
> > >>> >>> Yes, I've looked at your webrev.
> > >>> >>> Thank you.
> > >>> >>>
> > >>> >>> I also looked at the HotSpot code and I have a rough idea of
> > >>> >>> what is needed.
> > >>> >>> Sorry for the late answer, I have a lot of things on my stack
> > >>> >>> right now.
> > >>> >>>
> > >>> >>> I intend to look at it this week and I hope to have at least
> > >>> >>> something that you can experiment with on Friday.
> > >>> >>>
> > >>> >>> -Gilles
> > >>> >>>
> > >>> >>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau
> > >>> >>> <tom.deneau at amd.com>
> > >>> wrote:
> > >>> >>> > Hi Gilles --
> > >>> >>> >
> > >>> >>> > I assume you saw the notice of the webrev I uploaded that can
> > >>> >>> > be inspected (and also can be built, although we are not
> > >>> >>> > proposing it for check-in).
> > >>> >>> >
> > >>> >>> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
> > >>> >>> >
> > >>> >>> >
> > >>> >>> > To help with our internal planning, can you give us a rough
> > >>> >>> > estimate of how far away the frame rebuilding interface might be?
> > >>> >>> >
> > >>> >>> > -- Tom
> > >>> >>> >
> > >>> >>> >
> > >>> >>> >
> > >>> >>> >> -----Original Message-----
> > >>> >>> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> > >>> >>> >> Behalf Of Gilles Duboscq
> > >>> >>> >> Sent: Wednesday, January 15, 2014 4:38 AM
> > >>> >>> >> To: Deneau, Tom
> > >>> >>> >> Cc: Doug Simon; graal-dev at openjdk.java.net
> > >>> >>> >> Subject: Re: actions -- Rebuilding the Interpreter Frames on
> > >>> >>> >> the GPU
> > >>> >>> >>
> > >>> >>> >> Hello Tom,
> > >>> >>> >>
> > >>> >>> >> It's on my list; I already had a closer look at the frame
> > >>> >>> >> rebuilding code.
> > >>> >>> >> I would be interested to have a look at the code of your
> > >>> >>> >> CodeInstaller subclass and the code you use to retrieve the
> > >>> >>> >> runtime values so that I can experiment with it.
> > >>> >>> >>
> > >>> >>> >> -Gilles
> > >>> >>> >>
> > >>> >>> >> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau
> > >>> >>> >> <tom.deneau at amd.com>
> > >>> >>> wrote:
> > >>> >>> >> > Gilles, Doug --
> > >>> >>> >> >
> > >>> >>> >> > A status update on our end...
> > >>> >>> >> >
> > >>> >>> >> > * We now generate HSAIL code to save the register state
> > >>> >>> >> > at deopt points
> > >>> >>> >> >
> > >>> >>> >> > * We have an HSAIL-specific CodeInstaller class based on
> > >>> >>> >> > the changes Doug added and we use this at compile time
> > >>> >>> >> > (code-install time) to build the ScopeDescs. (This avoids
> > >>> >>> >> > the host-register specific code in the base CodeInstaller
> > >>> >>> >> > class).
> > >>> >>> >> >
> > >>> >>> >> > * At runtime, if we detect that a workitem deopted, we
> > >>> >>> >> > map the saved "HSAIL pc" to the relevant ScopeDesc and
> > >>> >>> >> > use each Location item in the ScopeDesc to retrieve the
> > >>> >>> >> > relevant HSAIL register from the HSAIL frame (where the
> > >>> >>> >> > registers were saved).
> > >>> >>> >> >
> > >>> >>> >> > Right now we just print out the live locals or expression
> > >>> >>> >> > stack values for the deopted workitem and they look
> > >>> >>> >> > correct. The next step would be to rebuild the
> > >>> >>> >> > interpreter frames.
> > >>> >>> >> >
> > >>> >>> >> > Can I get an update on the "C++ changes needed to easily
> > >>> >>> >> > rebuild the interpreter frames from a raw buffer provided
> > >>> >>> >> > by the GPU".
> > >>> >>> >> >
> > >>> >>> >> > -- Tom
> > >>> >>> >> >
> > >>> >>> >> >
> > >>> >>> >> >
> > >>> >>> >> >
> > >>> >>> >> >> -----Original Message-----
> > >>> >>> >> >> From: graal-dev-bounces at openjdk.java.net
> > >>> >>> >> >> [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of
> > >>> >>> >> >> Gilles Duboscq
> > >>> >>> >> >> Sent: Friday, December 20, 2013 4:31 AM
> > >>> >>> >> >> To: Doug Simon
> > >>> >>> >> >> Cc: graal-dev at openjdk.java.net
> > >>> >>> >> >> Subject: Re: actions
> > >>> >>> >> >>
> > >>> >>> >> >> As for me, I'll look into the C++ changes needed to easily
> > >>> >>> >> >> rebuild the interpreter frames from a raw buffer provided
> > >>> >>> >> >> by the GPU during deoptimization.
> > >>> >>> >> >>
> > >>> >>> >> >> -Gilles
> > >>> >>> >> >>
> > >>> >>> >> >>
> > >>> >>> >> >> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon
> > >>> >>> <doug.simon at oracle.com>
> > >>> >>> >> >> wrote:
> > >>> >>> >> >>
> > >>> >>> >> >> > As a result of the Sumatra Skype meeting today on the
> > >>> >>> >> >> > topic of how to handle deopt for HSAIL & PTX, I’ve signed
> > >>> >>> >> >> > up to investigate changes in the C++ layer of Graal to
> > >>> >>> >> >> > accommodate installing code whose debug info is not in
> > >>> >>> >> >> > terms of host machine state (e.g. uses a different
> > >>> >>> >> >> > register set than the host register set).
> > >>> >>> >> >> >
> > >>> >>> >> >> > -Doug
> > >>> >>> >> >> >
> > >>> >>> >> >> > On Dec 19, 2013, at 11:02 PM, Deneau, Tom
> > >>> >>> >> >> > <tom.deneau at amd.com>
> > >>> >>> >> wrote:
> > >>> >>> >> >> >
> > >>> >>> >> >> > > Gilles, Doug --
> > >>> >>> >> >> > >
> > >>> >>> >> >> > > Could you post to the graal-dev list what the two
> > >>> >>> >> >> > > action items you took were?
> > >>> >>> >> >> > >
> > >>> >>> >> >> > > -- Tom
> > >>> >>> >> >> >
> > >>> >>> >> >> >
> > >>> >>> >> >
> > >>> >>> >
> > >>> >>
>
>