actions -- Rebuilding the Interpreter Frames on the GPU
Deneau, Tom
tom.deneau at amd.com
Tue Jan 28 10:59:00 PST 2014
Gilles --
I'm not sure I understand this 100% (and I can't say I understand
how OSR works), but this sounds like a good goal: to
avoid modifying the HotSpot deopt code, etc.
So is the following correct?
* This second graph compiles to some funny host code which
  gets invoked at runtime via a JavaCall when the GPU deopts?
  This host code is like a special compilation of the original kernel method.
* When the GPU sees a deopt and makes the JavaCall, it just
  needs to pass the unique deopt location (an int)
  and the set of saved GPU register/stack values.
* And the funny host code will set up all the locals, expression stack values, etc.
  and then do a normal host deopt...
If so, it sounds very clever... :)
-- Tom
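A minimal C++ sketch of the mechanism summarized in the bullets above, assuming a fixed GPU save area of 32 $s and 16 $d registers (the layout mentioned further down in the thread) and two invented deopt sites. The names `SavedGpuFrame`, `InterpreterState` and `deopt_entry` are hypothetical; the real second graph would be generated by Graal from the kernel's FrameStates and would end in an actual host deopt rather than returning a struct.

```cpp
// Hypothetical sketch of the host-side "second graph": one entry point that
// receives the saved GPU frame (as a raw long) and a deopt id, switches on
// the id, and reads exactly the values that deopt site's FrameState needs.
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed save area written by the GPU at a deopt point.
struct SavedGpuFrame {
  std::uint32_t s[32];  // saved 32-bit $s registers
  std::uint64_t d[16];  // saved 64-bit $d registers
};

// Stand-in for what a normal host deopt would consume.
struct InterpreterState {
  int bci;                          // bytecode index to resume at
  std::vector<std::int64_t> locals;
  std::vector<std::int64_t> stack;  // expression stack
};

// frame_address plays the role of the "long (a pointer actually)" argument.
InterpreterState deopt_entry(std::int64_t frame_address, int deopt_id) {
  const auto* f = reinterpret_cast<const SavedGpuFrame*>(
      static_cast<std::uintptr_t>(frame_address));
  assert(f != nullptr);
  switch (deopt_id) {
    case 0:  // invented site: local0 in $s2, local1 in $d1, empty stack
      return {12,
              {static_cast<std::int32_t>(f->s[2]),
               static_cast<std::int64_t>(f->d[1])},
              {}};
    case 1:  // invented site: local0 in $s0, one stack slot in $s5
      return {27,
              {static_cast<std::int32_t>(f->s[0])},
              {static_cast<std::int32_t>(f->s[5])}};
    default:
      throw std::invalid_argument("unknown deopt id");
  }
}
```

In this shape the GPU side only has to supply the frame address and the deopt id; everything site-specific lives in the generated switch, which is what lets the JavaCall target a single entry point.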
> -----Original Message-----
> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> Gilles Duboscq
> Sent: Tuesday, January 28, 2014 12:29 PM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net
> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
>
> Tom,
>
> After further thinking, discussing and hacking into HotSpot, I think
> we've finally arrived at a reasonable battle plan. We have turned the
> problem around and the plan is to use a combination of something that
> looks like OSR and deoptimization:
> - Around the end of the compilation (just before going to LIR), I create
>   a new graph based on the current graph:
>   - It takes 2 arguments: a long (actually a pointer) and an int.
>   - Each deopt in the original graph gets a unique int; the first thing
>     this new graph does is a switch on this int.
>   - After this switch, it reads all the values necessary for the deopt's
>     framestates from this long pointer (which probably simply points to the
>     HSAILFrame).
>   - It then directly deopts from there.
> - When a deopt happens on the GPU, we do a JavaCall using something like
>   JavaCalls::call_helper (javaCalls.cpp) with an additional argument for
>   the entry point.
>
> I think doing deopt this way will save us a lot of problems because:
> - we don't need to modify any of HotSpot's deopt code
> - the frames and nmethods involved look perfectly normal to HotSpot
>
> My plan is:
> - make it possible for ExternalCompilationResult to contain both the
> External part (HSAIL things) and the host part (the code coming from
> this second graph)
> - Hook somewhere in the HSAIL backend to generate this second graph,
> compile it using the Host backend and combine the HSAIL and host results
> in the ExternalCompilationResult
> - Install this ExternalCompilationResult correctly in the code cache
> - Implement the final call to JavaCalls::call_helper in gpu_hsail.cpp
>
> -Gilles
>
> On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq <duboscq at ssw.jku.at>
> wrote:
> > On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com>
> > wrote:
> >> Gilles --
> >>
> >> I took a look at your diff file and it seems we are mostly headed in
> >> the right direction.
> >>
> >> Regarding this paragraph
> >>> Right now I'm trying to see how I can modify
> >>> fetch_unroll_info_helper to minimise its reliance on frames. This
> >>> needs quite a bit of refactoring.
> >>> Part of this also requires figuring out exactly what the frame
> >>> layout will be when we call it. I suppose that to avoid too many
> >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> >>> from sharedRuntime_x86_64.cpp.
> >>>
> >>
> >> I was assuming the frame layout would be what the HSAILFrame
> >> structure shows.
> >> For now there will only be one level of HSAILFrame, and we will always
> >> have 32 saved $s registers and 16 saved $d registers, even if some are
> >> not necessary; the HSAILFrame does have provisions for saving fewer.
> >
> > Yes, but in the deoptimization code HotSpot expects frame values
> > (frame.hpp), and frame is a platform-specific class (see frame_x86.hpp
> > and friends). I'm not sure we really gain anything by making the HSAIL
> > frames look the same as the host architecture's frames: that would
> > require some changes, and there are still assumptions that these frames
> > are on the stack.
> >
> >>
> >> If there are other layouts for HSAILFrame that make this easier, let
> >> me know.
> >>
> >> Also, I'm not sure what you mean by "call a stub similar to the
> >> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp".
> >
> > Deoptimization::fetch_unroll_info_helper makes some assumptions about the
> > layout of the frames leading to it. For example, it expects to be called
> > from a stub: either the deopt_blob
> > (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
> > (SharedRuntime::generate_uncommon_trap_blob).
> > I was talking about this with Tom Rodriguez, and what we probably want
> > is to do a standard JavaCall which would land on such a stub; this
> > would make it easier to end up with a valid-looking, walkable stack.
> >
> >>
> >> -- Tom
> >>
> >>
> >>> -----Original Message-----
> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> >>> Gilles Duboscq
> >>> Sent: Friday, January 24, 2014 12:07 PM
> >>> To: Deneau, Tom
> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> >>>
> >>> Hello Tom,
> >>>
> >>> I'm sending you my current diff, mostly for your information, because
> >>> it probably wouldn't compile or run.
> >>>
> >>> For the deopt process what we need to do is:
> >>> - Get the UnrollBlock from Deoptimization::fetch_unroll_info_helper
> >>> - Rebuild the "skeletal frames" (walkable and with PCs but no values)
> >>>   using this UnrollBlock (see for example sharedRuntime_x86_64.cpp
> >>>   starting around line 3530)
> >>> - Run Deoptimization::unpack_frames, which will fill the skeletal
> >>>   frames with values using the UnrollBlock
> >>>
> >>> This work relies on vframes (here compiledVFrames) corresponding to
> >>> the Java frames that are contained in the method that just deoptimized.
> >>> Usually these vframes reference a particular frame (from frame.hpp,
> >>> i.e. a physical frame from the host machine).
> >>> Subclassing frame is not really possible (I spent some time looking
> >>> at that, but it doesn't seem reasonable), but subclassing
> >>> compiledVFrame should be easy; that's what I did in HsailCompiledVFrame.
> >>> HsailCompiledVFrame references the HSAILFrame and uses it in
> >>> HsailCompiledVFrame::create_stack_value, which creates the
> >>> StackValues that are later used to retrieve the data.
> >>>
> >>> Right now I'm trying to see how I can modify
> >>> fetch_unroll_info_helper to minimise its reliance on frames. This
> >>> needs quite a bit of refactoring.
> >>> Part of this also requires figuring out exactly what the frame
> >>> layout will be when we call it. I suppose that to avoid too many
> >>> changes we can call a stub similar to the deopt/uncommon_trap stub
> >>> from sharedRuntime_x86_64.cpp.
> >>>
> >>> A few questions:
> >>> Why would there be multiple HSAILFrames? Is there a stack, and are
> >>> there method calls in HSAIL? If that's not the case, then HSAILFrame
> >>> should be an HSAIL equivalent of frame: only one frame, since there is
> >>> only one physical frame.
> >>> I'm not entirely sure why we need the HSAILLocation. It's useful now
> >>> during development, but I suppose it should not be needed any more
> >>> once we go through the StackValues. Did you have a specific use in
> >>> mind beyond development tests?
> >>>
> >>> -Gilles
> >>>
> >>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq <duboscq at ssw.jku.at>
> >>> wrote:
> >>> > Hello Tom,
> >>> >
> >>> > I've been working on this, and by now I'm not really convinced I
> >>> > will get something useful enough for tomorrow.
> >>> > I'll share the state of my patch/findings with you tomorrow anyway,
> >>> > but it will probably need more work.
> >>> >
> >>> > Sorry about that. I knew this deoptimization code was complicated,
> >>> > but using a non-physical frame (i.e. not a frame from the platform's
> >>> > native ABI) is more complicated than I thought.
> >>> >
> >>> > -Gilles
> >>> >
> >>> > On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau <tom.deneau at amd.com>
> >>> > wrote:
> >>> >> Thanks, Gilles.
> >>> >>
> >>> >>> -----Original Message-----
> >>> >>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf
> >>> >>> Of Gilles Duboscq
> >>> >>> Sent: Monday, January 20, 2014 12:29 PM
> >>> >>> To: Deneau, Tom
> >>> >>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> >>> >>>
> >>> >>> Hello Tom,
> >>> >>>
> >>> >>> Yes, I've looked at your webrev.
> >>> >>> Thank you.
> >>> >>>
> >>> >>> I also looked at the HotSpot code and I have a rough idea of
> >>> >>> what is needed.
> >>> >>> Sorry for the late answer; I have a lot of things on my stack
> >>> >>> right now.
> >>> >>>
> >>> >>> I intend to look at it this week, and I hope to have at least
> >>> >>> something that you can experiment with on Friday.
> >>> >>>
> >>> >>> -Gilles
> >>> >>>
> >>> >>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau <tom.deneau at amd.com>
> >>> >>> wrote:
> >>> >>> > Hi Gilles --
> >>> >>> >
> >>> >>> > I assume you saw the notice of the webrev I uploaded that can be
> >>> >>> > inspected (and also can be built, although we are not proposing it
> >>> >>> > for check-in).
> >>> >>> >
> >>> >>> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
> >>> >>> >
> >>> >>> >
> >>> >>> > To help with our internal planning, can you give us a rough
> >>> >>> > estimate of how far away the frame rebuilding interface might be?
> >>> >>> >
> >>> >>> > -- Tom
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >> -----Original Message-----
> >>> >>> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> >>> >>> >> Behalf Of Gilles Duboscq
> >>> >>> >> Sent: Wednesday, January 15, 2014 4:38 AM
> >>> >>> >> To: Deneau, Tom
> >>> >>> >> Cc: Doug Simon; graal-dev at openjdk.java.net
> >>> >>> >> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
> >>> >>> >>
> >>> >>> >> Hello Tom,
> >>> >>> >>
> >>> >>> >> It's on my list; I already had a closer look at the frame
> >>> >>> >> rebuilding code.
> >>> >>> >> I would be interested to have a look at the code of your
> >>> >>> >> CodeInstaller subclass and the code you use to retrieve the
> >>> >>> >> runtime values so that I can experiment with it.
> >>> >>> >>
> >>> >>> >> -Gilles
> >>> >>> >>
> >>> >>> >> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau <tom.deneau at amd.com>
> >>> >>> >> wrote:
> >>> >>> >> > Gilles, Doug --
> >>> >>> >> >
> >>> >>> >> > A status update on our end...
> >>> >>> >> >
> >>> >>> >> > * We now generate HSAIL code to save the register state at deopt
> >>> >>> >> >   points.
> >>> >>> >> >
> >>> >>> >> > * We have an HSAIL-specific CodeInstaller class based on the changes
> >>> >>> >> >   Doug added, and we use this at compile time (code-install time) to
> >>> >>> >> >   build the ScopeDescs. (This avoids the host-register-specific code
> >>> >>> >> >   in the base CodeInstaller class.)
> >>> >>> >> >
> >>> >>> >> > * At runtime, if we detect that a workitem deopted, we map the saved
> >>> >>> >> >   "HSAIL pc" to the relevant ScopeDesc and use each Location item in
> >>> >>> >> >   the ScopeDesc to retrieve the relevant HSAIL register from the HSAIL
> >>> >>> >> >   frame (where the registers were saved).
> >>> >>> >> >
> >>> >>> >> > Right now we just print out the live locals or expression stack values
> >>> >>> >> > for the deopted workitem, and they look correct. The next step would
> >>> >>> >> > be to rebuild the interpreter frames.
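A C++ sketch of the runtime lookup described in this status update: map the saved "HSAIL pc" to its debug info, then read each Location from the saved registers. `ScopeDescSketch`, `Location`, `SavedRegs`, `LiveValues` and `recover_live_values` are invented for illustration and are far simpler than HotSpot's ScopeDesc/Location classes; the fixed 32 $s / 16 $d save area is an assumption.

```cpp
// Hypothetical model of the runtime debug-info lookup; not HotSpot's API.
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

enum class RegKind { S, D };  // 32-bit $s or 64-bit $d register

// Where one Java value lives at a deopt point.
struct Location {
  RegKind kind;
  int index;
};

// Debug info for one deopt point.
struct ScopeDescSketch {
  int bci;
  std::vector<Location> locals;
  std::vector<Location> stack;
};

// Assumed save area written by the GPU before exiting at a deopt.
struct SavedRegs {
  std::uint32_t s[32];
  std::uint64_t d[16];
};

struct LiveValues {
  int bci;
  std::vector<std::int64_t> locals;
  std::vector<std::int64_t> stack;
};

// Read one value; $s contents are treated as 32-bit ints and sign-extended.
std::int64_t read_location(const SavedRegs& regs, const Location& loc) {
  if (loc.kind == RegKind::S) {
    assert(loc.index >= 0 && loc.index < 32);
    return static_cast<std::int32_t>(regs.s[loc.index]);
  }
  assert(loc.index >= 0 && loc.index < 16);
  return static_cast<std::int64_t>(regs.d[loc.index]);
}

// Map the saved HSAIL pc to its debug info and read every live value.
// std::map::at throws std::out_of_range if the pc has no debug info.
LiveValues recover_live_values(const std::map<int, ScopeDescSketch>& scopes,
                               int hsail_pc, const SavedRegs& regs) {
  const ScopeDescSketch& sd = scopes.at(hsail_pc);
  LiveValues out{sd.bci, {}, {}};
  for (const Location& loc : sd.locals) out.locals.push_back(read_location(regs, loc));
  for (const Location& loc : sd.stack) out.stack.push_back(read_location(regs, loc));
  return out;
}
```

Values recovered this way are what would later have to be written into rebuilt interpreter frames.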
> >>> >>> >> >
> >>> >>> >> > Can I get an update on the "C++ changes needed to easily rebuild the
> >>> >>> >> > interpreter frames from a raw buffer provided by the GPU"?
> >>> >>> >> >
> >>> >>> >> > -- Tom
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> >> -----Original Message-----
> >>> >>> >> >> From: graal-dev-bounces at openjdk.java.net
> >>> >>> >> >> [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of
> >>> >>> >> >> Gilles Duboscq
> >>> >>> >> >> Sent: Friday, December 20, 2013 4:31 AM
> >>> >>> >> >> To: Doug Simon
> >>> >>> >> >> Cc: graal-dev at openjdk.java.net
> >>> >>> >> >> Subject: Re: actions
> >>> >>> >> >>
> >>> >>> >> >> As for me, I'll look into the C++ changes needed to easily rebuild the
> >>> >>> >> >> interpreter frames from a raw buffer provided by the GPU during
> >>> >>> >> >> deoptimization.
> >>> >>> >> >>
> >>> >>> >> >> -Gilles
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon <doug.simon at oracle.com>
> >>> >>> >> >> wrote:
> >>> >>> >> >>
> >>> >>> >> >> > As a result of the Sumatra Skype meeting today on the topic of how
> >>> >>> >> >> > to handle deopt for HSAIL & PTX, I’ve signed up to investigate
> >>> >>> >> >> > changes in the C++ layer of Graal to accommodate installing code
> >>> >>> >> >> > whose debug info is not in terms of host machine state (e.g. uses
> >>> >>> >> >> > a different register set than the host register set).
> >>> >>> >> >> >
> >>> >>> >> >> > -Doug
> >>> >>> >> >> >
> >>> >>> >> >> > On Dec 19, 2013, at 11:02 PM, Deneau, Tom <tom.deneau at amd.com>
> >>> >>> >> >> > wrote:
> >>> >>> >> >> >
> >>> >>> >> >> > > Gilles, Doug --
> >>> >>> >> >> > >
> >>> >>> >> >> > > Could you post to the graal-dev list what the two action items
> >>> >>> >> >> > > you took were?
> >>> >>> >> >> > >
> >>> >>> >> >> > > -- Tom
> >>> >>> >> >> >
> >>> >>> >> >> >
> >>> >>> >> >
> >>> >>> >
> >>> >>