actions -- Rebuilding the Interpreter Frames on the GPU

Doug Simon doug.simon at oracle.com
Wed Jan 29 11:22:04 PST 2014


On Jan 29, 2014, at 7:21 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Gilles --
> 
> I pushed an updated version of the webrev to
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v2/webrev/
> 
> As with the previous one, not proposing that this gets checked in
> but it should provide a basis for your experiments.
> 
> There haven't been any big structural changes since the first one.
> This one has merged with the latest default on Jan 29, which includes
> Doug Simon's patch to get rid of HSAILCompilationResult and use
> backend.CompileKernel instead.

Sorry for the delay. There is indeed one more substantial push I really hope to make tonight, which completes the support for co-existing GPU backends[1]. Once I’ve pushed it, I’ll also send out a description of the changes.

-Doug

[1] https://bugs.openjdk.java.net/browse/GRAAL-1

>> -----Original Message-----
>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
>> Gilles Duboscq
>> Sent: Wednesday, January 29, 2014 6:36 AM
>> To: Deneau, Tom
>> Cc: graal-dev at openjdk.java.net
>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the GPU
>> 
>> Tom,
>> 
>> Do you have an updated version of the webrev I based my work on so far?
>> Since I'm changing direction, it would probably be better if I base off
>> a recent version.
>> I think Doug is going to push some changes regarding multi-gpu support
>> later this afternoon (CET), so it would probably be better if it can be
>> based on something after that.
>> 
>> -Gilles
>> 
>> On Wed, Jan 29, 2014 at 12:07 AM, Gilles Duboscq <gilwooden at gmail.com> wrote:
>>> Yes, it's all correct.
>>> This host code basically only contains code to handle the GPU code's
>>> deopts, which it handles by using ... deopt again, but since we are on
>>> the host now, deopt there is very simple.
>>> 
>>> On 28 Jan 2014 19:59, "Tom Deneau" <tom.deneau at amd.com> wrote:
>>>> 
>>>> Gilles --
>>>> 
>>>> I'm not sure I understand this 100% (and I can't say I understand how
>>>> OSR works) but this sounds like a good goal to avoid modifying the
>>>> hotspot deopt code, etc.
>>>> 
>>>> So is the following correct?
>>>>   * this second graph compiles to some funny host code which
>>>>     gets invoked at runtime via javaCall when the gpu de-opts?
>>>>     This host code is like a special compilation of the original
>>>>     kernel method.
>>>> 
>>>>   * When the gpu sees a deopt and makes the javacall, it just
>>>>     needs to pass the unique de-opt location (int)
>>>>     and the set of saved gpu register/stack values.
>>>> 
>>>>   * And the funny host code will set up all the locals, expressions,
>>>>     etc. and then do a normal host deopt...
>>>> 
>>>> If so, it sounds very clever... :)
>>>> 
>>>> -- Tom
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
>>>>> Gilles Duboscq
>>>>> Sent: Tuesday, January 28, 2014 12:29 PM
>>>>> To: Deneau, Tom
>>>>> Cc: graal-dev at openjdk.java.net
>>>>> Subject: Re: actions -- Rebuilding the Interpreter Frames on the
>>>>> GPU
>>>>> 
>>>>> Tom,
>>>>> 
>>>>> After further thinking, discussing and hacking into HotSpot, I
>>>>> think we've finally arrived at a reasonable battle plan. We have
>>>>> turned the problem around and the plan is to use a combination of
>>>>> something that looks like OSR and deoptimization:
>>>>> - Around the end of the compilation (just before going to LIR), I
>>>>> create a new graph based on the current graph:
>>>>>  - It gets 2 arguments: a long (actually a pointer) and an int.
>>>>>  - For each deopt in the original graph there is a unique int; the
>>>>> first thing this new graph does is a switch on this int.
>>>>>  - After this switch, it reads all the values necessary for the
>>>>> deopt's framestates from this long pointer (which probably simply
>>>>> points to the HSAILFrame).
>>>>>  - It then directly deopts from there.
>>>>> - When a deopt happens on the GPU, we do a JavaCall using something
>>>>> like JavaCalls::call_helper (javaCalls.cpp) with an additional
>>>>> argument for the entry point.
>>>>> 
>>>>> I think doing deopt this way will spare us a lot of problems because:
>>>>> - we don't need to modify any of HotSpot's deopt code
>>>>> - the frames and nmethods involved look perfectly normal to HotSpot
>>>>> 
>>>>> My plan is:
>>>>> - make it possible for ExternalCompilationResult to contain both
>>>>> the External part (HSAIL things) and the host part (the code coming
>>>>> from this second graph)
>>>>> - Hook somewhere in the HSAIL backend to generate this second
>>>>> graph, compile it using the Host backend and combine the HSAIL and
>>>>> host results in the ExternalCompilationResult
>>>>> - Install this ExternalCompilationResult correctly in the code
>>>>> cache
>>>>> - Implement the final call to JavaCalls::call_helper in
>>>>> gpu_hsail.cpp
>>>>> 
>>>>> -Gilles
>>>>> 
>>>>> On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq <duboscq at ssw.jku.at> wrote:
>>>>>> On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>>>>>> Gilles --
>>>>>>> 
>>>>>>> I took a look at your diff file and it seems we are mostly
>>>>>>> headed in the right direction.
>>>>>>> 
>>>>>>> Regarding this paragraph
>>>>>>>> Right now I'm trying to see how I can modify
>>>>>>>> fetch_unroll_info_helper to minimise its reliance on frames.
>>>>>>>> This needs quite a bit of refactoring.
>>>>>>>> Part of this also requires figuring out exactly what the
>>>>>>>> frame layout will be when we call it. I suppose that to avoid
>>>>>>>> too many changes we can call a stub similar to the
>>>>>>>> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp.
>>>>>>>> 
>>>>>>> 
>>>>>>> I was assuming the frame layout would be what the HSAILFrame
>>>>>>> structure shows.
>>>>>>> For now there will only be one level of HSAILFrame and we will
>>>>>>> always have 32 saved $s registers and 16 saved $d registers, even
>>>>>>> if some are not necessary, but the HSAILFrame has provisions for
>>>>>>> saving fewer.
>>>>>> 
>>>>>> Yes but in the deoptimization code HotSpot expects frame values
>>>>>> (frame.hpp), and frame is a platform specific class (see
>>>>>> frame_x86.hpp and friends). I'm not sure we really win something
>>>>>> by making the HSAIL frames look the same as the host
>>>>>> architecture: that would require some changes and there are still
>>>>>> assumptions that these frames are on the stack.
>>>>>> 
>>>>>>> 
>>>>>>> If there are other layouts for HSAILFrame that make this easier,
>>>>>>> let me know.
>>>>>>> 
>>>>>>> Also, I'm not sure what you mean by "call a stub similar to the
>>>>>>> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp".
>>>>>> 
>>>>>> Deoptimization::fetch_unroll_info_helper makes some assumptions
>>>>>> about the layout of the frames leading to it. For example, it
>>>>>> expects to be called from a stub: either the deopt_blob
>>>>>> (SharedRuntime::generate_deopt_blob) or the uncommon_trap_blob
>>>>>> (SharedRuntime::generate_uncommon_trap_blob).
>>>>>> I was talking about this with Tom Rodriguez and what we probably
>>>>>> want is to do a standard JavaCall which would land on such a
>>>>>> stub; this would make it easier to end up with a
>>>>>> valid-looking/walkable stack.
>>>>>> 
>>>>>>> 
>>>>>>> -- Tom
>>>>>>> 
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
>>>>>>>> Behalf Of Gilles Duboscq
>>>>>>>> Sent: Friday, January 24, 2014 12:07 PM
>>>>>>>> To: Deneau, Tom
>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter Frames on
>>>>>>>> the GPU
>>>>>>>> 
>>>>>>>> Hello Tom,
>>>>>>>> 
>>>>>>>> I'm sending you my current diff, mostly for your information
>>>>>>>> because it probably wouldn't compile or run.
>>>>>>>> 
>>>>>>>> For the deopt process what we need to do is:
>>>>>>>> - Get the UnrollBlock from
>>>>>>>>   Deoptimization::fetch_unroll_info_helper
>>>>>>>> - Rebuild the "skeletal frames" (walkable and with PCs but no
>>>>>>>>   values) using this UnrollBlock (see for example
>>>>>>>>   sharedRuntime_x86_64.cpp starting around line 3530)
>>>>>>>> - Run Deoptimization::unpack_frames, which will fill the skeletal
>>>>>>>>   frames with values using the UnrollBlock
>>>>>>>> 
>>>>>>>> This work relies on vframes (here compiledVFrames)
>>>>>>>> corresponding to the Java frames that are contained in the
>>>>>>>> method that just deoptimized.
>>>>>>>> Usually these vframes reference a particular frame (from
>>>>>>>> frame.hpp, i.e. a physical frame from the host machine).
>>>>>>>> Sub-classing frame is not really possible (I spent some time
>>>>>>>> looking at that but it doesn't seem reasonable) but
>>>>>>>> subclassing compiledVFrame should be easy; that's what I did in
>>>>>>>> HsailCompiledVFrame.
>>>>>>>> HsailCompiledVFrame references the HSAILFrame and uses it in
>>>>>>>> HsailCompiledVFrame::create_stack_value, which is what creates
>>>>>>>> the StackValues that are later used to retrieve the data.
>>>>>>>> 
>>>>>>>> Right now I'm trying to see how I can modify
>>>>>>>> fetch_unroll_info_helper to minimise its reliance on frames.
>>>>>>>> This needs quite a bit of refactoring.
>>>>>>>> Part of this also requires figuring out exactly what the
>>>>>>>> frame layout will be when we call it. I suppose that to avoid
>>>>>>>> too many changes we can call a stub similar to the
>>>>>>>> deopt/uncommon_trap stub from sharedRuntime_x86_64.cpp.
>>>>>>>> 
>>>>>>>> A few questions:
>>>>>>>> Why would there be multiple HSAILFrames? Is there a stack and
>>>>>>>> method calls in HSAIL? If that's not the case then HSAILFrame
>>>>>>>> should be an HSAIL equivalent of frame: only one frame since
>>>>>>>> there is only one physical frame.
>>>>>>>> I'm not entirely sure why we need the HSAILLocation. It's
>>>>>>>> useful now during development but I suppose it should not be
>>>>>>>> needed any more once we go through the StackValues. Did you
>>>>>>>> have a specific use in mind beyond development tests?
>>>>>>>> 
>>>>>>>> -Gilles
>>>>>>>> 
>>>>>>>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq <duboscq at ssw.jku.at> wrote:
>>>>>>>>> Hello Tom,
>>>>>>>>> 
>>>>>>>>> I've been working on this and by now I'm not really convinced
>>>>>>>>> I will get something useful enough for tomorrow.
>>>>>>>>> I'll share the state of my patch/findings with you tomorrow
>>>>>>>>> anyway but I'll probably need more work.
>>>>>>>>> 
>>>>>>>>> Sorry about that, I knew this deoptimization code is
>>>>>>>>> complicated but using a non-physical frame (i.e. not a frame
>>>>>>>>> from the platform's native ABI) is more complicated than I
>>>>>>>>> thought.
>>>>>>>>> 
>>>>>>>>> -Gilles
>>>>>>>>> 
>>>>>>>>> On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>>>>>>>>> Thanks, Gilles.
>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
>>>>>>>>>>> Behalf Of Gilles Duboscq
>>>>>>>>>>> Sent: Monday, January 20, 2014 12:29 PM
>>>>>>>>>>> To: Deneau, Tom
>>>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter Frames
>>>>>>>>>>> on the GPU
>>>>>>>>>>> 
>>>>>>>>>>> Hello Tom,
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I've looked at your webrev.
>>>>>>>>>>> Thank you.
>>>>>>>>>>> 
>>>>>>>>>>> I also looked at the HotSpot code and I have a rough idea
>>>>>>>>>>> of what is needed.
>>>>>>>>>>> Sorry for the late answer, I have a lot of things on my
>>>>>>>>>>> stack right now.
>>>>>>>>>>> 
>>>>>>>>>>> I intend to look at it this week and I hope to have at
>>>>>>>>>>> least something that you can experiment with on Friday.
>>>>>>>>>>> 
>>>>>>>>>>> -Gilles
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>>>>>>>>>>> Hi Gilles --
>>>>>>>>>>>> 
>>>>>>>>>>>> I assume you saw the notice of the webrev I uploaded that
>>>>>>>>>>>> can be inspected (and also can be built, although we are
>>>>>>>>>>>> not proposing it for check-in).
>>>>>>>>>>>> 
>>>>>>>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> To help with our internal planning, can you give us a
>>>>>>>>>>>> rough estimate of how far away the frame rebuilding
>>>>>>>>>>>> interface might be?
>>>>>>>>>>>> 
>>>>>>>>>>>> -- Tom
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com]
>>>>>>>>>>>>> On Behalf Of Gilles Duboscq
>>>>>>>>>>>>> Sent: Wednesday, January 15, 2014 4:38 AM
>>>>>>>>>>>>> To: Deneau, Tom
>>>>>>>>>>>>> Cc: Doug Simon; graal-dev at openjdk.java.net
>>>>>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter
>>>>>>>>>>>>> Frames on the GPU
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello Tom,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It's on my list, I already had a closer look at the
>>>>>>>>>>>>> frame rebuilding code.
>>>>>>>>>>>>> I would be interested to have a look at the code of your
>>>>>>>>>>>>> CodeInstaller subclass and the code you use to retrieve
>>>>>>>>>>>>> the runtime values so that I can experiment with it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Gilles
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>>>>>>>>>>>>> Gilles, Doug --
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> A status update on our end...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   * We now generate HSAIL code to save the register
>>>>>>>>>>>>>>     state at deopt points
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   * We have an HSAIL-specific CodeInstaller class
>>>>>>>>>>>>>>     based on the changes Doug added and we use this at
>>>>>>>>>>>>>>     compile time (code-install time) to build the
>>>>>>>>>>>>>>     ScopeDescs.  (This avoids the host-register-specific
>>>>>>>>>>>>>>     code in the base CodeInstaller class).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   * At runtime, if we detect that a workitem deopted,
>>>>>>>>>>>>>>     we map the saved "HSAIL pc" to the relevant
>>>>>>>>>>>>>>     ScopeDesc and use each Location item in the
>>>>>>>>>>>>>>     ScopeDesc to retrieve the relevant HSAIL register
>>>>>>>>>>>>>>     from the HSAIL frame (where the registers were
>>>>>>>>>>>>>>     saved).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Right now we just print out the live locals or
>>>>>>>>>>>>>> expression stack values for the deopted workitem and
>>>>>>>>>>>>>> they look correct.  The next step would be to rebuild
>>>>>>>>>>>>>> the interpreter frames.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can I get an update on the "C++ changes needed to
>>>>>>>>>>>>>> easily rebuild the interpreter frames from a raw
>>>>>>>>>>>>>> buffer provided by the GPU"?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -- Tom
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: graal-dev-bounces at openjdk.java.net
>>>>>>>>>>>>>>> [mailto:graal-dev-bounces at openjdk.java.net] On
>>>>>>>>>>>>>>> Behalf Of Gilles Duboscq
>>>>>>>>>>>>>>> Sent: Friday, December 20, 2013 4:31 AM
>>>>>>>>>>>>>>> To: Doug Simon
>>>>>>>>>>>>>>> Cc: graal-dev at openjdk.java.net
>>>>>>>>>>>>>>> Subject: Re: actions
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As for me, I'll look into the C++ changes needed to
>>>>>>>>>>>>>>> easily rebuild the interpreter frames from a raw
>>>>>>>>>>>>>>> buffer provided by the GPU during deoptimization.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Gilles
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon <doug.simon at oracle.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> As a result of the Sumatra Skype meeting today on
>>>>>>>>>>>>>>>> the topic of how to handle deopt for HSAIL & PTX,
>>>>>>>>>>>>>>>> I’ve signed up to investigate changes in the C++
>>>>>>>>>>>>>>>> layer of Graal to accommodate installing code whose
>>>>>>>>>>>>>>>> debug info is not in terms of host machine state
>>>>>>>>>>>>>>>> (e.g. uses a different register set than the host
>>>>>>>>>>>>>>>> register set).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Doug
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Dec 19, 2013, at 11:02 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Gilles, Doug --
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Could you post to the graal-dev list what the two
>>>>>>>>>>>>>>>>> action items you took were?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -- Tom
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>> 
>>> 
> 
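
For readers following the thread: the host-side entry point Gilles describes in his January 28 message (two arguments, a switch on the unique deopt id, then reads from the saved frame) could look roughly like the sketch below. This is not code from the webrev. SavedGpuFrame, FrameStateValues and host_deopt_entry are hypothetical names, the field order is an assumption, and the per-case register choices are invented purely for illustration; only the 32 $s / 16 $d register counts come from the thread.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the buffer the GPU fills at a deopt point.
// The thread only states that 32 $s and 16 $d registers are saved;
// the field names and their order are assumptions.
struct SavedGpuFrame {
    int32_t s_regs[32];  // HSAIL $s registers are 32 bits wide
    int64_t d_regs[16];  // HSAIL $d registers are 64 bits wide
};

// Hypothetical container for the values a host deopt would consume.
struct FrameStateValues {
    int deopt_id;
    std::vector<int64_t> locals;
};

// Sketch of the "second graph": a long (really a pointer) plus an int.
// Each case reads only the registers that are live at that deopt point.
FrameStateValues host_deopt_entry(int64_t frame_ptr, int deopt_id) {
    const SavedGpuFrame* f = reinterpret_cast<const SavedGpuFrame*>(
        static_cast<intptr_t>(frame_ptr));
    switch (deopt_id) {
    case 0:  // illustrative: one int local and one long local are live
        return {0, {f->s_regs[1], f->d_regs[0]}};
    case 1:  // illustrative: a single int local is live
        return {1, {f->s_regs[3]}};
    default:
        throw std::invalid_argument("unknown deopt id");
    }
}
```

In the real scheme the last step of each case would be an ordinary host deoptimization rather than a return, which is why HotSpot's deopt code would not need to change.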

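Similarly, the runtime lookup Tom describes in his January 13 status update (map the saved "HSAIL pc" to a scope description, then use each location in it to fetch a saved register) can be illustrated with a small standalone sketch. GpuLocation, GpuScope, SavedRegs and read_locals are hypothetical names and do not correspond to HotSpot's actual ScopeDesc or Location classes; the sketch only mirrors the shape of the lookup.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: says which saved register holds a value.
struct GpuLocation {
    bool is_d_reg;  // true: 64-bit $d register, false: 32-bit $s register
    int index;      // register number within its bank
};

// Hypothetical stand-in for a scope description: the live locals at one pc.
struct GpuScope {
    std::vector<GpuLocation> locals;
};

// Hypothetical saved-register area (32 $s and 16 $d, as in the thread).
struct SavedRegs {
    int32_t s[32];
    int64_t d[16];
};

// Look up the scope recorded for the saved pc and read each live local
// from the saved registers. Returns an empty vector for an unknown pc.
std::vector<int64_t> read_locals(const std::map<int, GpuScope>& scopes_by_pc,
                                 int hsail_pc, const SavedRegs& regs) {
    std::vector<int64_t> values;
    auto it = scopes_by_pc.find(hsail_pc);
    if (it == scopes_by_pc.end()) {
        return values;
    }
    for (const GpuLocation& loc : it->second.locals) {
        values.push_back(loc.is_d_reg ? regs.d[loc.index]
                                      : static_cast<int64_t>(regs.s[loc.index]));
    }
    return values;
}
```

Printing the returned values corresponds to the "print out the live locals" stage mentioned in the update; rebuilding interpreter frames would consume the same values instead.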

