class gpu
Deneau, Tom
tom.deneau at amd.com
Thu Feb 6 07:50:46 PST 2014
Doug --
The code can be seen at
https://github.com/HSAFoundation/Okra-Interface-to-HSAIL-Simulator/blob/master/src/cpp/okraContextSimulator.cpp
lines 318 through 320.
If necessary, you should be able to build using the instructions at
https://github.com/HSAFoundation/Okra-Interface-to-HSAIL-Simulator
-- Tom
> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Thursday, February 06, 2014 4:41 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net
> Subject: Re: class gpu
>
>
> On Feb 5, 2014, at 9:29 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>
> > Doug --
> >
> > Sorry about the delay, there is now a set of okra-1.7* jars up at
> > http://cr.openjdk.java.net/~tdeneau/
> > Can you make the version change in mx/projects?
>
> Done.
>
> >
> > * the logger from OkraContext is gone
>
> Thanks.
>
> > * I wasn't able to reproduce the problem you mentioned with deleting
> > temporary files
>
> If I run 'mx -vm server unittest hsail', those temp files are left
> behind. Where is the code that deletes these files? Maybe there's
> something weird on my machine that I can look into if I have the
> sources.
>
> -Doug
>
> > -----Original Message-----
> >> From: Doug Simon [mailto:doug.simon at oracle.com]
> >> Sent: Monday, February 03, 2014 4:32 PM
> >> To: Deneau, Tom
> >> Cc: graal-dev at openjdk.java.net
> >> Subject: Re: class gpu
> >>
> >> Tom,
> >>
> >> I have the proposed changes ready for pushing. However, the use of
> >> java.util.logging in OkraContext prevents the DaCapo benchmarks from
> >> running. The static initializer in OkraContext.java derived from:
> >>
> >> private static final Logger logger =
> >> Logger.getLogger("okracontext");
> >>
> >> causes the field
> >> java.util.logging.LogManager.initializedGlobalHandlers
> >> to be reset to false (I have no idea why). This causes
> >> re-initialization of the root logger during DaCapo benchmark
> >> execution which (for some other unknown reason) causes the benchmarks
> >> to start logging to the console. Finally, this causes the DaCapo
> >> output validation to fail. You can see this (only on Linux) by
> >> executing a benchmark without and then with -XX:+UseHSAILSimulator:
> >>
> >> $ mx dacapo fop
> >> Bootstrapping Graal................................. in 17688 ms (compiled 3326 methods)
> >> ===== DaCapo 9.12 fop starting =====
> >> ===== DaCapo 9.12 fop PASSED in 2793 msec =====
> >> $ mx dacapo -XX:+UseHSAILSimulator fop
> >> Bootstrapping Graal................................. in 18249 ms (compiled 3323 methods)
> >> ===== DaCapo 9.12 fop starting =====
> >> Digest validation failed for stderr.log, expecting 0xda39a3ee5e6b4b0d3255bfef95601890afd80709 found 0x2199068d93c2bfe53159a85954d3fb3bb437ac9b
> >> ===== DaCapo 9.12 fop FAILED =====
> >> Validation FAILED for fop default
> >> Benchmark failures: ['fop']
> >>
> >> It's hard to say where the fundamental problem is. I would have
> >> thought it's safe for JDK code to use logging without impacting
> >> application code. However, since there is exactly one logging
> >> statement in OkraContext, the simplest solution is to remove use of
> >> logging altogether (replacing it with something like a
> >> System.out.println() guarded by a system property). Once the Okra
> >> jars have been updated with this fix, I can push the other changes.
> >>
> >> -Doug
> >>
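One possible shape for the guarded System.out.println() suggested above, as a minimal sketch. The class name OkraDebug and the property name okra.debug are hypothetical; neither is taken from the Okra sources.

```java
// Hypothetical sketch: OkraDebug and "okra.debug" are illustrative names,
// not identifiers from the Okra sources.
final class OkraDebug {
    // Assumed property name; enable with -Dokra.debug=true on the command line.
    static final String PROPERTY = "okra.debug";

    private OkraDebug() {
    }

    // Reads the property on every call, so no static initializer is needed
    // and java.util.logging is never touched.
    static boolean enabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    // Prints the message only when the property is "true".
    // Returns whether anything was printed.
    static boolean log(String message) {
        if (!enabled()) {
            return false;
        }
        System.out.println("[okra] " + message);
        return true;
    }
}
```

With the property unset, OkraDebug.log(...) prints nothing, so stdout and stderr stay clean for output validation such as the DaCapo digest check.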
> >> On Feb 3, 2014, at 5:41 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>
> >>> OK, sounds like a plan...
> >>>
> >>>> -----Original Message-----
> >>>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>>> Sent: Monday, February 03, 2014 10:40 AM
> >>>> To: Deneau, Tom
> >>>> Cc: graal-dev at openjdk.java.net
> >>>> Subject: Re: class gpu
> >>>>
> >>>> On Feb 3, 2014, at 5:04 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>>>
> >>>>> Doug --
> >>>>>
> >>>>> I am wondering whether we need the old setup where class gpu
> >>>>> included classes ptx and hsail.
> >>>>>
> >>>>> I have noticed that if hsail/vm/gpu_hsail.hpp tries to include
> >>>>> something like graalEnv.hpp, then because of the way
> >>>>> gpu_hsail.hpp gets included in gpu.hpp, if graalEnv.hpp has not
> >>>>> already been included earlier, its contents get defined in the
> >>>>> scope of gpu::hsail and cannot be seen at the outermost scope by
> >>>>> later hpp files (which also try to include graalEnv.hpp).
> >>>>> This makes the whole thing more fragile.
> >>>>>
> >>>>> Workarounds seem to be:
> >>>>> * Include graalEnv.hpp and such in gpu.hpp itself before the
> >>>>> class gpu scoping, so they are always defined outside the
> >>>>> scope of gpu::hsail first. This is what I am currently
> >>>>> doing, but that doesn't feel right.
> >>>>>
> >>>>> * Move such hpp files into precompiled.hpp, which also doesn't
> >>>>> feel right.
> >>>>>
> >>>>> * Do we really need scoping of the hsail class within the gpu
> >>>>> class, or should we instead be using namespaces? (We would
> >>>>> have to pick a different name from that of the gpu class
> >>>>> itself.)
> >>>>> So gpu_hsail.hpp could look something like
> >>>>>
> >>>>> // includes defined at outermost scope
> >>>>> #include "graalEnv.hpp"
> >>>>> namespace GPU {
> >>>>> namespace hsail {
> >>>>> //... actual definitions
> >>>>> }
> >>>>> }
> >>>>
> >>>> I think the best solution is to simply make the Hsail and Ptx C++
> >>>> classes not be nested within the gpu class. We should avoid
> >>>> namespaces as I see this construct is not used in the rest of the
> >>>> HotSpot code base (apart from some Shark code).
> >>>>
> >>>> I just quickly tried pulling Ptx and Hsail outside of gpu and
> >>>> everything appears to work fine. I'll include this change in the
> >>>> push that removes the UseHSAILSimulator option (once Eric confirms
> >>>> that's the right thing to do).
> >>>>
> >>>>> * Also, with the gpu refactoring, I think no C++ code actually
> >>>>> calls anything in gpu::hsail (or gpu::ptx),
> >>>>> so do they even need to be defined in gpu.hpp?
> >>>>
> >>>> Nope. I'll pull them out as well.
> >>>>
> >>>> -Doug
> >>>>
> >>>>>> -----Original Message-----
> >>>>>> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-
> >>>>>> bounces at openjdk.java.net] On Behalf Of Deneau, Tom
> >>>>>> Sent: Sunday, February 02, 2014 10:01 AM
> >>>>>> To: Doug Simon
> >>>>>> Cc: graal-dev at openjdk.java.net
> >>>>>> Subject: hooking in HsailCodeInstaller
> >>>>>>
> >>>>>> Doug --
> >>>>>>
> >>>>>> Although the webrev I provided to Gilles at
> >>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v4/webrev/
> >>>>>> is not meant for checkin, could you glance at the code for
> >>>>>> hooking in the HsailCodeInstaller and see if it is the right
> >>>>>> general pattern, starting at HSAILHotSpotBackend.installKernel
> >>>>>> and going through gpu::hsail::installHsailCode?
> >>>>>>
> >>>>>> It felt like lots of code from existing routines had to be copied
> >>>>>> with only a few lines changed in the middle to call the
> >>>>>> HsailCodeInstaller.
> >>>>>>
> >>>>>> -- Tom
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Deneau, Tom
> >>>>>>> Sent: Sunday, February 02, 2014 9:50 AM
> >>>>>>> To: 'Gilles Duboscq'
> >>>>>>> Cc: 'graal-dev at openjdk.java.net'
> >>>>>>> Subject: RE: actions -- Rebuilding the Interpreter Frames on the
> >> GPU
> >>>>>>>
> >>>>>>> Gilles --
> >>>>>>>
> >>>>>>> As mentioned in a separate email, the v3 webrev had a flaw in
> >>>>>>> that it did not go through the HsailCodeInstaller to set the
> >>>>>>> scope values for locals, expressions, etc.
> >>>>>>> Our rudimentary runtime support doesn't actually use these
> >>>>>>> values yet (that comes with your deopt-to-interpreter support)
> >>>>>>> so we only print them out in some debugging configurations.
> >>>>>>> Anyway, the junit tests we had did not fail if this
> >>>>>>> HsailCodeInstaller support was missing.
> >>>>>>>
> >>>>>>> So the following v4 webrev does use the HsailCodeInstaller and
> >>>>>>> should be used for your experiments:
> >>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v4/webrev/
> >>>>>>>
> >>>>>>> -- Tom
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Deneau, Tom
> >>>>>>>> Sent: Friday, January 31, 2014 7:37 AM
> >>>>>>>> To: Deneau, Tom; 'Gilles Duboscq'
> >>>>>>>> Cc: 'graal-dev at openjdk.java.net'
> >>>>>>>> Subject: RE: actions -- Rebuilding the Interpreter Frames on
> >>>>>>>> the GPU
> >>>>>>>>
> >>>>>>>> Gilles --
> >>>>>>>>
> >>>>>>>> Yet another updated version of the webrev can be found at
> >>>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v3/webrev/
> >>>>>>>>
> >>>>>>>> This one merged with Jan 31 trunk which includes Doug's more
> >>>>>>>> extensive GPU changes.
> >>>>>>>> The tests should all still pass on the simulator.
> >>>>>>>>
> >>>>>>>> -- Tom
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Deneau, Tom
> >>>>>>>>> Sent: Wednesday, January 29, 2014 12:22 PM
> >>>>>>>>> To: 'Gilles Duboscq'
> >>>>>>>>> Cc: graal-dev at openjdk.java.net
> >>>>>>>>> Subject: RE: actions -- Rebuilding the Interpreter Frames on
> >>>>>>>>> the
> >>>>>> GPU
> >>>>>>>>>
> >>>>>>>>> Gilles --
> >>>>>>>>>
> >>>>>>>>> I pushed an updated version of the webrev to
> >>>>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles-v2/webrev/
> >>>>>>>>>
> >>>>>>>>> As with the previous one, I am not proposing that this gets
> >>>>>>>>> checked in, but it should provide a basis for your experiments.
> >>>>>>>>>
> >>>>>>>>> There haven't been any big structural changes since the first
> >>>>>>>>> one. This one has merged with the latest default on Jan 29,
> >>>>>>>>> which includes Doug Simon's patch to get rid of
> >>>>>>>>> HSAILCompilationResult and use backend.CompileKernel instead.
> >>>>>>>>>
> >>>>>>>>> The junits, including the new ones based on bounds checks, etc.,
> >>>>>>>>> should pass when run with the hsail simulator.
> >>>>>>>>>
> >>>>>>>>> Let me know if you run into any problems with this.
> >>>>>>>>>
> >>>>>>>>> -- Tom
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> >> Behalf
> >>>>>>> Of
> >>>>>>>>>> Gilles Duboscq
> >>>>>>>>>> Sent: Wednesday, January 29, 2014 6:36 AM
> >>>>>>>>>> To: Deneau, Tom
> >>>>>>>>>> Cc: graal-dev at openjdk.java.net
> >>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter Frames on
> >> the
> >>>>>>> GPU
> >>>>>>>>>>
> >>>>>>>>>> Tom,
> >>>>>>>>>>
> >>>>>>>>>> Do you have an updated version of the webrev I based my work
> >>>>>>>>>> on so far?
> >>>>>>>>>> Since I'm changing direction, it would probably be better if
> >>>>>>>>>> I base off a recent version.
> >>>>>>>>>> I think Doug is going to push some changes regarding
> >>>>>>>>>> multi-gpu support later this afternoon (CET), so it would
> >>>>>>>>>> probably be better if it can be based on something after that.
> >>>>>>>>>>
> >>>>>>>>>> -Gilles
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jan 29, 2014 at 12:07 AM, Gilles Duboscq
> >>>>>>>> <gilwooden at gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Yes, it's all correct.
> >>>>>>>>>>> This host code basically only contains code to handle the
> >>>>>>>>>>> GPU code's deopts, which it handles by using ... deopt
> >>>>>>>>>>> again, but since we are on the host now, deopt there is
> >>>>>>>>>>> very simple.
> >>>>>>>>>>>
> >>>>>>>>>>> On 28 Jan 2014 19:59, "Tom Deneau" <tom.deneau at amd.com>
> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Gilles --
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not sure I understand this 100% (and I can't say I
> >>>>>>>>>>>> understand how OSR works) but this sounds like a good goal
> >>>>>>>>>>>> to avoid modifying the hotspot deopt code, etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So is the following correct?
> >>>>>>>>>>>> * This second graph compiles to some funny host code which
> >>>>>>>>>>>> gets invoked at runtime via javaCall when the gpu
> >>>>>>>>>>>> de-opts? This host code is like a special compilation
> >>>>>>>>>>>> of the original kernel method.
> >>>>>>>>>>>>
> >>>>>>>>>>>> * When the gpu sees a deopt and makes the javacall, it
> >>>>>>>>>>>> just needs to pass the unique de-opt location (int)
> >>>>>>>>>>>> and the set of saved gpu register/stack values.
> >>>>>>>>>>>>
> >>>>>>>>>>>> * And the funny host code will set up all the locals,
> >>>>>>>>>>>> expressions, etc. and then does a normal host deopt...
> >>>>>>>>>>>>
> >>>>>>>>>>>> If so, it sounds very clever... :)
> >>>>>>>>>>>>
> >>>>>>>>>>>> -- Tom
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On
> >>>>>>>> Behalf
> >>>>>>>>>>>>> Of Gilles Duboscq
> >>>>>>>>>>>>> Sent: Tuesday, January 28, 2014 12:29 PM
> >>>>>>>>>>>>> To: Deneau, Tom
> >>>>>>>>>>>>> Cc: graal-dev at openjdk.java.net
> >>>>>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter Frames
> >>>>>> on
> >>>>>>>> the
> >>>>>>>>>>>>> GPU
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Tom,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> After further thinking, discussing and hacking into
> >>>>>>>>>>>>> HotSpot, I think we've finally arrived at a reasonable
> >>>>>>>>>>>>> battle plan. We have turned the problem around and the
> >>>>>>>>>>>>> plan is to use a combination of something that looks
> >>>>>>>>>>>>> like OSR and deoptimization:
> >>>>>>>>>>>>> - Around the end of the compilation (just before going to
> >>>>>>>>>>>>> LIR), I create a new graph based on the current graph:
> >>>>>>>>>>>>>   - It gets 2 arguments: a long (a pointer actually) and
> >>>>>>>>>>>>>   an int.
> >>>>>>>>>>>>>   - For each deopt in the original graph there is a unique
> >>>>>>>>>>>>>   int; the first thing this new graph does is a switch on
> >>>>>>>>>>>>>   this int.
> >>>>>>>>>>>>>   - After this switch, it reads all the values necessary
> >>>>>>>>>>>>>   for the deopt's framestates from this long pointer
> >>>>>>>>>>>>>   (which probably simply points to the HSAILFrame).
> >>>>>>>>>>>>>   - It then directly deopts from there.
> >>>>>>>>>>>>> - When a deopt happens on the GPU, we do a JavaCall using
> >>>>>>>>>>>>> something like JavaCalls::call_helper (javaCalls.cpp) with
> >>>>>>>>>>>>> an additional argument for the entry point.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think doing deopt this way will save us a lot of
> >>>>>>>>>>>>> problems because:
> >>>>>>>>>>>>> - we don't need to modify any of HotSpot's deopt code
> >>>>>>>>>>>>> - the frames and nmethods involved look perfectly normal
> >>>>>>>>>>>>> to HotSpot
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My plan is:
> >>>>>>>>>>>>> - Make it possible for ExternalCompilationResult to
> >>>>>>>>>>>>> contain both the External part (HSAIL things) and the
> >>>>>>>>>>>>> host part (the code coming from this second graph)
> >>>>>>>>>>>>> - Hook somewhere in the HSAIL backend to generate this
> >>>>>>>>>>>>> second graph, compile it using the Host backend and
> >>>>>>>>>>>>> combine the HSAIL and host results in the
> >>>>>>>>>>>>> ExternalCompilationResult
> >>>>>>>>>>>>> - Install this ExternalCompilationResult correctly in the
> >>>>>>>>>>>>> code cache
> >>>>>>>>>>>>> - Implement the final call to JavaCalls::call_helper in
> >>>>>>>>>>>>> gpu_hsail.cpp
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>
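The dispatch performed by the second graph in the plan above can be modelled in a few lines of Java. This is only a toy, not the generated code: a long[] stands in for the raw pointer to the HSAILFrame, and the class name, deopt ids and slot layout are all invented for illustration.

```java
// Toy model only. In the plan above the first argument is a raw pointer
// (probably to the HSAILFrame) and the code is produced by the compiler;
// here a long[] stands in for the saved state and the ids are made up.
final class HostDeoptEntry {
    private HostDeoptEntry() {
    }

    // Mirrors the two arguments of the second graph: the saved state and
    // the unique int identifying the deopt point. Returns the values that
    // the frame state at that deopt point would need.
    static long[] valuesForDeopt(long[] savedSlots, int deoptId) {
        switch (deoptId) {
            case 0:
                // Hypothetical deopt point 0 needs the values in slots 0 and 1.
                return new long[] {savedSlots[0], savedSlots[1]};
            case 1:
                // Hypothetical deopt point 1 needs only slot 2.
                return new long[] {savedSlots[2]};
            default:
                throw new IllegalArgumentException("unknown deopt id " + deoptId);
        }
    }
}
```

In the real design the step after the switch would be an ordinary host deoptimization using the values just read; the sketch stops at returning them.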
> >>>>>>>>>>>>> On Tue, Jan 28, 2014 at 2:49 PM, Gilles Duboscq
> >>>>>>>>>>>>> <duboscq at ssw.jku.at>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> On Mon, Jan 27, 2014 at 8:35 PM, Tom Deneau
> >>>>>>>>>>>>>> <tom.deneau at amd.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> Gilles --
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I took a look at your diff file and it seems we are
> >>>>>>>>>>>>>>> mostly headed in the right direction.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regarding this paragraph
> >>>>>>>>>>>>>>>> Right now I'm trying to see how I can modify
> >>>>>>>>>>>>>>>> fetch_unroll_info_helper to minimise its reliance on
> >>>>>>>>>>>>>>>> frames. This needs quite a bit of refactoring.
> >>>>>>>>>>>>>>>> Part of this also requires figuring out exactly what
> >>>>>>>>>>>>>>>> the frame layout will be when we call it. I suppose that
> >>>>>>>>>>>>>>>> to avoid too many changes we can call a stub similar to
> >>>>>>>>>>>>>>>> the deopt/uncommon_trap stub from
> >>>>>>>>>>>>>>>> sharedRuntime_x86_64.cpp.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I was assuming the frame layout would be what the
> >>>>>>>>>>>>>>> HSAILFrame structure shows.
> >>>>>>>>>>>>>>> For now there will only be one level of HSAILFrame and
> >>>>>>>>>>>>>>> we will always have 32 saved $s registers and 16 saved
> >>>>>>>>>>>>>>> $d registers, even if some are not necessary, but the
> >>>>>>>>>>>>>>> HSAILFrame has provisions for saving fewer.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, but in the deoptimization code HotSpot expects frame
> >>>>>>>>>>>>>> values (frame.hpp), and frame is a platform-specific
> >>>>>>>>>>>>>> class (see frame_x86.hpp and friends). I'm not sure we
> >>>>>>>>>>>>>> really win something by making the HSAIL frames look the
> >>>>>>>>>>>>>> same as the host architecture: that would require some
> >>>>>>>>>>>>>> changes and there are still assumptions that these
> >>>>>>>>>>>>>> frames are on the stack.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If there are other layouts for HSAILFrame that make this
> >>>>>>>>>>>>>>> easier, let me know.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Also, I'm not sure what you mean by "call a stub similar
> >>>>>>>>>>>>>>> to the deopt/uncommon_trap stub from
> >>>>>>>>>>>>>>> sharedRuntime_x86_64.cpp".
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Deoptimization::fetch_unroll_info_helper makes some
> >>>>>>>>>>>>>> assumptions on the layout of the frames leading to it.
> >>>>>>>>>>>>>> For example, it expects to be called from a stub: either
> >>>>>>>>>>>>>> the deopt_blob (SharedRuntime::generate_deopt_blob) or
> >>>>>>>>>>>>>> the uncommon_trap_blob
> >>>>>>>>>>>>>> (SharedRuntime::generate_uncommon_trap_blob).
> >>>>>>>>>>>>>> I was talking about this with Tom Rodriguez and what we
> >>>>>>>>>>>>>> probably want is to do a standard JavaCall which would
> >>>>>>>>>>>>>> land on such a stub; this would make it easier to end up
> >>>>>>>>>>>>>> with a valid-looking/walkable stack.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -- Tom
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com]
> >>>>>> On
> >>>>>>>>>>>>>>>> Behalf Of Gilles Duboscq
> >>>>>>>>>>>>>>>> Sent: Friday, January 24, 2014 12:07 PM
> >>>>>>>>>>>>>>>> To: Deneau, Tom
> >>>>>>>>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter
> >>>>>> Frames
> >>>>>>>> on
> >>>>>>>>>>>>>>>> the GPU
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello Tom,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm sending you my current diff, mostly for your
> >>>>>>>>>>>>>>>> information because it probably wouldn't compile or run.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For the deopt process what we need to do is:
> >>>>>>>>>>>>>>>> - Get the UnrollBlock from
> >>>>>>>>>>>>>>>> Deoptimization::fetch_unroll_info_helper
> >>>>>>>>>>>>>>>> - Rebuild the "skeletal frames" (walkable and with PCs
> >>>>>>>>>>>>>>>> but no values) using this UnrollBlock (see for example
> >>>>>>>>>>>>>>>> sharedRuntime_x86_64.cpp starting around line 3530)
> >>>>>>>>>>>>>>>> - Run Deoptimization::unpack_frames which will fill the
> >>>>>>>>>>>>>>>> skeletal frames with values using the UnrollBlock
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This work relies on vframes (here compiledVFrames)
> >>>>>>>>>>>>>>>> corresponding to the java frames that are contained in
> >>>>>>>>>>>>>>>> the method that just deoptimized.
> >>>>>>>>>>>>>>>> Usually these vframes reference a particular frame
> >>>>>>>>>>>>>>>> (from frame.hpp, i.e. a physical frame from the host
> >>>>>>>>>>>>>>>> machine).
> >>>>>>>>>>>>>>>> Sub-classing frame is not really possible (I spent some
> >>>>>>>>>>>>>>>> time looking at that but that doesn't seem reasonable)
> >>>>>>>>>>>>>>>> but subclassing compiledVFrame should be easy; that's
> >>>>>>>>>>>>>>>> what I did in HsailCompiledVFrame.
> >>>>>>>>>>>>>>>> HsailCompiledVFrame references the HSAILFrame and uses
> >>>>>>>>>>>>>>>> it in HsailCompiledVFrame::create_stack_value, which is
> >>>>>>>>>>>>>>>> what creates StackValues which are later used to
> >>>>>>>>>>>>>>>> retrieve the data.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Right now I'm trying to see how I can modify
> >>>>>>>>>>>>>>>> fetch_unroll_info_helper to minimise its reliance on
> >>>>>>>>>>>>>>>> frames. This needs quite a bit of refactoring.
> >>>>>>>>>>>>>>>> Part of this also requires figuring out exactly what
> >>>>>>>>>>>>>>>> the frame layout will be when we call it. I suppose that
> >>>>>>>>>>>>>>>> to avoid too many changes we can call a stub similar to
> >>>>>>>>>>>>>>>> the deopt/uncommon_trap stub from
> >>>>>>>>>>>>>>>> sharedRuntime_x86_64.cpp.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> A few questions:
> >>>>>>>>>>>>>>>> Why would there be multiple HSAILFrames? Is there a
> >>>>>>>>>>>>>>>> stack and method calls in HSAIL? If that's not the case
> >>>>>>>>>>>>>>>> then HSAILFrame should be an HSAIL equivalent of frame:
> >>>>>>>>>>>>>>>> only one frame since there is only one physical frame.
> >>>>>>>>>>>>>>>> I'm not entirely sure why we need the HSAILLocation.
> >>>>>>>>>>>>>>>> It's useful now during development but I suppose it
> >>>>>>>>>>>>>>>> should not be needed any more once we go through the
> >>>>>>>>>>>>>>>> StackValues. Did you have a specific use in mind beyond
> >>>>>>>>>>>>>>>> development tests?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jan 23, 2014 at 10:10 PM, Gilles Duboscq
> >>>>>>>>>>>>>>>> <duboscq at ssw.jku.at>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Hello Tom,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I've been working on this and by now I'm not really
> >>>>>>>>>>>>>>>>> convinced I will get something useful enough for
> >>>>>>>>>>>>>>>>> tomorrow.
> >>>>>>>>>>>>>>>>> I'll share the state of my patch/findings with you
> >>>>>>>>>>>>>>>>> tomorrow anyway but I'll probably need more work.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Sorry about that, I knew this deoptimization code is
> >>>>>>>>>>>>>>>>> complicated but using a non-physical frame (i.e. not a
> >>>>>>>>>>>>>>>>> frame from the platform's native ABI) is more
> >>>>>>>>>>>>>>>>> complicated than I thought.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Mon, Jan 20, 2014 at 8:14 PM, Tom Deneau
> >>>>>>>>>>>>>>>>> <tom.deneau at amd.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> Thanks, Gilles.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>>>>> From: gilwooden at gmail.com
> >>>>>>> [mailto:gilwooden at gmail.com]
> >>>>>>>> On
> >>>>>>>>>>>>>>>>>>> Behalf Of Gilles Duboscq
> >>>>>>>>>>>>>>>>>>> Sent: Monday, January 20, 2014 12:29 PM
> >>>>>>>>>>>>>>>>>>> To: Deneau, Tom
> >>>>>>>>>>>>>>>>>>> Subject: Re: actions -- Rebuilding the Interpreter
> >>>>>>>> Frames
> >>>>>>>>>>>>>>>>>>> on the GPU
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello Tom,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Yes i've looked at your webrev.
> >>>>>>>>>>>>>>>>>>> Thank you.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I also looked at the hotspot code and I have a rough
> >>>>>>>>>>>>>>>>>>> idea of what is needed.
> >>>>>>>>>>>>>>>>>>> Sorry for the late answer, I have a lot of things on
> >>>>>>>>>>>>>>>>>>> my stack right now.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I intend to look at it this week and I hope to have
> >>>>>>>>>>>>>>>>>>> at least something that you can experiment with on
> >>>>>>>>>>>>>>>>>>> Friday.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jan 17, 2014 at 10:23 PM, Tom Deneau
> >>>>>>>>>>>>>>>>>>> <tom.deneau at amd.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>> Hi Gilles --
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I assume you saw the notice of the webrev I
> >>>>>>>>>>>>>>>>>>>> uploaded that can be inspected (and also can be
> >>>>>>>>>>>>>>>>>>>> built, although we are not proposing it for
> >>>>>>>>>>>>>>>>>>>> check-in).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-debuginfo-for-gilles/webrev/
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> To help with our internal planning, can you give us
> >>>>>>>>>>>>>>>>>>>> a rough estimate of how far away the frame
> >>>>>>>>>>>>>>>>>>>> rebuilding interface might be?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -- Tom
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>>>>>>> From: gilwooden at gmail.com
> >>>>>>>> [mailto:gilwooden at gmail.com]
> >>>>>>>>>>>>>>>>>>>>> On Behalf Of Gilles Duboscq
> >>>>>>>>>>>>>>>>>>>>> Sent: Wednesday, January 15, 2014 4:38 AM
> >>>>>>>>>>>>>>>>>>>>> To: Deneau, Tom
> >>>>>>>>>>>>>>>>>>>>> Cc: Doug Simon; graal-dev at openjdk.java.net
> >>>>>>>>>>>>>>>>>>>>> Subject: Re: actions -- Rebuilding the
> >>>>>> Interpreter
> >>>>>>>>>>>>>>>>>>>>> Frames on the GPU
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hello Tom,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> It's on my list, I already had a closer look at
> >>>>>>>>>>>>>>>>>>>>> the frame rebuilding code.
> >>>>>>>>>>>>>>>>>>>>> I would be interested to have a look at the code
> >>>>>>>>>>>>>>>>>>>>> of your CodeInstaller subclass and the code you use
> >>>>>>>>>>>>>>>>>>>>> to retrieve the runtime values so that I can
> >>>>>>>>>>>>>>>>>>>>> experiment with it.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jan 13, 2014 at 5:09 PM, Tom Deneau
> >>>>>>>>>>>>>>>>>>>>> <tom.deneau at amd.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Gilles, Doug --
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> A status update on our end...
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> * We now generate HSAIL code to save the register
> >>>>>>>>>>>>>>>>>>>>>> state at deopt points.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> * We have an HSAIL-specific CodeInstaller class
> >>>>>>>>>>>>>>>>>>>>>> based on the changes Doug added and we use this
> >>>>>>>>>>>>>>>>>>>>>> at compile time (code-install time) to build the
> >>>>>>>>>>>>>>>>>>>>>> ScopeDescs. (This avoids the host-register
> >>>>>>>>>>>>>>>>>>>>>> specific code in the base CodeInstaller class.)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> * At runtime, if we detect that a workitem
> >>>>>>>>>>>>>>>>>>>>>> deopted, we map the saved "HSAIL pc" to the
> >>>>>>>>>>>>>>>>>>>>>> relevant ScopeDesc and use each Location item in
> >>>>>>>>>>>>>>>>>>>>>> the ScopeDesc to retrieve the relevant HSAIL
> >>>>>>>>>>>>>>>>>>>>>> register from the HSAIL frame (where the
> >>>>>>>>>>>>>>>>>>>>>> registers were saved).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Right now we just print out the live locals or
> >>>>>>>>>>>>>>>>>>>>>> expression stack values for the deopted workitem
> >>>>>>>>>>>>>>>>>>>>>> and they look correct. The next step would be to
> >>>>>>>>>>>>>>>>>>>>>> rebuild the interpreter frames.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Can I get an update on the "C++ changes needed to
> >>>>>>>>>>>>>>>>>>>>>> easily rebuild the interpreter frames from a raw
> >>>>>>>>>>>>>>>>>>>>>> buffer provided by the GPU"?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -- Tom
> >>>>>>>>>>>>>>>>>>>>>>
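The runtime lookup in the status update above (saved "HSAIL pc" to ScopeDesc, then each Location to a saved register) can be illustrated with a small Java model. It is a sketch under loud assumptions: the real ScopeDesc and Location are HotSpot C++ structures, whereas here a map from pc to register indices stands in for them, and every name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not HotSpot code: an int[] of register indices
// plays the role of the Locations recorded in a ScopeDesc.
final class DeoptValueReader {
    // Saved pc -> indices of the saved registers that hold live values.
    private final Map<Integer, int[]> scopes = new HashMap<>();

    // Code-install time: remember which registers are live at this pc.
    void recordScope(int pc, int... registerIndices) {
        scopes.put(pc, registerIndices.clone());
    }

    // Runtime: given the pc saved by a deopted workitem and its saved
    // registers, fetch the live values in the recorded order.
    List<Long> liveValues(int savedPc, long[] savedRegisters) {
        int[] locations = scopes.get(savedPc);
        if (locations == null) {
            throw new IllegalStateException("no scope recorded for pc " + savedPc);
        }
        List<Long> values = new ArrayList<>();
        for (int index : locations) {
            values.add(savedRegisters[index]);
        }
        return values;
    }
}
```

Printing the returned list corresponds to the "print out the live locals or expression stack values" step described in the message.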
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>>>>>>>>> From: graal-dev-bounces at openjdk.java.net
> >>>>>>>>>>>>>>>>>>>>>>> [mailto:graal-dev- bounces at openjdk.java.net]
> >>>>>> On
> >>>>>>>>>>>>>>>>>>>>>>> Behalf Of Gilles Duboscq
> >>>>>>>>>>>>>>>>>>>>>>> Sent: Friday, December 20, 2013 4:31 AM
> >>>>>>>>>>>>>>>>>>>>>>> To: Doug Simon
> >>>>>>>>>>>>>>>>>>>>>>> Cc: graal-dev at openjdk.java.net
> >>>>>>>>>>>>>>>>>>>>>>> Subject: Re: actions
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> As for me, I'll look into the C++ changes needed
> >>>>>>>>>>>>>>>>>>>>>>> to easily rebuild the interpreter frames from a
> >>>>>>>>>>>>>>>>>>>>>>> raw buffer provided by the GPU during
> >>>>>>>>>>>>>>>>>>>>>>> deoptimization.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> -Gilles
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 19, 2013 at 11:27 PM, Doug Simon
> >>>>>>>>>>>>>>>>>>> <doug.simon at oracle.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> As a result of the Sumatra Skype meeting today on
> >>>>>>>>>>>>>>>>>>>>>>>> the topic of how to handle deopt for HSAIL & PTX,
> >>>>>>>>>>>>>>>>>>>>>>>> I've signed up to investigate changes in the C++
> >>>>>>>>>>>>>>>>>>>>>>>> layer of Graal to accommodate installing code
> >>>>>>>>>>>>>>>>>>>>>>>> whose debug info is not in terms of host machine
> >>>>>>>>>>>>>>>>>>>>>>>> state (e.g. uses a different register set than
> >>>>>>>>>>>>>>>>>>>>>>>> the host register set).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> -Doug
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Dec 19, 2013, at 11:02 PM, Deneau, Tom
> >>>>>>>>>>>>>>>>>>>>>>>> <tom.deneau at amd.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Gilles, Doug --
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Could you post to the graal-dev list what the
> >>>>>>>>>>>>>>>>>>>>>>>>> two action items you took were?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> -- Tom
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>