webrev-hsail-deopt-v2

Deneau, Tom tom.deneau at amd.com
Tue Mar 25 18:01:54 UTC 2014


I ended up creating (or re-creating) an HSAILCompilationResultBuilder whose only purpose is to save the lirGen.
It works, but let me know if that meets the style.

-- Tom

> -----Original Message-----
> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of
> Deneau, Tom
> Sent: Tuesday, March 25, 2014 12:10 PM
> To: Doug Simon; Gilles Duboscq
> Cc: graal-dev at openjdk.java.net
> Subject: RE: webrev-hsail-deopt-v2
> 
> Doug, Gilles --
> 
> Regarding the
>      private LIRGenerator lirGen;
> in HSAILHotSpotBackend.java.
> 
> I agree with your statement that lirGen should be per-compilation, not
> per backend.
> I added it only because it was needed by the code Gilles added at the
> end of emitCode
> 
>         compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator)
> lirGen).prepareHostGraph());
> 
> I see that lirGen is passed in to newCompilationResultBuilder and used
> in that method but was not saved anywhere.
> 
> I thought perhaps it could be saved in CompilationResultBuilder itself,
> but that led to the dreaded loops in the project graphs (if I made
> com.oracle.graal.compiler project a dependency of com.oracle.graal.lir).
> 
> So what is the best way to solve this problem?
> 
> -- Tom
> 
> 
> > -----Original Message-----
> > From: Doug Simon [mailto:doug.simon at oracle.com]
> > Sent: Monday, March 24, 2014 6:13 AM
> > To: Deneau, Tom
> > Cc: graal-dev at openjdk.java.net
> > Subject: Re: webrev-hsail-deopt-v2
> >
> > Tom,
> >
> > Thanks for fixing the issues in the first patch. Apart from some
> > seemingly excessive memory usage (which can hopefully be address
> > later), the logic in this patch looks mostly correct to me. However,
> > Gilles needs to also provide feedback before we can integrate this
> > patch as he is more familiar with some of the details.
> >
> > Per-file comments below.
> >
> > graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
> > l/
> > HSAILHotSpotBackend.java:
> >
> > These cannot be fields in HSAILHotSpotBackend as they are per-
> > compilation values, not per-backend values:
> >
> > +    private HSAILAssembler asm;
> > +    private LIRGenerator lirGen;
> >
> > There's no real value in having HSAILNonNullParametersPhase extend
> > NonNullParametersPhase since the former overrides all the logic in the
> > latter.
> >
> > graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
> > l/
> > HSAILHotSpotLIRGenerator.java:
> >
> > Please create named constants for 40 and 8 that explains what they
> are:
> >
> > 204         if (regNumber < 40) {
> > 205             long offset = config.hsailFrameSaveAreaOffset + 4 *
> > (regNumber - 8);
> >
> > src/gpu/hsail/vm/gpu_hsail.hpp:
> >
> > It would be nice to see the CU acronym explained somewhere.
> >
> > I know that you are thinking of supporting multiple frames in the
> > future but I would not include code partially supporting that now
> > otherwise it just adds confusion. For example, the first frame is
> > contained in a HSAILKernelDeoptimization object, but
> > HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who
> allocates/deallocates the next frame(s)?
> >
> > 46     HSAILFrame _first_frame;
> >
> > What's the point of HSAILComputeUnitSaveState? In future will it do
> > more than simply wrap a HSAILKernelDeoptimization?
> >
> > 55   class HSAILComputeUnitSaveState {
> > 56     friend class VMStructs;
> > 57    public:
> > 58     HSAILKernelDeoptimization  _deopt_info;
> > 59   };
> >
> > Why not make the definition extra clear by making the preprocessor do
> > the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
> >
> > 61 // 8 cus * 36 waves per cu * wavesize 64
> > 62 #define CU_SAVE_STATES_SIZE     18432
> >
> > Let's use a longer name for clarity (e.g.,
> > _sizeof_compute_unit_save_state). This should also be reflect in
> > HotSpotVMConfig.
> >
> > 69     jint _sizeof_cuss;
> >
> > src/gpu/hsail/vm/gpu_hsail.cpp:
> >
> > I was going to suggest stack allocating the HSAILDeoptimizationInfo
> > until I realized it's 20Mb! Is that a conservative approach we can
> > improve upon somehow (e.g., a lazily created thread local)? In any
> > case, shouldn't the actual allocation be within the "if
> (useDeoptInfo)" guard?
> >
> > 149   HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP,
> mtInternal)
> > HSAILDeoptimizationInfo();
> > 150   if (useDeoptInfo) {
> >
> > -Doug
> >
> > On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >
> > > Doug, Gilles --
> > >
> > > I believe I have gotten rid of the warnings from the original hsail-
> > deopt webrev.
> > > I also fixed a few casts that Eric Caspole noticed were preventing
> > > the
> > fastdebug build from building
> > > (but did not affect product and debug builds).    Please take a look
> > at
> > >
> > > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt
> > > -
> > v2
> > >
> > > I've included here the explanatory text from the first submission
> > >
> > > This consists of at least the start of support for deoptimization in
> > HSAIL kernels.  Although the list of files changed may look long, many
> > of the files have only a few lines changed.  Special thanks to Gilles
> > Duboscq and Doug Simon who provided some of the infrastructure that
> > this webrev uses.
> > >
> > > -- Tom Deneau
> > >
> > >
> > > Below I have described
> > >
> > >   * an overview of the codepaths the data structures
> > >   * java and hotspot source changes
> > >
> > >
> > > Deoptimization Data Structures and Overview
> > > ===========================================
> > >
> > > At kernel dispatch time, we allocate space for any workitems should
> > want to deopt.  To reduce the space requirements, space is only
> > reserved for the maximum number of possible concurrent workitems that
> > could be running on that hsa device.
> > >
> > > A deopting workitem sets a "deopt happened" flag, and future
> > > workitems
> > that see "deopt happened" as true will just set a flag saying they
> > never ran and exit early.  Currently the never_ran array is one per
> workitem.
> > We are looking at ways to make this smaller but HSA devices have a lot
> > of freedom in how they schedule workitems (current hardware and the
> > simulator are quite different).
> > >
> > > Workitems that deopt atomically bump an index saying where they
> > > should
> > store their deopt data.  The deopt data consists of
> > >   * workitemid
> > >   * deopt actionAndReason
> > >   * the first HSAILFrame
> > >
> > > An HSAILFrame consists of
> > >   * the deoptId or "pc" offset where the deopt occurred
> > >   * number of s registers
> > >   * number of d registers
> > >   * a bitmap indicating which d registers are oops
> > >   * space for saving the d and s registers
> > >
> > > Currently we always set num_s_registers to 32 and num_d_registers to
> > > 16 but in the hsail code of the kernel we only save the union of the
> > actual registers that are live at any of the infopoints.
> > >
> > > On return from the GPU, we check if there were any deopts.  If not,
> > > we
> > just return back to java.  If there was at least one deopt then
> > >
> > >   a) for the workitems that finished normally, there is nothing to
> > > do
> > >
> > >   b) if there are any deopted workitems, we want to run each
> deopting
> > >      workitem thru the interpreter going first thru the special host
> > >      trampoline code infrastructure that Gilles created.  The
> > >      trampoline host code takes the deoptId and a pointer to the
> > >      saved hsail frame.  We currently do this sequentially although
> > >      other policies are possible.
> > >
> > >   c) for each never ran workitem, we can just run it from the
> > >      beginning of the kernel "method", just making sure we pass the
> > >      arguments and the appropriate workitem id for each one.  Again,
> > >      we currently do this sequentially although other policies are
> > >      possible.
> > >
> > > Because running either type b or c above can cause GCs, and because
> > some of our saved d registers are pointers into the java heap, we take
> > care in case any of these saved pointers are affected by GC.  The
> > current strategy of using an Object array supplied by the java side
> > will be replaced later with an oops_do type of strategy.
> > >
> > >
> > >
> > > Description of source changes in this webrev.
> > > =============================================
> > > graal source changes
> > > ====================
> > >
> > > Assembler, HSAILAssembler
> > >   minor changes for new instructions needed for saving deopt
> > information
> > >
> > > GraalKernelTester.java
> > >   force simulator to run single threaded.
> > >
> > > KernelTester.java
> > >   minor changes to handle exceptions which escape the kernel method
> > >
> > > HSAILLIRGenerator.java
> > >   support switches with keys of type long
> > >
> > >
> > > HSAILHotSpotBackend.java
> > >
> > >   * compileKernel uses some new optimisticOpts which help generate
> > >     deopts when needed.  Also, we dump the infopoints if Log:CodeGen
> > >     is on
> > >
> > >   * HSAILNonNullParametersPhase stamps the appropriate parameters as
> > >     nonNull
> > >
> > >   * installKernel uses the new trampoline infrastructure added by
> > >     Gilles do produce the host trampoline deopt method and install
> > >     it.
> > >
> > >   * emitCode adds a little bit of code to the prologue and a lot of
> > >     code to the epilogue.  See description at the bottom for the
> data
> > >     structures used by the never-ran path and the deopt path.
> > >
> > > HSAILHotSpotLIRGenerator.java
> > >
> > >   * code added by Gilles to build the host graph for the host
> > >     trampoline deopt method.  I suppose some of this would be common
> > >     to any gpu trampoline deopt and should be moved to some
> > >     hsail-independent location.
> > >
> > >   * code to handle the creation of a DeoptimizeOp for emitting HSAIL
> > >     code for a deoptimization
> > >
> > > HSAILHotSpotLoweringProvider.java
> > >
> > >   * refactored to support different strategies for different nodes.
> > >     UnwindNode strategy is to get replaced by a DeoptimizeNode.
> > >
> > > HotSpotVMConfig.java
> > >
> > >  * define offets to fields in the deopt data structures
> > >
> > > VMErrorNode.java
> > >
> > >  * public access to constructor (used by building of host graph for
> > >    trampoline code)
> > >
> > > HSAIL.java
> > >  * some new non-allocatable registers defined (used by deopt paths)
> > >
> > > HSAILControlFlow.java
> > >  * code to emit hsail for a deoptimizationNode
> > >
> > > ComputeProbabilityClosure.java
> > >  * just using a change that Gilles made in the patch he gave me.
> > >
> > >
> > > mx/projects was affected by the move of ExternalCompilationResult to
> > com.oracle.graal.gpu.  In addition, several files had one line import
> > changes from the move of ExternalCompilationResult.
> > >
> > >
> > > hotspot source changes
> > > ======================
> > >
> > > gpu_hsail.cpp, hpp
> > >
> > >   * the execute_kernel routine pushes an extra parameter where deopt
> > info can be saved
> > >
> > >   * while pushing kernel args, keeps track if any are null and if so
> > >     sets some new gpu_exception fields in thread structure which
> gets
> > >     used when thread returns to java mode
> > >
> > >   * on return from kernel checks if any deopt occurred.  If so,
> > >
> > >      * runs any deopting workitems thru the trampoline deopt code
> > >        which ends up running the kernel method thru the interpreter
> > >        for that workitem.
> > >
> > >      * runs any never-ran workitems using simple javaCall.
> > >
> > > gpu_hsail_Frame.hpp
> > >   * new structure that defines the layout of a physical HSAIL frame
> > >
> > > hsailArgumentsBase.*, hsailKernelArguments.hpp
> > hsailJavaCallArguments.hpp
> > >   * refactored to share code between kernel argument setup and
> > >     javaCall argument setup
> > >
> > > javaClasses.cpp
> > >
> > >   * contains logic to check the new gpu_exception fields in thread
> > >     structure and if detected, set as top frame on return
> > >
> > > graalCompiler.cpp, hpp
> > >   * logic added by Gilles for external_deopt_i2c
> > >
> > > javaCalls.cpp, hpp
> > >   * logic added by Gilles for external_deopt_i2c
> > >
> > > sharedRuntime.cpp
> > >   * maybe Gilles can explain why the assert was removed in the patch
> > >     he gave me (it asserts if I put it back in)
> > >
> > > thread.cpp, hpp
> > >   * handle new gpu_exception fields
> > >
> > > vmStructs.cpp
> > > vmStructs_hsail.hpp
> > >   * handle new hsail deopt structs
> > >
> > >
> > >
> > >
> >
> 
> 




More information about the graal-dev mailing list