webrev-hsail-deopt-v2

Deneau, Tom tom.deneau at amd.com
Tue Mar 25 20:17:04 UTC 2014


Doug --

Do you want the next webrev merged with the tip of trunk or is it OK to leave it as is branching off trunk from last week?

-- Tom


> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Tuesday, March 25, 2014 2:14 PM
> To: Deneau, Tom
> Cc: Gilles Duboscq; graal-dev at openjdk.java.net
> Subject: Re: webrev-hsail-deopt-v2
> 
> Sounds like a reasonable workaround to me.
> 
> On Mar 25, 2014, at 7:01 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > I ended up creating (or re-creating) an HSAILCompilationResultBuilder
> whose only purpose is to save the lirGen.
> > It works, but let me know if that meets the style.
> >
> > -- Tom
> >
> >> -----Original Message-----
> >> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf
> Of
> >> Deneau, Tom
> >> Sent: Tuesday, March 25, 2014 12:10 PM
> >> To: Doug Simon; Gilles Duboscq
> >> Cc: graal-dev at openjdk.java.net
> >> Subject: RE: webrev-hsail-deopt-v2
> >>
> >> Doug, Gilles --
> >>
> >> Regarding the
> >>     private LIRGenerator lirGen;
> >> in HSAILHotSpotBackend.java.
> >>
> >> I agree with your statement that lirGen should be per-compilation,
> not
> >> per backend.
> >> I added it only because it was needed by the code Gilles added at the
> >> end of emitCode
> >>
> >>        compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator)
> >> lirGen).prepareHostGraph());
> >>
> >> I see that lirGen is passed in to newCompilationResultBuilder and
> used
> >> in that method but was not saved anywhere.
> >>
> >> I thought perhaps it could be saved in CompilationResultBuilder
> itself,
> >> but that led to the dreaded loops in the project graphs (if I made
> >> com.oracle.graal.compiler project a dependency of
> com.oracle.graal.lir).
> >>
> >> So what is the best way to solve this problem?
> >>
> >> -- Tom
> >>
> >>
> >>> -----Original Message-----
> >>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>> Sent: Monday, March 24, 2014 6:13 AM
> >>> To: Deneau, Tom
> >>> Cc: graal-dev at openjdk.java.net
> >>> Subject: Re: webrev-hsail-deopt-v2
> >>>
> >>> Tom,
> >>>
> >>> Thanks for fixing the issues in the first patch. Apart from some
> >>> seemingly excessive memory usage (which can hopefully be address
> >>> later), the logic in this patch looks mostly correct to me. However,
> >>> Gilles needs to also provide feedback before we can integrate this
> >>> patch as he is more familiar with some of the details.
> >>>
> >>> Per-file comments below.
> >>>
> >>>
> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
> >>> l/
> >>> HSAILHotSpotBackend.java:
> >>>
> >>> These cannot be fields in HSAILHotSpotBackend as they are per-
> >>> compilation values, not per-backend values:
> >>>
> >>> +    private HSAILAssembler asm;
> >>> +    private LIRGenerator lirGen;
> >>>
> >>> There's no real value in having HSAILNonNullParametersPhase extend
> >>> NonNullParametersPhase since the former overrides all the logic in
> the
> >>> latter.
> >>>
> >>>
> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
> >>> l/
> >>> HSAILHotSpotLIRGenerator.java:
> >>>
> >>> Please create named constants for 40 and 8 that explains what they
> >> are:
> >>>
> >>> 204         if (regNumber < 40) {
> >>> 205             long offset = config.hsailFrameSaveAreaOffset + 4 *
> >>> (regNumber - 8);
> >>>
> >>> src/gpu/hsail/vm/gpu_hsail.hpp:
> >>>
> >>> It would be nice to see the CU acronym explained somewhere.
> >>>
> >>> I know that you are thinking of supporting multiple frames in the
> >>> future but I would not include code partially supporting that now
> >>> otherwise it just adds confusion. For example, the first frame is
> >>> contained in a HSAILKernelDeoptimization object, but
> >>> HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who
> >> allocates/deallocates the next frame(s)?
> >>>
> >>> 46     HSAILFrame _first_frame;
> >>>
> >>> What's the point of HSAILComputeUnitSaveState? In future will it do
> >>> more than simply wrap a HSAILKernelDeoptimization?
> >>>
> >>> 55   class HSAILComputeUnitSaveState {
> >>> 56     friend class VMStructs;
> >>> 57    public:
> >>> 58     HSAILKernelDeoptimization  _deopt_info;
> >>> 59   };
> >>>
> >>> Why not make the definition extra clear by making the preprocessor
> do
> >>> the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
> >>>
> >>> 61 // 8 cus * 36 waves per cu * wavesize 64
> >>> 62 #define CU_SAVE_STATES_SIZE     18432
> >>>
> >>> Let's use a longer name for clarity (e.g.,
> >>> _sizeof_compute_unit_save_state). This should also be reflect in
> >>> HotSpotVMConfig.
> >>>
> >>> 69     jint _sizeof_cuss;
> >>>
> >>> src/gpu/hsail/vm/gpu_hsail.cpp:
> >>>
> >>> I was going to suggest stack allocating the HSAILDeoptimizationInfo
> >>> until I realized it's 20Mb! Is that a conservative approach we can
> >>> improve upon somehow (e.g., a lazily created thread local)? In any
> >>> case, shouldn't the actual allocation be within the "if
> >> (useDeoptInfo)" guard?
> >>>
> >>> 149   HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP,
> >> mtInternal)
> >>> HSAILDeoptimizationInfo();
> >>> 150   if (useDeoptInfo) {
> >>>
> >>> -Doug
> >>>
> >>> On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>>
> >>>> Doug, Gilles --
> >>>>
> >>>> I believe I have gotten rid of the warnings from the original
> hsail-
> >>> deopt webrev.
> >>>> I also fixed a few casts that Eric Caspole noticed were preventing
> >>>> the
> >>> fastdebug build from building
> >>>> (but did not affect product and debug builds).    Please take a
> look
> >>> at
> >>>>
> >>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-
> deopt
> >>>> -
> >>> v2
> >>>>
> >>>> I've included here the explanatory text from the first submission
> >>>>
> >>>> This consists of at least the start of support for deoptimization
> in
> >>> HSAIL kernels.  Although the list of files changed may look long,
> many
> >>> of the files have only a few lines changed.  Special thanks to
> Gilles
> >>> Duboscq and Doug Simon who provided some of the infrastructure that
> >>> this webrev uses.
> >>>>
> >>>> -- Tom Deneau
> >>>>
> >>>>
> >>>> Below I have described
> >>>>
> >>>>  * an overview of the codepaths the data structures
> >>>>  * java and hotspot source changes
> >>>>
> >>>>
> >>>> Deoptimization Data Structures and Overview
> >>>> ===========================================
> >>>>
> >>>> At kernel dispatch time, we allocate space for any workitems should
> >>> want to deopt.  To reduce the space requirements, space is only
> >>> reserved for the maximum number of possible concurrent workitems
> that
> >>> could be running on that hsa device.
> >>>>
> >>>> A deopting workitem sets a "deopt happened" flag, and future
> >>>> workitems
> >>> that see "deopt happened" as true will just set a flag saying they
> >>> never ran and exit early.  Currently the never_ran array is one per
> >> workitem.
> >>> We are looking at ways to make this smaller but HSA devices have a
> lot
> >>> of freedom in how they schedule workitems (current hardware and the
> >>> simulator are quite different).
> >>>>
> >>>> Workitems that deopt atomically bump an index saying where they
> >>>> should
> >>> store their deopt data.  The deopt data consists of
> >>>>  * workitemid
> >>>>  * deopt actionAndReason
> >>>>  * the first HSAILFrame
> >>>>
> >>>> An HSAILFrame consists of
> >>>>  * the deoptId or "pc" offset where the deopt occurred
> >>>>  * number of s registers
> >>>>  * number of d registers
> >>>>  * a bitmap indicating which d registers are oops
> >>>>  * space for saving the d and s registers
> >>>>
> >>>> Currently we always set num_s_registers to 32 and num_d_registers
> to
> >>>> 16 but in the hsail code of the kernel we only save the union of
> the
> >>> actual registers that are live at any of the infopoints.
> >>>>
> >>>> On return from the GPU, we check if there were any deopts.  If not,
> >>>> we
> >>> just return back to java.  If there was at least one deopt then
> >>>>
> >>>>  a) for the workitems that finished normally, there is nothing to
> >>>> do
> >>>>
> >>>>  b) if there are any deopted workitems, we want to run each
> >> deopting
> >>>>     workitem thru the interpreter going first thru the special host
> >>>>     trampoline code infrastructure that Gilles created.  The
> >>>>     trampoline host code takes the deoptId and a pointer to the
> >>>>     saved hsail frame.  We currently do this sequentially although
> >>>>     other policies are possible.
> >>>>
> >>>>  c) for each never ran workitem, we can just run it from the
> >>>>     beginning of the kernel "method", just making sure we pass the
> >>>>     arguments and the appropriate workitem id for each one.  Again,
> >>>>     we currently do this sequentially although other policies are
> >>>>     possible.
> >>>>
> >>>> Because running either type b or c above can cause GCs, and because
> >>> some of our saved d registers are pointers into the java heap, we
> take
> >>> care in case any of these saved pointers are affected by GC.  The
> >>> current strategy of using an Object array supplied by the java side
> >>> will be replaced later with an oops_do type of strategy.
> >>>>
> >>>>
> >>>>
> >>>> Description of source changes in this webrev.
> >>>> =============================================
> >>>> graal source changes
> >>>> ====================
> >>>>
> >>>> Assembler, HSAILAssembler
> >>>>  minor changes for new instructions needed for saving deopt
> >>> information
> >>>>
> >>>> GraalKernelTester.java
> >>>>  force simulator to run single threaded.
> >>>>
> >>>> KernelTester.java
> >>>>  minor changes to handle exceptions which escape the kernel method
> >>>>
> >>>> HSAILLIRGenerator.java
> >>>>  support switches with keys of type long
> >>>>
> >>>>
> >>>> HSAILHotSpotBackend.java
> >>>>
> >>>>  * compileKernel uses some new optimisticOpts which help generate
> >>>>    deopts when needed.  Also, we dump the infopoints if Log:CodeGen
> >>>>    is on
> >>>>
> >>>>  * HSAILNonNullParametersPhase stamps the appropriate parameters as
> >>>>    nonNull
> >>>>
> >>>>  * installKernel uses the new trampoline infrastructure added by
> >>>>    Gilles do produce the host trampoline deopt method and install
> >>>>    it.
> >>>>
> >>>>  * emitCode adds a little bit of code to the prologue and a lot of
> >>>>    code to the epilogue.  See description at the bottom for the
> >> data
> >>>>    structures used by the never-ran path and the deopt path.
> >>>>
> >>>> HSAILHotSpotLIRGenerator.java
> >>>>
> >>>>  * code added by Gilles to build the host graph for the host
> >>>>    trampoline deopt method.  I suppose some of this would be common
> >>>>    to any gpu trampoline deopt and should be moved to some
> >>>>    hsail-independent location.
> >>>>
> >>>>  * code to handle the creation of a DeoptimizeOp for emitting HSAIL
> >>>>    code for a deoptimization
> >>>>
> >>>> HSAILHotSpotLoweringProvider.java
> >>>>
> >>>>  * refactored to support different strategies for different nodes.
> >>>>    UnwindNode strategy is to get replaced by a DeoptimizeNode.
> >>>>
> >>>> HotSpotVMConfig.java
> >>>>
> >>>> * define offets to fields in the deopt data structures
> >>>>
> >>>> VMErrorNode.java
> >>>>
> >>>> * public access to constructor (used by building of host graph for
> >>>>   trampoline code)
> >>>>
> >>>> HSAIL.java
> >>>> * some new non-allocatable registers defined (used by deopt paths)
> >>>>
> >>>> HSAILControlFlow.java
> >>>> * code to emit hsail for a deoptimizationNode
> >>>>
> >>>> ComputeProbabilityClosure.java
> >>>> * just using a change that Gilles made in the patch he gave me.
> >>>>
> >>>>
> >>>> mx/projects was affected by the move of ExternalCompilationResult
> to
> >>> com.oracle.graal.gpu.  In addition, several files had one line
> import
> >>> changes from the move of ExternalCompilationResult.
> >>>>
> >>>>
> >>>> hotspot source changes
> >>>> ======================
> >>>>
> >>>> gpu_hsail.cpp, hpp
> >>>>
> >>>>  * the execute_kernel routine pushes an extra parameter where deopt
> >>> info can be saved
> >>>>
> >>>>  * while pushing kernel args, keeps track if any are null and if so
> >>>>    sets some new gpu_exception fields in thread structure which
> >> gets
> >>>>    used when thread returns to java mode
> >>>>
> >>>>  * on return from kernel checks if any deopt occurred.  If so,
> >>>>
> >>>>     * runs any deopting workitems thru the trampoline deopt code
> >>>>       which ends up running the kernel method thru the interpreter
> >>>>       for that workitem.
> >>>>
> >>>>     * runs any never-ran workitems using simple javaCall.
> >>>>
> >>>> gpu_hsail_Frame.hpp
> >>>>  * new structure that defines the layout of a physical HSAIL frame
> >>>>
> >>>> hsailArgumentsBase.*, hsailKernelArguments.hpp
> >>> hsailJavaCallArguments.hpp
> >>>>  * refactored to share code between kernel argument setup and
> >>>>    javaCall argument setup
> >>>>
> >>>> javaClasses.cpp
> >>>>
> >>>>  * contains logic to check the new gpu_exception fields in thread
> >>>>    structure and if detected, set as top frame on return
> >>>>
> >>>> graalCompiler.cpp, hpp
> >>>>  * logic added by Gilles for external_deopt_i2c
> >>>>
> >>>> javaCalls.cpp, hpp
> >>>>  * logic added by Gilles for external_deopt_i2c
> >>>>
> >>>> sharedRuntime.cpp
> >>>>  * maybe Gilles can explain why the assert was removed in the patch
> >>>>    he gave me (it asserts if I put it back in)
> >>>>
> >>>> thread.cpp, hpp
> >>>>  * handle new gpu_exception fields
> >>>>
> >>>> vmStructs.cpp
> >>>> vmStructs_hsail.hpp
> >>>>  * handle new hsail deopt structs
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
> 




More information about the graal-dev mailing list