webrev-hsail-deopt-v2
Deneau, Tom
tom.deneau at amd.com
Tue Mar 25 17:09:34 UTC 2014
Doug, Gilles --
Regarding the
private LIRGenerator lirGen;
in HSAILHotSpotBackend.java.
I agree with your statement that lirGen should be per-compilation, not per backend.
I added it only because it was needed by the code Gilles added at the end of emitCode
compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator) lirGen).prepareHostGraph());
I see that lirGen is passed in to newCompilationResultBuilder and used in that method
but was not saved anywhere.
I thought perhaps it could be saved in CompilationResultBuilder itself, but that led to the
dreaded loops in the project graphs (if I made com.oracle.graal.compiler project a dependency
of com.oracle.graal.lir).
So what is the best way to solve this problem?
-- Tom
> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Monday, March 24, 2014 6:13 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net
> Subject: Re: webrev-hsail-deopt-v2
>
> Tom,
>
> Thanks for fixing the issues in the first patch. Apart from some
> seemingly excessive memory usage (which can hopefully be address later),
> the logic in this patch looks mostly correct to me. However, Gilles
> needs to also provide feedback before we can integrate this patch as he
> is more familiar with some of the details.
>
> Per-file comments below.
>
> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsail/
> HSAILHotSpotBackend.java:
>
> These cannot be fields in HSAILHotSpotBackend as they are per-
> compilation values, not per-backend values:
>
> + private HSAILAssembler asm;
> + private LIRGenerator lirGen;
>
> There's no real value in having HSAILNonNullParametersPhase extend
> NonNullParametersPhase since the former overrides all the logic in the
> latter.
>
> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsail/
> HSAILHotSpotLIRGenerator.java:
>
> Please create named constants for 40 and 8 that explains what they are:
>
> 204 if (regNumber < 40) {
> 205 long offset = config.hsailFrameSaveAreaOffset + 4 *
> (regNumber - 8);
>
> src/gpu/hsail/vm/gpu_hsail.hpp:
>
> It would be nice to see the CU acronym explained somewhere.
>
> I know that you are thinking of supporting multiple frames in the future
> but I would not include code partially supporting that now otherwise it
> just adds confusion. For example, the first frame is contained in a
> HSAILKernelDeoptimization object, but HSAILFrame::_nextFrame is a
> pointer to a HSAILFrame. Who allocates/deallocates the next frame(s)?
>
> 46 HSAILFrame _first_frame;
>
> What's the point of HSAILComputeUnitSaveState? In future will it do more
> than simply wrap a HSAILKernelDeoptimization?
>
> 55 class HSAILComputeUnitSaveState {
> 56 friend class VMStructs;
> 57 public:
> 58 HSAILKernelDeoptimization _deopt_info;
> 59 };
>
> Why not make the definition extra clear by making the preprocessor do
> the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
>
> 61 // 8 cus * 36 waves per cu * wavesize 64
> 62 #define CU_SAVE_STATES_SIZE 18432
>
> Let's use a longer name for clarity (e.g.,
> _sizeof_compute_unit_save_state). This should also be reflect in
> HotSpotVMConfig.
>
> 69 jint _sizeof_cuss;
>
> src/gpu/hsail/vm/gpu_hsail.cpp:
>
> I was going to suggest stack allocating the HSAILDeoptimizationInfo
> until I realized it's 20Mb! Is that a conservative approach we can
> improve upon somehow (e.g., a lazily created thread local)? In any case,
> shouldn't the actual allocation be within the "if (useDeoptInfo)" guard?
>
> 149 HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP, mtInternal)
> HSAILDeoptimizationInfo();
> 150 if (useDeoptInfo) {
>
> -Doug
>
> On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>
> > Doug, Gilles --
> >
> > I believe I have gotten rid of the warnings from the original hsail-
> deopt webrev.
> > I also fixed a few casts that Eric Caspole noticed were preventing the
> fastdebug build from building
> > (but did not affect product and debug builds). Please take a look
> at
> >
> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt-
> v2
> >
> > I've included here the explanatory text from the first submission
> >
> > This consists of at least the start of support for deoptimization in
> HSAIL kernels. Although the list of files changed may look long, many
> of the files have only a few lines changed. Special thanks to Gilles
> Duboscq and Doug Simon who provided some of the infrastructure that this
> webrev uses.
> >
> > -- Tom Deneau
> >
> >
> > Below I have described
> >
> > * an overview of the codepaths the data structures
> > * java and hotspot source changes
> >
> >
> > Deoptimization Data Structures and Overview
> > ===========================================
> >
> > At kernel dispatch time, we allocate space for any workitems should
> want to deopt. To reduce the space requirements, space is only reserved
> for the maximum number of possible concurrent workitems that could be
> running on that hsa device.
> >
> > A deopting workitem sets a "deopt happened" flag, and future workitems
> that see "deopt happened" as true will just set a flag saying they never
> ran and exit early. Currently the never_ran array is one per workitem.
> We are looking at ways to make this smaller but HSA devices have a lot
> of freedom in how they schedule workitems (current hardware and the
> simulator are quite different).
> >
> > Workitems that deopt atomically bump an index saying where they should
> store their deopt data. The deopt data consists of
> > * workitemid
> > * deopt actionAndReason
> > * the first HSAILFrame
> >
> > An HSAILFrame consists of
> > * the deoptId or "pc" offset where the deopt occurred
> > * number of s registers
> > * number of d registers
> > * a bitmap indicating which d registers are oops
> > * space for saving the d and s registers
> >
> > Currently we always set num_s_registers to 32 and num_d_registers to
> > 16 but in the hsail code of the kernel we only save the union of the
> actual registers that are live at any of the infopoints.
> >
> > On return from the GPU, we check if there were any deopts. If not, we
> just return back to java. If there was at least one deopt then
> >
> > a) for the workitems that finished normally, there is nothing to do
> >
> > b) if there are any deopted workitems, we want to run each deopting
> > workitem thru the interpreter going first thru the special host
> > trampoline code infrastructure that Gilles created. The
> > trampoline host code takes the deoptId and a pointer to the
> > saved hsail frame. We currently do this sequentially although
> > other policies are possible.
> >
> > c) for each never ran workitem, we can just run it from the
> > beginning of the kernel "method", just making sure we pass the
> > arguments and the appropriate workitem id for each one. Again,
> > we currently do this sequentially although other policies are
> > possible.
> >
> > Because running either type b or c above can cause GCs, and because
> some of our saved d registers are pointers into the java heap, we take
> care in case any of these saved pointers are affected by GC. The
> current strategy of using an Object array supplied by the java side will
> be replaced later with an oops_do type of strategy.
> >
> >
> >
> > Description of source changes in this webrev.
> > =============================================
> > graal source changes
> > ====================
> >
> > Assembler, HSAILAssembler
> > minor changes for new instructions needed for saving deopt
> information
> >
> > GraalKernelTester.java
> > force simulator to run single threaded.
> >
> > KernelTester.java
> > minor changes to handle exceptions which escape the kernel method
> >
> > HSAILLIRGenerator.java
> > support switches with keys of type long
> >
> >
> > HSAILHotSpotBackend.java
> >
> > * compileKernel uses some new optimisticOpts which help generate
> > deopts when needed. Also, we dump the infopoints if Log:CodeGen
> > is on
> >
> > * HSAILNonNullParametersPhase stamps the appropriate parameters as
> > nonNull
> >
> > * installKernel uses the new trampoline infrastructure added by
> > Gilles do produce the host trampoline deopt method and install
> > it.
> >
> > * emitCode adds a little bit of code to the prologue and a lot of
> > code to the epilogue. See description at the bottom for the data
> > structures used by the never-ran path and the deopt path.
> >
> > HSAILHotSpotLIRGenerator.java
> >
> > * code added by Gilles to build the host graph for the host
> > trampoline deopt method. I suppose some of this would be common
> > to any gpu trampoline deopt and should be moved to some
> > hsail-independent location.
> >
> > * code to handle the creation of a DeoptimizeOp for emitting HSAIL
> > code for a deoptimization
> >
> > HSAILHotSpotLoweringProvider.java
> >
> > * refactored to support different strategies for different nodes.
> > UnwindNode strategy is to get replaced by a DeoptimizeNode.
> >
> > HotSpotVMConfig.java
> >
> > * define offets to fields in the deopt data structures
> >
> > VMErrorNode.java
> >
> > * public access to constructor (used by building of host graph for
> > trampoline code)
> >
> > HSAIL.java
> > * some new non-allocatable registers defined (used by deopt paths)
> >
> > HSAILControlFlow.java
> > * code to emit hsail for a deoptimizationNode
> >
> > ComputeProbabilityClosure.java
> > * just using a change that Gilles made in the patch he gave me.
> >
> >
> > mx/projects was affected by the move of ExternalCompilationResult to
> com.oracle.graal.gpu. In addition, several files had one line import
> changes from the move of ExternalCompilationResult.
> >
> >
> > hotspot source changes
> > ======================
> >
> > gpu_hsail.cpp, hpp
> >
> > * the execute_kernel routine pushes an extra parameter where deopt
> info can be saved
> >
> > * while pushing kernel args, keeps track if any are null and if so
> > sets some new gpu_exception fields in thread structure which gets
> > used when thread returns to java mode
> >
> > * on return from kernel checks if any deopt occurred. If so,
> >
> > * runs any deopting workitems thru the trampoline deopt code
> > which ends up running the kernel method thru the interpreter
> > for that workitem.
> >
> > * runs any never-ran workitems using simple javaCall.
> >
> > gpu_hsail_Frame.hpp
> > * new structure that defines the layout of a physical HSAIL frame
> >
> > hsailArgumentsBase.*, hsailKernelArguments.hpp
> hsailJavaCallArguments.hpp
> > * refactored to share code between kernel argument setup and
> > javaCall argument setup
> >
> > javaClasses.cpp
> >
> > * contains logic to check the new gpu_exception fields in thread
> > structure and if detected, set as top frame on return
> >
> > graalCompiler.cpp, hpp
> > * logic added by Gilles for external_deopt_i2c
> >
> > javaCalls.cpp, hpp
> > * logic added by Gilles for external_deopt_i2c
> >
> > sharedRuntime.cpp
> > * maybe Gilles can explain why the assert was removed in the patch
> > he gave me (it asserts if I put it back in)
> >
> > thread.cpp, hpp
> > * handle new gpu_exception fields
> >
> > vmStructs.cpp
> > vmStructs_hsail.hpp
> > * handle new hsail deopt structs
> >
> >
> >
> >
>
More information about the graal-dev
mailing list