webrev-hsail-deopt-v2

Doug Simon doug.simon at oracle.com
Mon Mar 24 11:12:46 UTC 2014


Tom,

Thanks for fixing the issues in the first patch. Apart from some seemingly excessive memory usage (which can hopefully be address later), the logic in this patch looks mostly correct to me. However, Gilles needs to also provide feedback before we can integrate this patch as he is more familiar with some of the details.

Per-file comments below.

graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsail/HSAILHotSpotBackend.java:

These cannot be fields in HSAILHotSpotBackend as they are per-compilation values, not per-backend values:

+    private HSAILAssembler asm;
+    private LIRGenerator lirGen;

There’s no real value in having HSAILNonNullParametersPhase extend NonNullParametersPhase since the former overrides all the logic in the latter.

graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsail/HSAILHotSpotLIRGenerator.java:

Please create named constants for 40 and 8 that explains what they are:

204         if (regNumber < 40) {
205             long offset = config.hsailFrameSaveAreaOffset + 4 * (regNumber - 8);

src/gpu/hsail/vm/gpu_hsail.hpp:

It would be nice to see the CU acronym explained somewhere.

I know that you are thinking of supporting multiple frames in the future but I would not include code partially supporting that now otherwise it just adds confusion. For example, the first frame is contained in a HSAILKernelDeoptimization object, but HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who allocates/deallocates the next frame(s)?

46     HSAILFrame _first_frame;

What’s the point of HSAILComputeUnitSaveState? In future will it do more than simply wrap a HSAILKernelDeoptimization?

55   class HSAILComputeUnitSaveState {
56     friend class VMStructs;
57    public:
58     HSAILKernelDeoptimization  _deopt_info;
59   };

Why not make the definition extra clear by making the preprocessor do the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)”)

61 // 8 cus * 36 waves per cu * wavesize 64
62 #define CU_SAVE_STATES_SIZE     18432

Let’s use a longer name for clarity (e.g., _sizeof_compute_unit_save_state). This should also be reflect in HotSpotVMConfig.

69     jint _sizeof_cuss;

src/gpu/hsail/vm/gpu_hsail.cpp:

I was going to suggest stack allocating the HSAILDeoptimizationInfo until I realized it’s 20Mb! Is that a conservative approach we can improve upon somehow (e.g., a lazily created thread local)? In any case, shouldn’t the actual allocation be within the “if (useDeoptInfo)” guard?

149   HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP, mtInternal) HSAILDeoptimizationInfo();
150   if (useDeoptInfo) { 

-Doug

On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Doug, Gilles --
> 
> I believe I have gotten rid of the warnings from the original hsail-deopt webrev.
> I also fixed a few casts that Eric Caspole noticed were preventing the fastdebug build from building
> (but did not affect product and debug builds).    Please take a look at
> 
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt-v2 
> 
> I've included here the explanatory text from the first submission
> 
> This consists of at least the start of support for deoptimization in HSAIL kernels.  Although the list of files changed may look long, many of the files have only a few lines changed.  Special thanks to Gilles Duboscq and Doug Simon who provided some of the infrastructure that this webrev uses.
> 
> -- Tom Deneau
> 
> 
> Below I have described
> 
>   * an overview of the codepaths the data structures
>   * java and hotspot source changes
> 
> 
> Deoptimization Data Structures and Overview
> ===========================================
> 
> At kernel dispatch time, we allocate space for any workitems should want to deopt.  To reduce the space requirements, space is only reserved for the maximum number of possible concurrent workitems that could be running on that hsa device.
> 
> A deopting workitem sets a "deopt happened" flag, and future workitems that see "deopt happened" as true will just set a flag saying they never ran and exit early.  Currently the never_ran array is one per workitem.  We are looking at ways to make this smaller but HSA devices have a lot of freedom in how they schedule workitems (current hardware and the simulator are quite different).
> 
> Workitems that deopt atomically bump an index saying where they should store their deopt data.  The deopt data consists of
>   * workitemid
>   * deopt actionAndReason
>   * the first HSAILFrame
> 
> An HSAILFrame consists of
>   * the deoptId or "pc" offset where the deopt occurred
>   * number of s registers
>   * number of d registers
>   * a bitmap indicating which d registers are oops
>   * space for saving the d and s registers
> 
> Currently we always set num_s_registers to 32 and num_d_registers to
> 16 but in the hsail code of the kernel we only save the union of the actual registers that are live at any of the infopoints.
> 
> On return from the GPU, we check if there were any deopts.  If not, we just return back to java.  If there was at least one deopt then
> 
>   a) for the workitems that finished normally, there is nothing to do
> 
>   b) if there are any deopted workitems, we want to run each deopting
>      workitem thru the interpreter going first thru the special host
>      trampoline code infrastructure that Gilles created.  The
>      trampoline host code takes the deoptId and a pointer to the
>      saved hsail frame.  We currently do this sequentially although
>      other policies are possible.
> 
>   c) for each never ran workitem, we can just run it from the
>      beginning of the kernel "method", just making sure we pass the
>      arguments and the appropriate workitem id for each one.  Again,
>      we currently do this sequentially although other policies are
>      possible.
> 
> Because running either type b or c above can cause GCs, and because some of our saved d registers are pointers into the java heap, we take care in case any of these saved pointers are affected by GC.  The current strategy of using an Object array supplied by the java side will be replaced later with an oops_do type of strategy.
> 
> 
> 
> Description of source changes in this webrev.
> =============================================
> graal source changes
> ====================
> 
> Assembler, HSAILAssembler
>   minor changes for new instructions needed for saving deopt information
> 
> GraalKernelTester.java
>   force simulator to run single threaded.
> 
> KernelTester.java
>   minor changes to handle exceptions which escape the kernel method
> 
> HSAILLIRGenerator.java
>   support switches with keys of type long
> 
> 
> HSAILHotSpotBackend.java
> 
>   * compileKernel uses some new optimisticOpts which help generate
>     deopts when needed.  Also, we dump the infopoints if Log:CodeGen
>     is on
> 
>   * HSAILNonNullParametersPhase stamps the appropriate parameters as
>     nonNull
> 
>   * installKernel uses the new trampoline infrastructure added by
>     Gilles do produce the host trampoline deopt method and install
>     it.
> 
>   * emitCode adds a little bit of code to the prologue and a lot of
>     code to the epilogue.  See description at the bottom for the data
>     structures used by the never-ran path and the deopt path.
> 
> HSAILHotSpotLIRGenerator.java
> 
>   * code added by Gilles to build the host graph for the host
>     trampoline deopt method.  I suppose some of this would be common
>     to any gpu trampoline deopt and should be moved to some
>     hsail-independent location.
> 
>   * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>     code for a deoptimization
> 
> HSAILHotSpotLoweringProvider.java
> 
>   * refactored to support different strategies for different nodes.
>     UnwindNode strategy is to get replaced by a DeoptimizeNode.
> 
> HotSpotVMConfig.java
> 
>  * define offets to fields in the deopt data structures
> 
> VMErrorNode.java
> 
>  * public access to constructor (used by building of host graph for
>    trampoline code)
> 
> HSAIL.java
>  * some new non-allocatable registers defined (used by deopt paths)
> 
> HSAILControlFlow.java
>  * code to emit hsail for a deoptimizationNode
> 
> ComputeProbabilityClosure.java
>  * just using a change that Gilles made in the patch he gave me.
> 
> 
> mx/projects was affected by the move of ExternalCompilationResult to com.oracle.graal.gpu.  In addition, several files had one line import changes from the move of ExternalCompilationResult.
> 
> 
> hotspot source changes
> ======================
> 
> gpu_hsail.cpp, hpp
> 
>   * the execute_kernel routine pushes an extra parameter where deopt info can be saved
> 
>   * while pushing kernel args, keeps track if any are null and if so
>     sets some new gpu_exception fields in thread structure which gets
>     used when thread returns to java mode
> 
>   * on return from kernel checks if any deopt occurred.  If so,
> 
>      * runs any deopting workitems thru the trampoline deopt code
>        which ends up running the kernel method thru the interpreter
>        for that workitem.
> 
>      * runs any never-ran workitems using simple javaCall.
> 
> gpu_hsail_Frame.hpp
>   * new structure that defines the layout of a physical HSAIL frame
> 
> hsailArgumentsBase.*, hsailKernelArguments.hpp hsailJavaCallArguments.hpp
>   * refactored to share code between kernel argument setup and
>     javaCall argument setup
> 
> javaClasses.cpp
> 
>   * contains logic to check the new gpu_exception fields in thread
>     structure and if detected, set as top frame on return
> 
> graalCompiler.cpp, hpp
>   * logic added by Gilles for external_deopt_i2c
> 
> javaCalls.cpp, hpp
>   * logic added by Gilles for external_deopt_i2c
> 
> sharedRuntime.cpp
>   * maybe Gilles can explain why the assert was removed in the patch
>     he gave me (it asserts if I put it back in)
> 
> thread.cpp, hpp
>   * handle new gpu_exception fields
> 
> vmStructs.cpp
> vmStructs_hsail.hpp
>   * handle new hsail deopt structs
> 
> 
> 
> 



More information about the graal-dev mailing list