webrev for hsail deoptimization

Christian Thalinger christian.thalinger at oracle.com
Tue Mar 11 22:11:46 UTC 2014


On Mar 11, 2014, at 1:08 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> I have placed a webrev up at 
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt 
> which we would like to get checked into the graal trunk.
> 
> This consists of at least the start of support for deoptimization in
> HSAIL kernels.  Although the list of files changed may look long, many
> of the files have only a few lines changed.  Special thanks to Gilles
> Duboscq and Doug Simon who provided some of the infrastructure that
> this webrev uses.
> 
> Below I have described
> 
>   * an overview of the codepaths the data structures
>   * java and hotspot source changes
> 
> 
> Deoptimization Data Structures and Overview
> ===========================================
> 
> At kernel dispatch time, we allocate space for any workitems should
> want to deopt.  To reduce the space requirements, space is only
> reserved for the maximum number of possible concurrent workitems that
> could be running on that hsa device.
> 
> A deopting workitem sets a "deopt happened" flag, and future workitems
> that see "deopt happened" as true will just set a flag saying they
> never ran and exit early.  Currently the never_ran array is one per
> workitem.  We are looking at ways to make this smaller but HSA devices
> have a lot of freedom in how they schedule workitems (current hardware
> and the simulator are quite different).
> 
> Workitems that deopt atomically bump an index saying where they should
> store their deopt data.  The deopt data consists of
>   * workitemid
>   * deopt actionAndReason
>   * the first HSAILFrame
> 
> An HSAILFrame consists of
>   * the deoptId or "pc" offset where the deopt occurred
>   * number of s registers
>   * number of d registers
>   * a bitmap indicating which d registers are oops
>   * space for saving the d and s registers
> 
> Currently we always set num_s_registers to 32 and num_d_registers to
> 16 but in the hsail code of the kernel we only save the union of the
> actual registers that are live at any of the infopoints.
> 
> On return from the GPU, we check if there were any deopts.  If not, we
> just return back to java.  If there was at least one deopt then
> 
>   a) for the workitems that finished normally, there is nothing to do
> 
>   b) if there are any deopted workitems, we want to run each deopting
>      workitem thru the interpreter going first thru the special host
>      trampoline code infrastructure that Gilles created.  The
>      trampoline host code takes the deoptId and a pointer to the
>      saved hsail frame.  We currently do this sequentially although
>      other policies are possible.
> 
>   c) for each never ran workitem, we can just run it from the
>      beginning of the kernel "method", just making sure we pass the
>      arguments and the appropriate workitem id for each one.  Again,
>      we currently do this sequentially although other policies are
>      possible.
> 
> Because running either type b or c above can cause GCs, and because
> some of our saved d registers are pointers into the java heap, we take
> care in case any of these saved pointers are affected by GC.  The
> current strategy of using an Object array supplied by the java side
> will be replaced later with an oops_do type of strategy.
> 
> 
> 
> Description of source changes in this webrev.
> =============================================
> graal source changes
> ====================
> 
> Assembler, HSAILAssembler
>   minor changes for new instructions needed for saving deopt information
> 
> GraalKernelTester.java
>   force simulator to run single threaded.
> 
> KernelTester.java
>   minor changes to handle exceptions which escape the kernel method
> 
> HSAILLIRGenerator.java
>   support switches with keys of type long
> 
> 
> HSAILHotSpotBackend.java
> 
>   * compileKernel uses some new optimisticOpts which help generate
>     deopts when needed.  Also, we dump the infopoints if Log:CodeGen
>     is on
> 
>   * HSAILNonNullParametersPhase stamps the appropriate parameters as
>     nonNull
> 
>   * installKernel uses the new trampoline infrastructure added by
>     Gilles do produce the host trampoline deopt method and install
>     it.
> 
>   * emitCode adds a little bit of code to the prologue and a lot of
>     code to the epilogue.  See description at the bottom for the data
>     structures used by the never-ran path and the deopt path.
> 
> HSAILHotSpotLIRGenerator.java
> 
>   * code added by Gilles to build the host graph for the host
>     trampoline deopt method.  I suppose some of this would be common
>     to any gpu trampoline deopt and should be moved to some
>     hsail-independent location.
> 
>   * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>     code for a deoptimization
> 
> HSAILHotSpotLoweringProvider.java
> 
>   * refactored to support different strategies for different nodes.
>     UnwindNode strategy is to get replaced by a DeoptimizeNode.
> 
> HotSpotVMConfig.java
> 
>  * define offets to fields in the deopt data structures
> 
> VMErrorNode.java
> 
>  * public access to constructor (used by building of host graph for
>    trampoline code)
> 
> HSAIL.java
>  * some new non-allocatable registers defined (used by deopt paths)
> 
> HSAILControlFlow.java
>  * code to emit hsail for a deoptimizationNode
> 
> ComputeProbabilityClosure.java
>  * just using a change that Gilles made in the patch he gave me.
> 
> 
> mx/projects was affected by the move of ExternalCompilationResult to
> com.oracle.graal.gpu.  In addition, several files had one line import
> changes from the move of ExternalCompilationResult.
> 
> 
> hotspot source changes
> ======================
> 
> gpu_hsail.cpp, hpp
> 
>   * the execute_kernel routine pushes an extra parameter where deopt info can be saved
> 
>   * while pushing kernel args, keeps track if any are null and if so
>     sets some new gpu_exception fields in thread structure which gets
>     used when thread returns to java mode
> 
>   * on return from kernel checks if any deopt occurred.  If so,
> 
>      * runs any deopting workitems thru the trampoline deopt code
>        which ends up running the kernel method thru the interpreter
>        for that workitem.
> 
>      * runs any never-ran workitems using simple javaCall.
> 
> gpu_hsail_Frame.hpp
>   * new structure that defines the layout of a physical HSAIL frame
> 
> hsailArgumentsBase.*, hsailKernelArguments.hpp hsailJavaCallArguments.hpp
>   * refactored to share code between kernel argument setup and
>     javaCall argument setup
> 
> javaClasses.cpp
> 
>   * contains logic to check the new gpu_exception fields in thread
>     structure and if detected, set as top frame on return
> 
> graalCompiler.cpp, hpp
>   * logic added by Gilles for external_deopt_i2c
> 
> javaCalls.cpp, hpp
>   * logic added by Gilles for external_deopt_i2c
> 
> sharedRuntime.cpp
>   * maybe Gilles can explain why the assert was removed in the patch
>     he gave me (it asserts if I put it back in)

Yeah, that made me suspicious.  It’s related to the changes in javaCalls but I couldn’t see (yet) why Gilles made these.

> 
> thread.cpp, hpp
>   * handle new gpu_exception fields
> 
> vmStructs.cpp
> vmStructs_hsail.hpp
>   * handle new hsail deopt structs
> 
> 
> 



More information about the graal-dev mailing list