webrev for hsail deoptimization
Christian Thalinger
christian.thalinger at oracle.com
Tue Mar 11 22:11:46 UTC 2014
On Mar 11, 2014, at 1:08 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> I have placed a webrev up at
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt
> which we would like to get checked into the graal trunk.
>
> This consists of at least the start of support for deoptimization in
> HSAIL kernels. Although the list of files changed may look long, many
> of the files have only a few lines changed. Special thanks to Gilles
> Duboscq and Doug Simon who provided some of the infrastructure that
> this webrev uses.
>
> Below I have described
>
> * an overview of the codepaths and the data structures
> * java and hotspot source changes
>
>
> Deoptimization Data Structures and Overview
> ===========================================
>
> At kernel dispatch time, we allocate space for any workitems that
> might want to deopt. To reduce the space requirements, space is only
> reserved for the maximum number of possible concurrent workitems that
> could be running on that hsa device.
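A hypothetical C++ sketch of the sizing arithmetic behind this choice. The names (`DeoptSaveAreaSizing`, `deopt_save_area_bytes`, `max_concurrent_workitems`) and the split into a header plus fixed-size records are assumptions for illustration, not the webrev's code. The point is that the save area scales with device concurrency, not with the grid size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizing helper (not from the webrev). Only workitems that are
// already in flight can deopt, because later ones see the "deopt happened"
// flag and exit early, so one slot per concurrent workitem is enough.
struct DeoptSaveAreaSizing {
  std::size_t header_bytes;  // assumed: deopt flag plus atomic slot index
  std::size_t record_bytes;  // assumed: one saved deopt record (id, reason, frame)
};

std::size_t deopt_save_area_bytes(const DeoptSaveAreaSizing& s,
                                  std::size_t max_concurrent_workitems) {
  return s.header_bytes + s.record_bytes * max_concurrent_workitems;
}

// Naive alternative for comparison: one slot per workitem in the whole grid.
std::size_t naive_save_area_bytes(const DeoptSaveAreaSizing& s,
                                  std::size_t grid_size) {
  return s.header_bytes + s.record_bytes * grid_size;
}
```

For example, with an assumed 512-byte record, 2560 concurrent workitems need about 1.3 MB, while reserving a slot for every workitem of a one-million-item grid would need about 512 MB.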
>
> A deopting workitem sets a "deopt happened" flag, and future workitems
> that see "deopt happened" as true will just set a flag saying they
> never ran and exit early. Currently the never_ran array has one entry per
> workitem. We are looking at ways to make this smaller, but HSA devices
> have a lot of freedom in how they schedule workitems (current hardware
> and the simulator are quite different).
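The early-exit protocol can be sketched as a sequential C++ simulation. This is an illustration only: `run_workitem`, `simulate` and `Outcome` are hypothetical names, and on real hardware the workitems run concurrently in whatever order the device chooses, which is exactly why the never-ran flag has to be per workitem.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, sequential simulation of the "deopt happened" protocol.
enum class Outcome { kFinished, kDeopted, kNeverRan };

Outcome run_workitem(std::size_t id, bool wants_deopt,
                     std::atomic<bool>& deopt_happened,
                     std::vector<unsigned char>& never_ran) {
  // Prologue check: once any workitem has deopted, bail out and record that
  // this workitem never ran, so the host can rerun it from the beginning.
  if (deopt_happened.load(std::memory_order_acquire)) {
    never_ran[id] = 1;
    return Outcome::kNeverRan;
  }
  if (wants_deopt) {
    deopt_happened.store(true, std::memory_order_release);
    return Outcome::kDeopted;  // deopt state would be saved at this point
  }
  return Outcome::kFinished;
}

// Runs n workitems in id order; workitem first_deopt hits a deopt.
std::vector<Outcome> simulate(std::size_t n, std::size_t first_deopt) {
  std::atomic<bool> deopt_happened{false};
  std::vector<unsigned char> never_ran(n, 0);  // one flag per workitem
  std::vector<Outcome> out;
  for (std::size_t id = 0; id < n; ++id) {
    out.push_back(run_workitem(id, id == first_deopt, deopt_happened, never_ran));
  }
  return out;
}
```

With five workitems where workitem 2 deopts, workitems 0 and 1 finish, 2 deopts, and 3 and 4 are marked as never ran.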
>
> Workitems that deopt atomically bump an index saying where they should
> store their deopt data. The deopt data consists of
> * workitemid
> * deopt actionAndReason
> * the first HSAILFrame
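The atomic slot claim can be sketched in C++ as follows. `DeoptRecord`, `DeoptSaveArea` and `save` are hypothetical names, and the saved HSAILFrame is left out of the record; the sketch only shows how one `fetch_add` gives every deopting workitem its own slot without locking.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record layout; the HSAILFrame that follows is elided here.
struct DeoptRecord {
  std::int32_t workitem_id;
  std::int32_t action_and_reason;
  // HSAILFrame first_frame;  (see the frame layout described below)
};

struct DeoptSaveArea {
  std::atomic<std::int32_t> next_slot{0};
  std::vector<DeoptRecord> slots;
  explicit DeoptSaveArea(std::size_t capacity) : slots(capacity) {}

  // Each deopting workitem claims a unique slot with one atomic increment,
  // so concurrent writers never store into the same record.
  bool save(std::int32_t workitem_id, std::int32_t action_and_reason) {
    std::int32_t slot = next_slot.fetch_add(1, std::memory_order_relaxed);
    if (slot < 0 || static_cast<std::size_t>(slot) >= slots.size()) {
      return false;  // defensive: area sized for max concurrency should not fill
    }
    slots[slot] = DeoptRecord{workitem_id, action_and_reason};
    return true;
  }
};
```

After the kernel returns, the final value of `next_slot` (capped at the capacity) tells the host how many records to process.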
>
> An HSAILFrame consists of
> * the deoptId or "pc" offset where the deopt occurred
> * number of s registers
> * number of d registers
> * a bitmap indicating which d registers are oops
> * space for saving the d and s registers
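A hypothetical C++ rendering of such a frame, to make the fields concrete. The struct name, field widths, the 16-bit oop bitmap and the ordering (d registers before s registers, for alignment) are all assumptions; this is not the layout in the webrev's gpu_hsail_Frame.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical HSAILFrame layout; names, widths and order are assumptions.
struct HSAILFrameSketch {
  static const int kMaxS = 32;  // room for $s0..$s31
  static const int kMaxD = 16;  // room for $d0..$d15
  std::int32_t deopt_id;        // deoptId / "pc" offset of the deopt
  std::uint8_t num_s_regs;
  std::uint8_t num_d_regs;
  std::uint16_t d_reg_oop_map;  // bit i set => $d<i> holds an oop
  std::uint64_t d_regs[kMaxD];  // saved 64-bit $d registers
  std::uint32_t s_regs[kMaxS];  // saved 32-bit $s registers

  bool is_oop(int d) const { return (d_reg_oop_map >> d) & 1u; }
};
```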
>
> Currently we always set num_s_registers to 32 and num_d_registers to
> 16, but in the hsail code of the kernel we only save the union of the
> actual registers that are live at any of the infopoints.
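Computing that union is a bitwise OR over per-infopoint live masks. The C++ sketch below assumes each infopoint reports its live $s and $d registers as bitmasks; `LiveRegs`, `union_of_live` and `count_bits` are hypothetical names, not code from the webrev.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-infopoint liveness, one bit per register.
struct LiveRegs {
  std::uint32_t s_mask;  // bit i => $s<i> live
  std::uint16_t d_mask;  // bit i => $d<i> live
};

// The deopt epilogue only needs to spill registers live at some infopoint.
LiveRegs union_of_live(const std::vector<LiveRegs>& infopoints) {
  LiveRegs u{0, 0};
  for (const LiveRegs& ip : infopoints) {
    u.s_mask |= ip.s_mask;
    u.d_mask = static_cast<std::uint16_t>(u.d_mask | ip.d_mask);
  }
  return u;
}

int count_bits(std::uint32_t m) {
  int n = 0;
  for (; m != 0; m &= m - 1) ++n;  // clear the lowest set bit each time
  return n;
}
```

So if one infopoint has $s0, $s1 and $d0 live and another has $s1, $s2 and $d2 live, the epilogue saves three $s and two $d registers, while the frame still reserves space for all 32 and 16.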
>
> On return from the GPU, we check if there were any deopts. If not, we
> just return to java. If there was at least one deopt, then:
>
> a) for the workitems that finished normally, there is nothing to do
>
> b) if there are any deopted workitems, we want to run each deopting
> workitem thru the interpreter, going first thru the special host
> trampoline code infrastructure that Gilles created. The
> trampoline host code takes the deoptId and a pointer to the
> saved hsail frame. We currently do this sequentially although
> other policies are possible.
>
> c) for each never ran workitem, we can just run it from the
> beginning of the kernel "method", just making sure we pass the
> arguments and the appropriate workitem id for each one. Again,
> we currently do this sequentially although other policies are
> possible.
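The a/b/c policy above maps onto a short host-side routine. In this hypothetical C++ sketch, `handle_kernel_return`, `DeoptInfo` and the two callbacks are assumptions: the callbacks stand in for the trampoline deopt path and for a plain javaCall of the kernel method, and both loops are sequential, matching the current policy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical summary of one deopted workitem.
struct DeoptInfo {
  int workitem_id;
  int deopt_id;  // where in the kernel the deopt occurred
};

void handle_kernel_return(bool deopt_happened,
                          const std::vector<DeoptInfo>& deopted,
                          const std::vector<unsigned char>& never_ran,
                          const std::function<void(const DeoptInfo&)>& run_via_trampoline,
                          const std::function<void(int)>& run_from_start) {
  if (!deopt_happened) return;  // common case: straight back to Java
  // (a) workitems that finished normally need nothing.
  // (b) resume each deopted workitem in the interpreter, one at a time.
  for (const DeoptInfo& d : deopted) run_via_trampoline(d);
  // (c) rerun each never-ran workitem from the start of the kernel method.
  for (std::size_t id = 0; id < never_ran.size(); ++id) {
    if (never_ran[id]) run_from_start(static_cast<int>(id));
  }
}
```

Running the deopted workitems before the never-ran ones is one possible ordering; the description leaves the policy open.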
>
> Because running either type b or c above can cause GCs, and because
> some of our saved d registers are pointers into the java heap, we take
> care in case any of these saved pointers are affected by GC. The
> current strategy of using an Object array supplied by the java side
> will be replaced later with an oops_do type of strategy.
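The idea behind the interim strategy can be sketched in C++: use the oop bitmap to copy the oop-holding saved $d registers into a root array the collector can see, then copy them back after any code that might GC. The function names and `Word` type are hypothetical; per the description, the root array here plays the role of the Object array supplied by the java side, and the sketch does not model an oops_do walk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;  // one saved 64-bit $d register

// Gather the saved $d registers that the bitmap marks as oops.
std::vector<Word> collect_oops(const Word* d_regs, int num_d, std::uint16_t oop_map) {
  std::vector<Word> roots;
  for (int i = 0; i < num_d; ++i) {
    if ((oop_map >> i) & 1u) roots.push_back(d_regs[i]);
  }
  return roots;
}

// Write back possibly relocated oops in the same bitmap order.
void restore_oops(Word* d_regs, int num_d, std::uint16_t oop_map,
                  const std::vector<Word>& roots) {
  std::size_t k = 0;
  for (int i = 0; i < num_d; ++i) {
    if ((oop_map >> i) & 1u) d_regs[i] = roots[k++];
  }
}
```

Non-oop registers are never touched, so only values the collector may have moved get updated.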
>
>
>
> Description of source changes in this webrev.
> =============================================
> graal source changes
> ====================
>
> Assembler, HSAILAssembler
> minor changes for new instructions needed for saving deopt information
>
> GraalKernelTester.java
> force the simulator to run single-threaded.
>
> KernelTester.java
> minor changes to handle exceptions which escape the kernel method
>
> HSAILLIRGenerator.java
> support switches with keys of type long
>
>
> HSAILHotSpotBackend.java
>
> * compileKernel uses some new optimisticOpts which help generate
> deopts when needed. Also, we dump the infopoints if Log:CodeGen
> is on
>
> * HSAILNonNullParametersPhase stamps the appropriate parameters as
> nonNull
>
> * installKernel uses the new trampoline infrastructure added by
> Gilles to produce the host trampoline deopt method and install
> it.
>
> * emitCode adds a little bit of code to the prologue and a lot of
> code to the epilogue. See the overview above for the data
> structures used by the never-ran path and the deopt path.
>
> HSAILHotSpotLIRGenerator.java
>
> * code added by Gilles to build the host graph for the host
> trampoline deopt method. I suppose some of this would be common
> to any gpu trampoline deopt and should be moved to some
> hsail-independent location.
>
> * code to handle the creation of a DeoptimizeOp for emitting HSAIL
> code for a deoptimization
>
> HSAILHotSpotLoweringProvider.java
>
> * refactored to support different strategies for different nodes.
> The strategy for UnwindNode is to replace it with a DeoptimizeNode.
>
> HotSpotVMConfig.java
>
> * define offsets to fields in the deopt data structures
>
> VMErrorNode.java
>
> * public access to constructor (used by building of host graph for
> trampoline code)
>
> HSAIL.java
> * some new non-allocatable registers defined (used by deopt paths)
>
> HSAILControlFlow.java
> * code to emit hsail for a deoptimizationNode
>
> ComputeProbabilityClosure.java
> * just using a change that Gilles made in the patch he gave me.
>
>
> mx/projects was affected by the move of ExternalCompilationResult to
> com.oracle.graal.gpu. In addition, several files had one line import
> changes from the move of ExternalCompilationResult.
>
>
> hotspot source changes
> ======================
>
> gpu_hsail.cpp, hpp
>
> * the execute_kernel routine pushes an extra parameter where deopt info can be saved
>
> * while pushing kernel args, keeps track of whether any are null and, if
> so, sets some new gpu_exception fields in the thread structure, which
> get used when the thread returns to java mode
>
> * on return from the kernel, checks if any deopt occurred. If so, it
>
> * runs any deopting workitems thru the trampoline deopt code
> which ends up running the kernel method thru the interpreter
> for that workitem.
>
> * runs any never-ran workitems using simple javaCall.
>
> gpu_hsail_Frame.hpp
> * new structure that defines the layout of a physical HSAIL frame
>
> hsailArgumentsBase.*, hsailKernelArguments.hpp hsailJavaCallArguments.hpp
> * refactored to share code between kernel argument setup and
> javaCall argument setup
>
> javaClasses.cpp
>
> * contains logic to check the new gpu_exception fields in the thread
> structure and, if detected, set as top frame on return
>
> graalCompiler.cpp, hpp
> * logic added by Gilles for external_deopt_i2c
>
> javaCalls.cpp, hpp
> * logic added by Gilles for external_deopt_i2c
>
> sharedRuntime.cpp
> * maybe Gilles can explain why the assert was removed in the patch
> he gave me (it asserts if I put it back in)
Yeah, that made me suspicious. It’s related to the changes in javaCalls, but I couldn’t see (yet) why Gilles made these.
>
> thread.cpp, hpp
> * handle new gpu_exception fields
>
> vmStructs.cpp
> vmStructs_hsail.hpp
> * handle new hsail deopt structs
>
>
>
More information about the graal-dev
mailing list