webrev for hsail deoptimization
Gilles Duboscq
duboscq at ssw.jku.at
Wed Mar 12 10:08:24 UTC 2014
On Tue, Mar 11, 2014 at 11:11 PM, Christian Thalinger
<christian.thalinger at oracle.com> wrote:
>
> On Mar 11, 2014, at 1:08 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>
>> I have placed a webrev up at
>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt
>> which we would like to get checked into the graal trunk.
>>
>> This consists of at least the start of support for deoptimization in
>> HSAIL kernels. Although the list of files changed may look long, many
>> of the files have only a few lines changed. Special thanks to Gilles
>> Duboscq and Doug Simon who provided some of the infrastructure that
>> this webrev uses.
>>
>> Below I have described
>>
>> * an overview of the codepaths and the data structures
>> * java and hotspot source changes
>>
>>
>> Deoptimization Data Structures and Overview
>> ===========================================
>>
>> At kernel dispatch time, we allocate space for any workitems that might
>> want to deopt. To reduce the space requirements, space is only
>> reserved for the maximum number of possible concurrent workitems that
>> could be running on that hsa device.
>>
>> A deopting workitem sets a "deopt happened" flag, and future workitems
>> that see "deopt happened" as true will just set a flag saying they
>> never ran and exit early. Currently the never_ran array is one per
>> workitem. We are looking at ways to make this smaller but HSA devices
>> have a lot of freedom in how they schedule workitems (current hardware
>> and the simulator are quite different).
>>
>> Workitems that deopt atomically bump an index saying where they should
>> store their deopt data. The deopt data consists of
>> * workitemid
>> * deopt actionAndReason
>> * the first HSAILFrame
>>
>> An HSAILFrame consists of
>> * the deoptId or "pc" offset where the deopt occurred
>> * number of s registers
>> * number of d registers
>> * a bitmap indicating which d registers are oops
>> * space for saving the d and s registers
>>
>> Currently we always set num_s_registers to 32 and num_d_registers to
>> 16 but in the hsail code of the kernel we only save the union of the
>> actual registers that are live at any of the infopoints.
>>
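To make the layout above concrete, here is a minimal host-side model in Java. It is only a sketch: the class and field names (DeoptSaveArea, Slot, Frame, enter, recordDeopt) are hypothetical and do not come from the webrev, and the real structures live in native memory filled in by HSAIL code. The register counts 32 and 16 are the fixed values mentioned above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the deopt save area; all names are illustrative. */
final class DeoptSaveArea {
    static final int NUM_S_REGS = 32; // fixed number of s registers, per the description
    static final int NUM_D_REGS = 16; // fixed number of d registers, per the description

    /** One saved frame: deoptId, oop bitmap for d registers, register save space. */
    static final class Frame {
        int deoptId;                          // "pc" offset where the deopt occurred
        int dOopBitmap;                       // bit i set means d register i holds an oop
        final int[] s = new int[NUM_S_REGS];  // saved s registers
        final long[] d = new long[NUM_D_REGS]; // saved d registers

        boolean isOop(int dReg) {
            return (dOopBitmap & (1 << dReg)) != 0;
        }
    }

    /** One record per deopting workitem. */
    static final class Slot {
        int workitemId;
        int actionAndReason;
        final Frame frame = new Frame();
    }

    final Slot[] slots;                       // sized by max concurrent workitems
    final boolean[] neverRan;                 // currently one entry per workitem
    final AtomicInteger nextSlot = new AtomicInteger();
    volatile boolean deoptHappened;

    DeoptSaveArea(int maxConcurrentWorkitems, int totalWorkitems) {
        slots = new Slot[maxConcurrentWorkitems];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Slot();
        }
        neverRan = new boolean[totalWorkitems];
    }

    /** Prologue check: returns true if the workitem must exit early as never-ran. */
    boolean enter(int workitemId) {
        if (deoptHappened) {
            neverRan[workitemId] = true;
            return true;
        }
        return false;
    }

    /** Deopt path: set the flag, atomically reserve a slot, fill in the header. */
    Slot recordDeopt(int workitemId, int deoptId, int actionAndReason) {
        deoptHappened = true;
        Slot slot = slots[nextSlot.getAndIncrement()];
        slot.workitemId = workitemId;
        slot.actionAndReason = actionAndReason;
        slot.frame.deoptId = deoptId;
        return slot;
    }
}
```

The atomic getAndIncrement stands in for the "atomically bump an index" step, so concurrent deopting workitems never share a slot.
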
>> On return from the GPU, we check if there were any deopts. If not, we
>> just return back to java. If there was at least one deopt then
>>
>> a) for the workitems that finished normally, there is nothing to do
>>
>> b) if there are any deopted workitems, we want to run each deopting
>> workitem thru the interpreter going first thru the special host
>> trampoline code infrastructure that Gilles created. The
>> trampoline host code takes the deoptId and a pointer to the
>> saved hsail frame. We currently do this sequentially although
>> other policies are possible.
>>
>> c) for each never ran workitem, we can just run it from the
>> beginning of the kernel "method", just making sure we pass the
>> arguments and the appropriate workitem id for each one. Again,
>> we currently do this sequentially although other policies are
>> possible.
>>
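The three cases (a), (b) and (c) above can be sketched as a small sequential driver. This is an illustration under stated assumptions, not the code in gpu_hsail.cpp: PostKernelPolicy and the Recovery callbacks are hypothetical names standing in for the trampoline path and the plain javaCall path.

```java
/** Hypothetical sketch of the sequential host-side policy after a kernel returns. */
final class PostKernelPolicy {
    /** Stand-ins for the two recovery paths; neither is a real Graal or HotSpot API. */
    interface Recovery {
        /** Case (b): resume a deopted workitem in the interpreter via the host trampoline. */
        void resumeInInterpreter(int workitemId, int deoptId);

        /** Case (c): run a never-ran workitem from the start of the kernel method. */
        void runFromStart(int workitemId);
    }

    /**
     * deoptedWorkitems[i] and deoptIds[i] describe the i-th saved deopt record;
     * neverRan[w] is true if workitem w exited early without running.
     */
    static void handle(int[] deoptedWorkitems, int[] deoptIds, boolean[] neverRan, Recovery recovery) {
        if (deoptedWorkitems.length != deoptIds.length) {
            throw new IllegalArgumentException("one deoptId is required per deopted workitem");
        }
        if (deoptedWorkitems.length == 0) {
            return; // no deopts: just return to Java; case (a) needs no work either
        }
        // Case (b): each deopted workitem goes through the interpreter, one at a time.
        for (int i = 0; i < deoptedWorkitems.length; i++) {
            recovery.resumeInInterpreter(deoptedWorkitems[i], deoptIds[i]);
        }
        // Case (c): each never-ran workitem is executed from the beginning, one at a time.
        for (int w = 0; w < neverRan.length; w++) {
            if (neverRan[w]) {
                recovery.runFromStart(w);
            }
        }
    }
}
```

Keeping the two paths behind an interface makes it easy to swap the sequential loops for another policy later, which the description above leaves open.
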
>> Because running either type b or c above can cause GCs, and because
>> some of our saved d registers are pointers into the java heap, we take
>> care in case any of these saved pointers are affected by GC. The
>> current strategy of using an Object array supplied by the java side
>> will be replaced later with an oops_do type of strategy.
>>
>>
>>
>> Description of source changes in this webrev.
>> =============================================
>> graal source changes
>> ====================
>>
>> Assembler, HSAILAssembler
>> minor changes for new instructions needed for saving deopt information
>>
>> GraalKernelTester.java
>> force simulator to run single threaded.
>>
>> KernelTester.java
>> minor changes to handle exceptions which escape the kernel method
>>
>> HSAILLIRGenerator.java
>> support switches with keys of type long
>>
>>
>> HSAILHotSpotBackend.java
>>
>> * compileKernel uses some new optimisticOpts which help generate
>> deopts when needed. Also, we dump the infopoints if Log:CodeGen
>> is on
>>
>> * HSAILNonNullParametersPhase stamps the appropriate parameters as
>> nonNull
>>
>> * installKernel uses the new trampoline infrastructure added by
>> Gilles to produce the host trampoline deopt method and install
>> it.
>>
>> * emitCode adds a little bit of code to the prologue and a lot of
>> code to the epilogue. See description at the bottom for the data
>> structures used by the never-ran path and the deopt path.
>>
>> HSAILHotSpotLIRGenerator.java
>>
>> * code added by Gilles to build the host graph for the host
>> trampoline deopt method. I suppose some of this would be common
>> to any gpu trampoline deopt and should be moved to some
>> hsail-independent location.
>>
>> * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>> code for a deoptimization
>>
>> HSAILHotSpotLoweringProvider.java
>>
>> * refactored to support different strategies for different nodes.
>> UnwindNode strategy is to get replaced by a DeoptimizeNode.
>>
>> HotSpotVMConfig.java
>>
>> * define offsets to fields in the deopt data structures
>>
>> VMErrorNode.java
>>
>> * public access to constructor (used by building of host graph for
>> trampoline code)
>>
>> HSAIL.java
>> * some new non-allocatable registers defined (used by deopt paths)
>>
>> HSAILControlFlow.java
>> * code to emit hsail for a deoptimizationNode
>>
>> ComputeProbabilityClosure.java
>> * just using a change that Gilles made in the patch he gave me.
>>
>>
>> mx/projects was affected by the move of ExternalCompilationResult to
>> com.oracle.graal.gpu. In addition, several files had one line import
>> changes from the move of ExternalCompilationResult.
>>
>>
>> hotspot source changes
>> ======================
>>
>> gpu_hsail.cpp, hpp
>>
>> * the execute_kernel routine pushes an extra parameter where deopt info can be saved
>>
>> * while pushing kernel args, keeps track if any are null and if so
>> sets some new gpu_exception fields in thread structure which gets
>> used when thread returns to java mode
>>
>> * on return from kernel checks if any deopt occurred. If so,
>>
>> * runs any deopting workitems thru the trampoline deopt code
>> which ends up running the kernel method thru the interpreter
>> for that workitem.
>>
>> * runs any never-ran workitems using simple javaCall.
>>
>> gpu_hsail_Frame.hpp
>> * new structure that defines the layout of a physical HSAIL frame
>>
>> hsailArgumentsBase.*, hsailKernelArguments.hpp hsailJavaCallArguments.hpp
>> * refactored to share code between kernel argument setup and
>> javaCall argument setup
>>
>> javaClasses.cpp
>>
>> * contains logic to check the new gpu_exception fields in thread
>> structure and if detected, set as top frame on return
>>
>> graalCompiler.cpp, hpp
>> * logic added by Gilles for external_deopt_i2c
>>
>> javaCalls.cpp, hpp
>> * logic added by Gilles for external_deopt_i2c
>>
>> sharedRuntime.cpp
>> * maybe Gilles can explain why the assert was removed in the patch
>> he gave me (it asserts if I put it back in)
>
> Yeah, that made me suspicious. It’s related to the changes in javaCalls but I couldn’t see (yet) why Gilles made these.
If a HSAIL compilation contains a Deopt then we compile some code for
the host architecture.
This code looks like this:
    f(int deoptId, HSAILFrame* frame, int reasonAndAction, Object speculation) {
        switch(deoptId) {
            case 1:
                deopt(FrameState1(frame), reasonAndAction, speculation);
            case 2:
                deopt(FrameState2(frame), reasonAndAction, speculation);
            ...
            default:
                VM_Error("Error in HSAIL deopt. DeoptId=%d", deoptId);
        }
    }
Where FrameState1 and FrameState2 build the correct framestate by
extracting values from the HSAILFrame based on the HSAIL compilation's
debug info.
This special host method is installed and associated with the Method*
for which we compiled HSAIL code.
On deopt, we make a special javaCall to this Method* and in javaCall
we make sure it uses our special nmethod and an i2c that is crafted
for the special nmethod's signature rather than for the original
method's signature.
This allows us to get as close as we can to HotSpot's assumptions
regarding execution, what the stack should look like and how
deoptimization works. In particular, there actually is a compiled
frame for this Method* on the stack which is triggering the
deoptimization and this compiled frame didn't just appear out of the
blue but comes from a javaCall.
The only divergence here is that the arguments we pass for this
javaCall are not what you would expect from the signature of the
method. For example even if the method may be virtual, we don't have a
receiver.
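
As a rough illustration of the dispatch shape only, the switch above can be modelled in plain Java as a table from deoptId to a frame-state builder. This is not how the host method is actually produced (it is compiled from a host graph); HostTrampolineSketch, SavedFrame and the String used as a stand-in for a frame state are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of the per-deoptId dispatch done by the host trampoline. */
final class HostTrampolineSketch {
    /** Stand-in for the saved HSAIL frame: just the saved s and d registers. */
    static final class SavedFrame {
        final int[] s;
        final long[] d;

        SavedFrame(int[] s, long[] d) {
            this.s = s;
            this.d = d;
        }
    }

    // One builder per deoptId; each extracts the values its frame state needs.
    // A String stands in for the frame state that would be handed to deopt().
    private final Map<Integer, Function<SavedFrame, String>> builders = new HashMap<>();

    void register(int deoptId, Function<SavedFrame, String> builder) {
        builders.put(deoptId, builder);
    }

    /** Mirrors the switch: pick the builder for deoptId, or fail like the default case. */
    String dispatch(int deoptId, SavedFrame frame) {
        Function<SavedFrame, String> builder = builders.get(deoptId);
        if (builder == null) {
            throw new IllegalStateException("Error in HSAIL deopt. DeoptId=" + deoptId);
        }
        return builder.apply(frame);
    }
}
```

Each registered builder plays the role of FrameState1, FrameState2 and so on, reading only the registers that the debug info says are live at that deopt point.
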
>
>>
>> thread.cpp, hpp
>> * handle new gpu_exception fields
>>
>> vmStructs.cpp
>> vmStructs_hsail.hpp
>> * handle new hsail deopt structs
>>
>>
>>
>