webrev-hsail-deopt-v2
Deneau, Tom
tom.deneau at amd.com
Thu Mar 20 20:57:45 UTC 2014
Doug, Gilles --
I believe I have gotten rid of the warnings from the original hsail-deopt webrev.
I also fixed a few casts that Eric Caspole noticed were preventing the fastdebug build from building
(but did not affect product and debug builds). Please take a look at
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt-v2
I've included here the explanatory text from the first submission
This consists of at least the start of support for deoptimization in HSAIL kernels. Although the list of files changed may look long, many of the files have only a few lines changed. Special thanks to Gilles Duboscq and Doug Simon who provided some of the infrastructure that this webrev uses.
-- Tom Deneau
Below I have described
* an overview of the codepaths the data structures
* java and hotspot source changes
Deoptimization Data Structures and Overview
===========================================
At kernel dispatch time, we allocate space for any workitems should want to deopt. To reduce the space requirements, space is only reserved for the maximum number of possible concurrent workitems that could be running on that hsa device.
A deopting workitem sets a "deopt happened" flag, and future workitems that see "deopt happened" as true will just set a flag saying they never ran and exit early. Currently the never_ran array is one per workitem. We are looking at ways to make this smaller but HSA devices have a lot of freedom in how they schedule workitems (current hardware and the simulator are quite different).
Workitems that deopt atomically bump an index saying where they should store their deopt data. The deopt data consists of
* workitemid
* deopt actionAndReason
* the first HSAILFrame
An HSAILFrame consists of
* the deoptId or "pc" offset where the deopt occurred
* number of s registers
* number of d registers
* a bitmap indicating which d registers are oops
* space for saving the d and s registers
Currently we always set num_s_registers to 32 and num_d_registers to
16 but in the hsail code of the kernel we only save the union of the actual registers that are live at any of the infopoints.
On return from the GPU, we check if there were any deopts. If not, we just return back to java. If there was at least one deopt then
a) for the workitems that finished normally, there is nothing to do
b) if there are any deopted workitems, we want to run each deopting
workitem thru the interpreter going first thru the special host
trampoline code infrastructure that Gilles created. The
trampoline host code takes the deoptId and a pointer to the
saved hsail frame. We currently do this sequentially although
other policies are possible.
c) for each never ran workitem, we can just run it from the
beginning of the kernel "method", just making sure we pass the
arguments and the appropriate workitem id for each one. Again,
we currently do this sequentially although other policies are
possible.
Because running either type b or c above can cause GCs, and because some of our saved d registers are pointers into the java heap, we take care in case any of these saved pointers are affected by GC. The current strategy of using an Object array supplied by the java side will be replaced later with an oops_do type of strategy.
Description of source changes in this webrev.
=============================================
graal source changes
====================
Assembler, HSAILAssembler
minor changes for new instructions needed for saving deopt information
GraalKernelTester.java
force simulator to run single threaded.
KernelTester.java
minor changes to handle exceptions which escape the kernel method
HSAILLIRGenerator.java
support switches with keys of type long
HSAILHotSpotBackend.java
* compileKernel uses some new optimisticOpts which help generate
deopts when needed. Also, we dump the infopoints if Log:CodeGen
is on
* HSAILNonNullParametersPhase stamps the appropriate parameters as
nonNull
* installKernel uses the new trampoline infrastructure added by
Gilles do produce the host trampoline deopt method and install
it.
* emitCode adds a little bit of code to the prologue and a lot of
code to the epilogue. See description at the bottom for the data
structures used by the never-ran path and the deopt path.
HSAILHotSpotLIRGenerator.java
* code added by Gilles to build the host graph for the host
trampoline deopt method. I suppose some of this would be common
to any gpu trampoline deopt and should be moved to some
hsail-independent location.
* code to handle the creation of a DeoptimizeOp for emitting HSAIL
code for a deoptimization
HSAILHotSpotLoweringProvider.java
* refactored to support different strategies for different nodes.
UnwindNode strategy is to get replaced by a DeoptimizeNode.
HotSpotVMConfig.java
* define offets to fields in the deopt data structures
VMErrorNode.java
* public access to constructor (used by building of host graph for
trampoline code)
HSAIL.java
* some new non-allocatable registers defined (used by deopt paths)
HSAILControlFlow.java
* code to emit hsail for a deoptimizationNode
ComputeProbabilityClosure.java
* just using a change that Gilles made in the patch he gave me.
mx/projects was affected by the move of ExternalCompilationResult to com.oracle.graal.gpu. In addition, several files had one line import changes from the move of ExternalCompilationResult.
hotspot source changes
======================
gpu_hsail.cpp, hpp
* the execute_kernel routine pushes an extra parameter where deopt info can be saved
* while pushing kernel args, keeps track if any are null and if so
sets some new gpu_exception fields in thread structure which gets
used when thread returns to java mode
* on return from kernel checks if any deopt occurred. If so,
* runs any deopting workitems thru the trampoline deopt code
which ends up running the kernel method thru the interpreter
for that workitem.
* runs any never-ran workitems using simple javaCall.
gpu_hsail_Frame.hpp
* new structure that defines the layout of a physical HSAIL frame
hsailArgumentsBase.*, hsailKernelArguments.hpp hsailJavaCallArguments.hpp
* refactored to share code between kernel argument setup and
javaCall argument setup
javaClasses.cpp
* contains logic to check the new gpu_exception fields in thread
structure and if detected, set as top frame on return
graalCompiler.cpp, hpp
* logic added by Gilles for external_deopt_i2c
javaCalls.cpp, hpp
* logic added by Gilles for external_deopt_i2c
sharedRuntime.cpp
* maybe Gilles can explain why the assert was removed in the patch
he gave me (it asserts if I put it back in)
thread.cpp, hpp
* handle new gpu_exception fields
vmStructs.cpp
vmStructs_hsail.hpp
* handle new hsail deopt structs
More information about the graal-dev
mailing list