webrev-hsail-deopt-v2
Doug Simon
doug.simon at oracle.com
Tue Mar 25 19:14:01 UTC 2014
Sounds like a reasonable workaround to me.
On Mar 25, 2014, at 7:01 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> I ended up creating (or re-creating) an HSAILCompilationResultBuilder whose only purpose is to save the lirGen.
> It works, but let me know if that meets the style.
>
> -- Tom
>
>> -----Original Message-----
>> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of
>> Deneau, Tom
>> Sent: Tuesday, March 25, 2014 12:10 PM
>> To: Doug Simon; Gilles Duboscq
>> Cc: graal-dev at openjdk.java.net
>> Subject: RE: webrev-hsail-deopt-v2
>>
>> Doug, Gilles --
>>
>> Regarding the
>> private LIRGenerator lirGen;
>> in HSAILHotSpotBackend.java.
>>
>> I agree with your statement that lirGen should be per-compilation, not
>> per backend.
>> I added it only because it was needed by the code Gilles added at the
>> end of emitCode
>>
>> compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator)
>> lirGen).prepareHostGraph());
>>
>> I see that lirGen is passed in to newCompilationResultBuilder and used
>> in that method but was not saved anywhere.
>>
>> I thought perhaps it could be saved in CompilationResultBuilder itself,
>> but that led to the dreaded loops in the project graphs (if I made
>> com.oracle.graal.compiler project a dependency of com.oracle.graal.lir).
>>
>> So what is the best way to solve this problem?
>>
>> -- Tom
>>
>>
>>> -----Original Message-----
>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>> Sent: Monday, March 24, 2014 6:13 AM
>>> To: Deneau, Tom
>>> Cc: graal-dev at openjdk.java.net
>>> Subject: Re: webrev-hsail-deopt-v2
>>>
>>> Tom,
>>>
>>> Thanks for fixing the issues in the first patch. Apart from some
>>> seemingly excessive memory usage (which can hopefully be address
>>> later), the logic in this patch looks mostly correct to me. However,
>>> Gilles needs to also provide feedback before we can integrate this
>>> patch as he is more familiar with some of the details.
>>>
>>> Per-file comments below.
>>>
>>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>> l/
>>> HSAILHotSpotBackend.java:
>>>
>>> These cannot be fields in HSAILHotSpotBackend as they are per-
>>> compilation values, not per-backend values:
>>>
>>> + private HSAILAssembler asm;
>>> + private LIRGenerator lirGen;
>>>
>>> There's no real value in having HSAILNonNullParametersPhase extend
>>> NonNullParametersPhase since the former overrides all the logic in the
>>> latter.
>>>
>>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>> l/
>>> HSAILHotSpotLIRGenerator.java:
>>>
>>> Please create named constants for 40 and 8 that explains what they
>> are:
>>>
>>> 204 if (regNumber < 40) {
>>> 205 long offset = config.hsailFrameSaveAreaOffset + 4 *
>>> (regNumber - 8);
>>>
>>> src/gpu/hsail/vm/gpu_hsail.hpp:
>>>
>>> It would be nice to see the CU acronym explained somewhere.
>>>
>>> I know that you are thinking of supporting multiple frames in the
>>> future but I would not include code partially supporting that now
>>> otherwise it just adds confusion. For example, the first frame is
>>> contained in a HSAILKernelDeoptimization object, but
>>> HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who
>> allocates/deallocates the next frame(s)?
>>>
>>> 46 HSAILFrame _first_frame;
>>>
>>> What's the point of HSAILComputeUnitSaveState? In future will it do
>>> more than simply wrap a HSAILKernelDeoptimization?
>>>
>>> 55 class HSAILComputeUnitSaveState {
>>> 56 friend class VMStructs;
>>> 57 public:
>>> 58 HSAILKernelDeoptimization _deopt_info;
>>> 59 };
>>>
>>> Why not make the definition extra clear by making the preprocessor do
>>> the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
>>>
>>> 61 // 8 cus * 36 waves per cu * wavesize 64
>>> 62 #define CU_SAVE_STATES_SIZE 18432
>>>
>>> Let's use a longer name for clarity (e.g.,
>>> _sizeof_compute_unit_save_state). This should also be reflect in
>>> HotSpotVMConfig.
>>>
>>> 69 jint _sizeof_cuss;
>>>
>>> src/gpu/hsail/vm/gpu_hsail.cpp:
>>>
>>> I was going to suggest stack allocating the HSAILDeoptimizationInfo
>>> until I realized it's 20Mb! Is that a conservative approach we can
>>> improve upon somehow (e.g., a lazily created thread local)? In any
>>> case, shouldn't the actual allocation be within the "if
>> (useDeoptInfo)" guard?
>>>
>>> 149 HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP,
>> mtInternal)
>>> HSAILDeoptimizationInfo();
>>> 150 if (useDeoptInfo) {
>>>
>>> -Doug
>>>
>>> On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>
>>>> Doug, Gilles --
>>>>
>>>> I believe I have gotten rid of the warnings from the original hsail-
>>> deopt webrev.
>>>> I also fixed a few casts that Eric Caspole noticed were preventing
>>>> the
>>> fastdebug build from building
>>>> (but did not affect product and debug builds). Please take a look
>>> at
>>>>
>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt
>>>> -
>>> v2
>>>>
>>>> I've included here the explanatory text from the first submission
>>>>
>>>> This consists of at least the start of support for deoptimization in
>>> HSAIL kernels. Although the list of files changed may look long, many
>>> of the files have only a few lines changed. Special thanks to Gilles
>>> Duboscq and Doug Simon who provided some of the infrastructure that
>>> this webrev uses.
>>>>
>>>> -- Tom Deneau
>>>>
>>>>
>>>> Below I have described
>>>>
>>>> * an overview of the codepaths the data structures
>>>> * java and hotspot source changes
>>>>
>>>>
>>>> Deoptimization Data Structures and Overview
>>>> ===========================================
>>>>
>>>> At kernel dispatch time, we allocate space for any workitems should
>>> want to deopt. To reduce the space requirements, space is only
>>> reserved for the maximum number of possible concurrent workitems that
>>> could be running on that hsa device.
>>>>
>>>> A deopting workitem sets a "deopt happened" flag, and future
>>>> workitems
>>> that see "deopt happened" as true will just set a flag saying they
>>> never ran and exit early. Currently the never_ran array is one per
>> workitem.
>>> We are looking at ways to make this smaller but HSA devices have a lot
>>> of freedom in how they schedule workitems (current hardware and the
>>> simulator are quite different).
>>>>
>>>> Workitems that deopt atomically bump an index saying where they
>>>> should
>>> store their deopt data. The deopt data consists of
>>>> * workitemid
>>>> * deopt actionAndReason
>>>> * the first HSAILFrame
>>>>
>>>> An HSAILFrame consists of
>>>> * the deoptId or "pc" offset where the deopt occurred
>>>> * number of s registers
>>>> * number of d registers
>>>> * a bitmap indicating which d registers are oops
>>>> * space for saving the d and s registers
>>>>
>>>> Currently we always set num_s_registers to 32 and num_d_registers to
>>>> 16 but in the hsail code of the kernel we only save the union of the
>>> actual registers that are live at any of the infopoints.
>>>>
>>>> On return from the GPU, we check if there were any deopts. If not,
>>>> we
>>> just return back to java. If there was at least one deopt then
>>>>
>>>> a) for the workitems that finished normally, there is nothing to
>>>> do
>>>>
>>>> b) if there are any deopted workitems, we want to run each
>> deopting
>>>> workitem thru the interpreter going first thru the special host
>>>> trampoline code infrastructure that Gilles created. The
>>>> trampoline host code takes the deoptId and a pointer to the
>>>> saved hsail frame. We currently do this sequentially although
>>>> other policies are possible.
>>>>
>>>> c) for each never ran workitem, we can just run it from the
>>>> beginning of the kernel "method", just making sure we pass the
>>>> arguments and the appropriate workitem id for each one. Again,
>>>> we currently do this sequentially although other policies are
>>>> possible.
>>>>
>>>> Because running either type b or c above can cause GCs, and because
>>> some of our saved d registers are pointers into the java heap, we take
>>> care in case any of these saved pointers are affected by GC. The
>>> current strategy of using an Object array supplied by the java side
>>> will be replaced later with an oops_do type of strategy.
>>>>
>>>>
>>>>
>>>> Description of source changes in this webrev.
>>>> =============================================
>>>> graal source changes
>>>> ====================
>>>>
>>>> Assembler, HSAILAssembler
>>>> minor changes for new instructions needed for saving deopt
>>> information
>>>>
>>>> GraalKernelTester.java
>>>> force simulator to run single threaded.
>>>>
>>>> KernelTester.java
>>>> minor changes to handle exceptions which escape the kernel method
>>>>
>>>> HSAILLIRGenerator.java
>>>> support switches with keys of type long
>>>>
>>>>
>>>> HSAILHotSpotBackend.java
>>>>
>>>> * compileKernel uses some new optimisticOpts which help generate
>>>> deopts when needed. Also, we dump the infopoints if Log:CodeGen
>>>> is on
>>>>
>>>> * HSAILNonNullParametersPhase stamps the appropriate parameters as
>>>> nonNull
>>>>
>>>> * installKernel uses the new trampoline infrastructure added by
>>>> Gilles do produce the host trampoline deopt method and install
>>>> it.
>>>>
>>>> * emitCode adds a little bit of code to the prologue and a lot of
>>>> code to the epilogue. See description at the bottom for the
>> data
>>>> structures used by the never-ran path and the deopt path.
>>>>
>>>> HSAILHotSpotLIRGenerator.java
>>>>
>>>> * code added by Gilles to build the host graph for the host
>>>> trampoline deopt method. I suppose some of this would be common
>>>> to any gpu trampoline deopt and should be moved to some
>>>> hsail-independent location.
>>>>
>>>> * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>>>> code for a deoptimization
>>>>
>>>> HSAILHotSpotLoweringProvider.java
>>>>
>>>> * refactored to support different strategies for different nodes.
>>>> UnwindNode strategy is to get replaced by a DeoptimizeNode.
>>>>
>>>> HotSpotVMConfig.java
>>>>
>>>> * define offets to fields in the deopt data structures
>>>>
>>>> VMErrorNode.java
>>>>
>>>> * public access to constructor (used by building of host graph for
>>>> trampoline code)
>>>>
>>>> HSAIL.java
>>>> * some new non-allocatable registers defined (used by deopt paths)
>>>>
>>>> HSAILControlFlow.java
>>>> * code to emit hsail for a deoptimizationNode
>>>>
>>>> ComputeProbabilityClosure.java
>>>> * just using a change that Gilles made in the patch he gave me.
>>>>
>>>>
>>>> mx/projects was affected by the move of ExternalCompilationResult to
>>> com.oracle.graal.gpu. In addition, several files had one line import
>>> changes from the move of ExternalCompilationResult.
>>>>
>>>>
>>>> hotspot source changes
>>>> ======================
>>>>
>>>> gpu_hsail.cpp, hpp
>>>>
>>>> * the execute_kernel routine pushes an extra parameter where deopt
>>> info can be saved
>>>>
>>>> * while pushing kernel args, keeps track if any are null and if so
>>>> sets some new gpu_exception fields in thread structure which
>> gets
>>>> used when thread returns to java mode
>>>>
>>>> * on return from kernel checks if any deopt occurred. If so,
>>>>
>>>> * runs any deopting workitems thru the trampoline deopt code
>>>> which ends up running the kernel method thru the interpreter
>>>> for that workitem.
>>>>
>>>> * runs any never-ran workitems using simple javaCall.
>>>>
>>>> gpu_hsail_Frame.hpp
>>>> * new structure that defines the layout of a physical HSAIL frame
>>>>
>>>> hsailArgumentsBase.*, hsailKernelArguments.hpp
>>> hsailJavaCallArguments.hpp
>>>> * refactored to share code between kernel argument setup and
>>>> javaCall argument setup
>>>>
>>>> javaClasses.cpp
>>>>
>>>> * contains logic to check the new gpu_exception fields in thread
>>>> structure and if detected, set as top frame on return
>>>>
>>>> graalCompiler.cpp, hpp
>>>> * logic added by Gilles for external_deopt_i2c
>>>>
>>>> javaCalls.cpp, hpp
>>>> * logic added by Gilles for external_deopt_i2c
>>>>
>>>> sharedRuntime.cpp
>>>> * maybe Gilles can explain why the assert was removed in the patch
>>>> he gave me (it asserts if I put it back in)
>>>>
>>>> thread.cpp, hpp
>>>> * handle new gpu_exception fields
>>>>
>>>> vmStructs.cpp
>>>> vmStructs_hsail.hpp
>>>> * handle new hsail deopt structs
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
More information about the graal-dev
mailing list