webrev-hsail-deopt-v2
Doug Simon
doug.simon at oracle.com
Tue Mar 25 21:09:34 UTC 2014
Before anything else, I want to give Gilles a chance to comment on the latest webrev.
-Doug
On Mar 25, 2014, at 9:17 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> Doug --
>
> Do you want the next webrev merged with the tip of trunk or is it OK to leave it as is branching off trunk from last week?
>
> -- Tom
>
>
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com]
>> Sent: Tuesday, March 25, 2014 2:14 PM
>> To: Deneau, Tom
>> Cc: Gilles Duboscq; graal-dev at openjdk.java.net
>> Subject: Re: webrev-hsail-deopt-v2
>>
>> Sounds like a reasonable workaround to me.
>>
>> On Mar 25, 2014, at 7:01 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>
>>> I ended up creating (or re-creating) an HSAILCompilationResultBuilder
>> whose only purpose is to save the lirGen.
>>> It works, but let me know if that meets the style.
>>>
>>> -- Tom
>>>
>>>> -----Original Message-----
>>>> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf
>> Of
>>>> Deneau, Tom
>>>> Sent: Tuesday, March 25, 2014 12:10 PM
>>>> To: Doug Simon; Gilles Duboscq
>>>> Cc: graal-dev at openjdk.java.net
>>>> Subject: RE: webrev-hsail-deopt-v2
>>>>
>>>> Doug, Gilles --
>>>>
>>>> Regarding the
>>>> private LIRGenerator lirGen;
>>>> in HSAILHotSpotBackend.java.
>>>>
>>>> I agree with your statement that lirGen should be per-compilation,
>> not
>>>> per backend.
>>>> I added it only because it was needed by the code Gilles added at the
>>>> end of emitCode
>>>>
>>>> compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator)
>>>> lirGen).prepareHostGraph());
>>>>
>>>> I see that lirGen is passed in to newCompilationResultBuilder and
>> used
>>>> in that method but was not saved anywhere.
>>>>
>>>> I thought perhaps it could be saved in CompilationResultBuilder
>> itself,
>>>> but that led to the dreaded loops in the project graphs (if I made
>>>> com.oracle.graal.compiler project a dependency of
>> com.oracle.graal.lir).
>>>>
>>>> So what is the best way to solve this problem?
>>>>
>>>> -- Tom
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>>> Sent: Monday, March 24, 2014 6:13 AM
>>>>> To: Deneau, Tom
>>>>> Cc: graal-dev at openjdk.java.net
>>>>> Subject: Re: webrev-hsail-deopt-v2
>>>>>
>>>>> Tom,
>>>>>
>>>>> Thanks for fixing the issues in the first patch. Apart from some
>>>>> seemingly excessive memory usage (which can hopefully be address
>>>>> later), the logic in this patch looks mostly correct to me. However,
>>>>> Gilles needs to also provide feedback before we can integrate this
>>>>> patch as he is more familiar with some of the details.
>>>>>
>>>>> Per-file comments below.
>>>>>
>>>>>
>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>>>> l/
>>>>> HSAILHotSpotBackend.java:
>>>>>
>>>>> These cannot be fields in HSAILHotSpotBackend as they are per-
>>>>> compilation values, not per-backend values:
>>>>>
>>>>> + private HSAILAssembler asm;
>>>>> + private LIRGenerator lirGen;
>>>>>
>>>>> There's no real value in having HSAILNonNullParametersPhase extend
>>>>> NonNullParametersPhase since the former overrides all the logic in
>> the
>>>>> latter.
>>>>>
>>>>>
>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>>>> l/
>>>>> HSAILHotSpotLIRGenerator.java:
>>>>>
>>>>> Please create named constants for 40 and 8 that explains what they
>>>> are:
>>>>>
>>>>> 204 if (regNumber < 40) {
>>>>> 205 long offset = config.hsailFrameSaveAreaOffset + 4 *
>>>>> (regNumber - 8);
>>>>>
>>>>> src/gpu/hsail/vm/gpu_hsail.hpp:
>>>>>
>>>>> It would be nice to see the CU acronym explained somewhere.
>>>>>
>>>>> I know that you are thinking of supporting multiple frames in the
>>>>> future but I would not include code partially supporting that now
>>>>> otherwise it just adds confusion. For example, the first frame is
>>>>> contained in a HSAILKernelDeoptimization object, but
>>>>> HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who
>>>> allocates/deallocates the next frame(s)?
>>>>>
>>>>> 46 HSAILFrame _first_frame;
>>>>>
>>>>> What's the point of HSAILComputeUnitSaveState? In future will it do
>>>>> more than simply wrap a HSAILKernelDeoptimization?
>>>>>
>>>>> 55 class HSAILComputeUnitSaveState {
>>>>> 56 friend class VMStructs;
>>>>> 57 public:
>>>>> 58 HSAILKernelDeoptimization _deopt_info;
>>>>> 59 };
>>>>>
>>>>> Why not make the definition extra clear by making the preprocessor
>> do
>>>>> the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
>>>>>
>>>>> 61 // 8 cus * 36 waves per cu * wavesize 64
>>>>> 62 #define CU_SAVE_STATES_SIZE 18432
>>>>>
>>>>> Let's use a longer name for clarity (e.g.,
>>>>> _sizeof_compute_unit_save_state). This should also be reflect in
>>>>> HotSpotVMConfig.
>>>>>
>>>>> 69 jint _sizeof_cuss;
>>>>>
>>>>> src/gpu/hsail/vm/gpu_hsail.cpp:
>>>>>
>>>>> I was going to suggest stack allocating the HSAILDeoptimizationInfo
>>>>> until I realized it's 20Mb! Is that a conservative approach we can
>>>>> improve upon somehow (e.g., a lazily created thread local)? In any
>>>>> case, shouldn't the actual allocation be within the "if
>>>> (useDeoptInfo)" guard?
>>>>>
>>>>> 149 HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP,
>>>> mtInternal)
>>>>> HSAILDeoptimizationInfo();
>>>>> 150 if (useDeoptInfo) {
>>>>>
>>>>> -Doug
>>>>>
>>>>> On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>>
>>>>>> Doug, Gilles --
>>>>>>
>>>>>> I believe I have gotten rid of the warnings from the original
>> hsail-
>>>>> deopt webrev.
>>>>>> I also fixed a few casts that Eric Caspole noticed were preventing
>>>>>> the
>>>>> fastdebug build from building
>>>>>> (but did not affect product and debug builds). Please take a
>> look
>>>>> at
>>>>>>
>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-
>> deopt
>>>>>> -
>>>>> v2
>>>>>>
>>>>>> I've included here the explanatory text from the first submission
>>>>>>
>>>>>> This consists of at least the start of support for deoptimization
>> in
>>>>> HSAIL kernels. Although the list of files changed may look long,
>> many
>>>>> of the files have only a few lines changed. Special thanks to
>> Gilles
>>>>> Duboscq and Doug Simon who provided some of the infrastructure that
>>>>> this webrev uses.
>>>>>>
>>>>>> -- Tom Deneau
>>>>>>
>>>>>>
>>>>>> Below I have described
>>>>>>
>>>>>> * an overview of the codepaths the data structures
>>>>>> * java and hotspot source changes
>>>>>>
>>>>>>
>>>>>> Deoptimization Data Structures and Overview
>>>>>> ===========================================
>>>>>>
>>>>>> At kernel dispatch time, we allocate space for any workitems should
>>>>> want to deopt. To reduce the space requirements, space is only
>>>>> reserved for the maximum number of possible concurrent workitems
>> that
>>>>> could be running on that hsa device.
>>>>>>
>>>>>> A deopting workitem sets a "deopt happened" flag, and future
>>>>>> workitems
>>>>> that see "deopt happened" as true will just set a flag saying they
>>>>> never ran and exit early. Currently the never_ran array is one per
>>>> workitem.
>>>>> We are looking at ways to make this smaller but HSA devices have a
>> lot
>>>>> of freedom in how they schedule workitems (current hardware and the
>>>>> simulator are quite different).
>>>>>>
>>>>>> Workitems that deopt atomically bump an index saying where they
>>>>>> should
>>>>> store their deopt data. The deopt data consists of
>>>>>> * workitemid
>>>>>> * deopt actionAndReason
>>>>>> * the first HSAILFrame
>>>>>>
>>>>>> An HSAILFrame consists of
>>>>>> * the deoptId or "pc" offset where the deopt occurred
>>>>>> * number of s registers
>>>>>> * number of d registers
>>>>>> * a bitmap indicating which d registers are oops
>>>>>> * space for saving the d and s registers
>>>>>>
>>>>>> Currently we always set num_s_registers to 32 and num_d_registers
>> to
>>>>>> 16 but in the hsail code of the kernel we only save the union of
>> the
>>>>> actual registers that are live at any of the infopoints.
>>>>>>
>>>>>> On return from the GPU, we check if there were any deopts. If not,
>>>>>> we
>>>>> just return back to java. If there was at least one deopt then
>>>>>>
>>>>>> a) for the workitems that finished normally, there is nothing to
>>>>>> do
>>>>>>
>>>>>> b) if there are any deopted workitems, we want to run each
>>>> deopting
>>>>>> workitem thru the interpreter going first thru the special host
>>>>>> trampoline code infrastructure that Gilles created. The
>>>>>> trampoline host code takes the deoptId and a pointer to the
>>>>>> saved hsail frame. We currently do this sequentially although
>>>>>> other policies are possible.
>>>>>>
>>>>>> c) for each never ran workitem, we can just run it from the
>>>>>> beginning of the kernel "method", just making sure we pass the
>>>>>> arguments and the appropriate workitem id for each one. Again,
>>>>>> we currently do this sequentially although other policies are
>>>>>> possible.
>>>>>>
>>>>>> Because running either type b or c above can cause GCs, and because
>>>>> some of our saved d registers are pointers into the java heap, we
>> take
>>>>> care in case any of these saved pointers are affected by GC. The
>>>>> current strategy of using an Object array supplied by the java side
>>>>> will be replaced later with an oops_do type of strategy.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Description of source changes in this webrev.
>>>>>> =============================================
>>>>>> graal source changes
>>>>>> ====================
>>>>>>
>>>>>> Assembler, HSAILAssembler
>>>>>> minor changes for new instructions needed for saving deopt
>>>>> information
>>>>>>
>>>>>> GraalKernelTester.java
>>>>>> force simulator to run single threaded.
>>>>>>
>>>>>> KernelTester.java
>>>>>> minor changes to handle exceptions which escape the kernel method
>>>>>>
>>>>>> HSAILLIRGenerator.java
>>>>>> support switches with keys of type long
>>>>>>
>>>>>>
>>>>>> HSAILHotSpotBackend.java
>>>>>>
>>>>>> * compileKernel uses some new optimisticOpts which help generate
>>>>>> deopts when needed. Also, we dump the infopoints if Log:CodeGen
>>>>>> is on
>>>>>>
>>>>>> * HSAILNonNullParametersPhase stamps the appropriate parameters as
>>>>>> nonNull
>>>>>>
>>>>>> * installKernel uses the new trampoline infrastructure added by
>>>>>> Gilles do produce the host trampoline deopt method and install
>>>>>> it.
>>>>>>
>>>>>> * emitCode adds a little bit of code to the prologue and a lot of
>>>>>> code to the epilogue. See description at the bottom for the
>>>> data
>>>>>> structures used by the never-ran path and the deopt path.
>>>>>>
>>>>>> HSAILHotSpotLIRGenerator.java
>>>>>>
>>>>>> * code added by Gilles to build the host graph for the host
>>>>>> trampoline deopt method. I suppose some of this would be common
>>>>>> to any gpu trampoline deopt and should be moved to some
>>>>>> hsail-independent location.
>>>>>>
>>>>>> * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>>>>>> code for a deoptimization
>>>>>>
>>>>>> HSAILHotSpotLoweringProvider.java
>>>>>>
>>>>>> * refactored to support different strategies for different nodes.
>>>>>> UnwindNode strategy is to get replaced by a DeoptimizeNode.
>>>>>>
>>>>>> HotSpotVMConfig.java
>>>>>>
>>>>>> * define offets to fields in the deopt data structures
>>>>>>
>>>>>> VMErrorNode.java
>>>>>>
>>>>>> * public access to constructor (used by building of host graph for
>>>>>> trampoline code)
>>>>>>
>>>>>> HSAIL.java
>>>>>> * some new non-allocatable registers defined (used by deopt paths)
>>>>>>
>>>>>> HSAILControlFlow.java
>>>>>> * code to emit hsail for a deoptimizationNode
>>>>>>
>>>>>> ComputeProbabilityClosure.java
>>>>>> * just using a change that Gilles made in the patch he gave me.
>>>>>>
>>>>>>
>>>>>> mx/projects was affected by the move of ExternalCompilationResult
>> to
>>>>> com.oracle.graal.gpu. In addition, several files had one line
>> import
>>>>> changes from the move of ExternalCompilationResult.
>>>>>>
>>>>>>
>>>>>> hotspot source changes
>>>>>> ======================
>>>>>>
>>>>>> gpu_hsail.cpp, hpp
>>>>>>
>>>>>> * the execute_kernel routine pushes an extra parameter where deopt
>>>>> info can be saved
>>>>>>
>>>>>> * while pushing kernel args, keeps track if any are null and if so
>>>>>> sets some new gpu_exception fields in thread structure which
>>>> gets
>>>>>> used when thread returns to java mode
>>>>>>
>>>>>> * on return from kernel checks if any deopt occurred. If so,
>>>>>>
>>>>>> * runs any deopting workitems thru the trampoline deopt code
>>>>>> which ends up running the kernel method thru the interpreter
>>>>>> for that workitem.
>>>>>>
>>>>>> * runs any never-ran workitems using simple javaCall.
>>>>>>
>>>>>> gpu_hsail_Frame.hpp
>>>>>> * new structure that defines the layout of a physical HSAIL frame
>>>>>>
>>>>>> hsailArgumentsBase.*, hsailKernelArguments.hpp
>>>>> hsailJavaCallArguments.hpp
>>>>>> * refactored to share code between kernel argument setup and
>>>>>> javaCall argument setup
>>>>>>
>>>>>> javaClasses.cpp
>>>>>>
>>>>>> * contains logic to check the new gpu_exception fields in thread
>>>>>> structure and if detected, set as top frame on return
>>>>>>
>>>>>> graalCompiler.cpp, hpp
>>>>>> * logic added by Gilles for external_deopt_i2c
>>>>>>
>>>>>> javaCalls.cpp, hpp
>>>>>> * logic added by Gilles for external_deopt_i2c
>>>>>>
>>>>>> sharedRuntime.cpp
>>>>>> * maybe Gilles can explain why the assert was removed in the patch
>>>>>> he gave me (it asserts if I put it back in)
>>>>>>
>>>>>> thread.cpp, hpp
>>>>>> * handle new gpu_exception fields
>>>>>>
>>>>>> vmStructs.cpp
>>>>>> vmStructs_hsail.hpp
>>>>>> * handle new hsail deopt structs
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
More information about the graal-dev
mailing list