webrev-hsail-deopt-v2

Doug Simon doug.simon at oracle.com
Tue Mar 25 21:09:34 UTC 2014


Before anything else, I want to give Gilles a chance to comment on the latest webrev.

-Doug

On Mar 25, 2014, at 9:17 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Doug --
> 
> Do you want the next webrev merged with the tip of trunk or is it OK to leave it as is branching off trunk from last week?
> 
> -- Tom
> 
> 
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com]
>> Sent: Tuesday, March 25, 2014 2:14 PM
>> To: Deneau, Tom
>> Cc: Gilles Duboscq; graal-dev at openjdk.java.net
>> Subject: Re: webrev-hsail-deopt-v2
>> 
>> Sounds like a reasonable workaround to me.
>> 
>> On Mar 25, 2014, at 7:01 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>> 
>>> I ended up creating (or re-creating) an HSAILCompilationResultBuilder
>> whose only purpose is to save the lirGen.
>>> It works, but let me know if that meets the style.
>>> 
>>> -- Tom
>>> 
>>>> -----Original Message-----
>>>> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf
>> Of
>>>> Deneau, Tom
>>>> Sent: Tuesday, March 25, 2014 12:10 PM
>>>> To: Doug Simon; Gilles Duboscq
>>>> Cc: graal-dev at openjdk.java.net
>>>> Subject: RE: webrev-hsail-deopt-v2
>>>> 
>>>> Doug, Gilles --
>>>> 
>>>> Regarding the
>>>>    private LIRGenerator lirGen;
>>>> in HSAILHotSpotBackend.java.
>>>> 
>>>> I agree with your statement that lirGen should be per-compilation,
>> not
>>>> per backend.
>>>> I added it only because it was needed by the code Gilles added at the
>>>> end of emitCode
>>>> 
>>>>       compilationResult.setHostGraph(((HSAILHotSpotLIRGenerator)
>>>> lirGen).prepareHostGraph());
>>>> 
>>>> I see that lirGen is passed in to newCompilationResultBuilder and
>> used
>>>> in that method but was not saved anywhere.
>>>> 
>>>> I thought perhaps it could be saved in CompilationResultBuilder
>> itself,
>>>> but that led to the dreaded loops in the project graphs (if I made
>>>> com.oracle.graal.compiler project a dependency of
>> com.oracle.graal.lir).
>>>> 
>>>> So what is the best way to solve this problem?
>>>> 
>>>> -- Tom
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>>> Sent: Monday, March 24, 2014 6:13 AM
>>>>> To: Deneau, Tom
>>>>> Cc: graal-dev at openjdk.java.net
>>>>> Subject: Re: webrev-hsail-deopt-v2
>>>>> 
>>>>> Tom,
>>>>> 
>>>>> Thanks for fixing the issues in the first patch. Apart from some
>>>>> seemingly excessive memory usage (which can hopefully be address
>>>>> later), the logic in this patch looks mostly correct to me. However,
>>>>> Gilles needs to also provide feedback before we can integrate this
>>>>> patch as he is more familiar with some of the details.
>>>>> 
>>>>> Per-file comments below.
>>>>> 
>>>>> 
>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>>>> l/
>>>>> HSAILHotSpotBackend.java:
>>>>> 
>>>>> These cannot be fields in HSAILHotSpotBackend as they are per-
>>>>> compilation values, not per-backend values:
>>>>> 
>>>>> +    private HSAILAssembler asm;
>>>>> +    private LIRGenerator lirGen;
>>>>> 
>>>>> There's no real value in having HSAILNonNullParametersPhase extend
>>>>> NonNullParametersPhase since the former overrides all the logic in
>> the
>>>>> latter.
>>>>> 
>>>>> 
>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsai
>>>>> l/
>>>>> HSAILHotSpotLIRGenerator.java:
>>>>> 
>>>>> Please create named constants for 40 and 8 that explains what they
>>>> are:
>>>>> 
>>>>> 204         if (regNumber < 40) {
>>>>> 205             long offset = config.hsailFrameSaveAreaOffset + 4 *
>>>>> (regNumber - 8);
>>>>> 
>>>>> src/gpu/hsail/vm/gpu_hsail.hpp:
>>>>> 
>>>>> It would be nice to see the CU acronym explained somewhere.
>>>>> 
>>>>> I know that you are thinking of supporting multiple frames in the
>>>>> future but I would not include code partially supporting that now
>>>>> otherwise it just adds confusion. For example, the first frame is
>>>>> contained in a HSAILKernelDeoptimization object, but
>>>>> HSAILFrame::_nextFrame is a pointer to a HSAILFrame. Who
>>>> allocates/deallocates the next frame(s)?
>>>>> 
>>>>> 46     HSAILFrame _first_frame;
>>>>> 
>>>>> What's the point of HSAILComputeUnitSaveState? In future will it do
>>>>> more than simply wrap a HSAILKernelDeoptimization?
>>>>> 
>>>>> 55   class HSAILComputeUnitSaveState {
>>>>> 56     friend class VMStructs;
>>>>> 57    public:
>>>>> 58     HSAILKernelDeoptimization  _deopt_info;
>>>>> 59   };
>>>>> 
>>>>> Why not make the definition extra clear by making the preprocessor
>> do
>>>>> the math (i.e. "#define CU_SAVE_STATES_SIZE (8 * 36 * 64)")
>>>>> 
>>>>> 61 // 8 cus * 36 waves per cu * wavesize 64
>>>>> 62 #define CU_SAVE_STATES_SIZE     18432
>>>>> 
>>>>> Let's use a longer name for clarity (e.g.,
>>>>> _sizeof_compute_unit_save_state). This should also be reflect in
>>>>> HotSpotVMConfig.
>>>>> 
>>>>> 69     jint _sizeof_cuss;
>>>>> 
>>>>> src/gpu/hsail/vm/gpu_hsail.cpp:
>>>>> 
>>>>> I was going to suggest stack allocating the HSAILDeoptimizationInfo
>>>>> until I realized it's 20Mb! Is that a conservative approach we can
>>>>> improve upon somehow (e.g., a lazily created thread local)? In any
>>>>> case, shouldn't the actual allocation be within the "if
>>>> (useDeoptInfo)" guard?
>>>>> 
>>>>> 149   HSAILDeoptimizationInfo* e = new (ResourceObj::C_HEAP,
>>>> mtInternal)
>>>>> HSAILDeoptimizationInfo();
>>>>> 150   if (useDeoptInfo) {
>>>>> 
>>>>> -Doug
>>>>> 
>>>>> On Mar 20, 2014, at 9:57 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>> 
>>>>>> Doug, Gilles --
>>>>>> 
>>>>>> I believe I have gotten rid of the warnings from the original
>> hsail-
>>>>> deopt webrev.
>>>>>> I also fixed a few casts that Eric Caspole noticed were preventing
>>>>>> the
>>>>> fastdebug build from building
>>>>>> (but did not affect product and debug builds).    Please take a
>> look
>>>>> at
>>>>>> 
>>>>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-
>> deopt
>>>>>> -
>>>>> v2
>>>>>> 
>>>>>> I've included here the explanatory text from the first submission
>>>>>> 
>>>>>> This consists of at least the start of support for deoptimization
>> in
>>>>> HSAIL kernels.  Although the list of files changed may look long,
>> many
>>>>> of the files have only a few lines changed.  Special thanks to
>> Gilles
>>>>> Duboscq and Doug Simon who provided some of the infrastructure that
>>>>> this webrev uses.
>>>>>> 
>>>>>> -- Tom Deneau
>>>>>> 
>>>>>> 
>>>>>> Below I have described
>>>>>> 
>>>>>> * an overview of the codepaths the data structures
>>>>>> * java and hotspot source changes
>>>>>> 
>>>>>> 
>>>>>> Deoptimization Data Structures and Overview
>>>>>> ===========================================
>>>>>> 
>>>>>> At kernel dispatch time, we allocate space for any workitems should
>>>>> want to deopt.  To reduce the space requirements, space is only
>>>>> reserved for the maximum number of possible concurrent workitems
>> that
>>>>> could be running on that hsa device.
>>>>>> 
>>>>>> A deopting workitem sets a "deopt happened" flag, and future
>>>>>> workitems
>>>>> that see "deopt happened" as true will just set a flag saying they
>>>>> never ran and exit early.  Currently the never_ran array is one per
>>>> workitem.
>>>>> We are looking at ways to make this smaller but HSA devices have a
>> lot
>>>>> of freedom in how they schedule workitems (current hardware and the
>>>>> simulator are quite different).
>>>>>> 
>>>>>> Workitems that deopt atomically bump an index saying where they
>>>>>> should
>>>>> store their deopt data.  The deopt data consists of
>>>>>> * workitemid
>>>>>> * deopt actionAndReason
>>>>>> * the first HSAILFrame
>>>>>> 
>>>>>> An HSAILFrame consists of
>>>>>> * the deoptId or "pc" offset where the deopt occurred
>>>>>> * number of s registers
>>>>>> * number of d registers
>>>>>> * a bitmap indicating which d registers are oops
>>>>>> * space for saving the d and s registers
>>>>>> 
>>>>>> Currently we always set num_s_registers to 32 and num_d_registers
>> to
>>>>>> 16 but in the hsail code of the kernel we only save the union of
>> the
>>>>> actual registers that are live at any of the infopoints.
>>>>>> 
>>>>>> On return from the GPU, we check if there were any deopts.  If not,
>>>>>> we
>>>>> just return back to java.  If there was at least one deopt then
>>>>>> 
>>>>>> a) for the workitems that finished normally, there is nothing to
>>>>>> do
>>>>>> 
>>>>>> b) if there are any deopted workitems, we want to run each
>>>> deopting
>>>>>>    workitem thru the interpreter going first thru the special host
>>>>>>    trampoline code infrastructure that Gilles created.  The
>>>>>>    trampoline host code takes the deoptId and a pointer to the
>>>>>>    saved hsail frame.  We currently do this sequentially although
>>>>>>    other policies are possible.
>>>>>> 
>>>>>> c) for each never ran workitem, we can just run it from the
>>>>>>    beginning of the kernel "method", just making sure we pass the
>>>>>>    arguments and the appropriate workitem id for each one.  Again,
>>>>>>    we currently do this sequentially although other policies are
>>>>>>    possible.
>>>>>> 
>>>>>> Because running either type b or c above can cause GCs, and because
>>>>> some of our saved d registers are pointers into the java heap, we
>> take
>>>>> care in case any of these saved pointers are affected by GC.  The
>>>>> current strategy of using an Object array supplied by the java side
>>>>> will be replaced later with an oops_do type of strategy.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Description of source changes in this webrev.
>>>>>> =============================================
>>>>>> graal source changes
>>>>>> ====================
>>>>>> 
>>>>>> Assembler, HSAILAssembler
>>>>>> minor changes for new instructions needed for saving deopt
>>>>> information
>>>>>> 
>>>>>> GraalKernelTester.java
>>>>>> force simulator to run single threaded.
>>>>>> 
>>>>>> KernelTester.java
>>>>>> minor changes to handle exceptions which escape the kernel method
>>>>>> 
>>>>>> HSAILLIRGenerator.java
>>>>>> support switches with keys of type long
>>>>>> 
>>>>>> 
>>>>>> HSAILHotSpotBackend.java
>>>>>> 
>>>>>> * compileKernel uses some new optimisticOpts which help generate
>>>>>>   deopts when needed.  Also, we dump the infopoints if Log:CodeGen
>>>>>>   is on
>>>>>> 
>>>>>> * HSAILNonNullParametersPhase stamps the appropriate parameters as
>>>>>>   nonNull
>>>>>> 
>>>>>> * installKernel uses the new trampoline infrastructure added by
>>>>>>   Gilles do produce the host trampoline deopt method and install
>>>>>>   it.
>>>>>> 
>>>>>> * emitCode adds a little bit of code to the prologue and a lot of
>>>>>>   code to the epilogue.  See description at the bottom for the
>>>> data
>>>>>>   structures used by the never-ran path and the deopt path.
>>>>>> 
>>>>>> HSAILHotSpotLIRGenerator.java
>>>>>> 
>>>>>> * code added by Gilles to build the host graph for the host
>>>>>>   trampoline deopt method.  I suppose some of this would be common
>>>>>>   to any gpu trampoline deopt and should be moved to some
>>>>>>   hsail-independent location.
>>>>>> 
>>>>>> * code to handle the creation of a DeoptimizeOp for emitting HSAIL
>>>>>>   code for a deoptimization
>>>>>> 
>>>>>> HSAILHotSpotLoweringProvider.java
>>>>>> 
>>>>>> * refactored to support different strategies for different nodes.
>>>>>>   UnwindNode strategy is to get replaced by a DeoptimizeNode.
>>>>>> 
>>>>>> HotSpotVMConfig.java
>>>>>> 
>>>>>> * define offets to fields in the deopt data structures
>>>>>> 
>>>>>> VMErrorNode.java
>>>>>> 
>>>>>> * public access to constructor (used by building of host graph for
>>>>>>  trampoline code)
>>>>>> 
>>>>>> HSAIL.java
>>>>>> * some new non-allocatable registers defined (used by deopt paths)
>>>>>> 
>>>>>> HSAILControlFlow.java
>>>>>> * code to emit hsail for a deoptimizationNode
>>>>>> 
>>>>>> ComputeProbabilityClosure.java
>>>>>> * just using a change that Gilles made in the patch he gave me.
>>>>>> 
>>>>>> 
>>>>>> mx/projects was affected by the move of ExternalCompilationResult
>> to
>>>>> com.oracle.graal.gpu.  In addition, several files had one line
>> import
>>>>> changes from the move of ExternalCompilationResult.
>>>>>> 
>>>>>> 
>>>>>> hotspot source changes
>>>>>> ======================
>>>>>> 
>>>>>> gpu_hsail.cpp, hpp
>>>>>> 
>>>>>> * the execute_kernel routine pushes an extra parameter where deopt
>>>>> info can be saved
>>>>>> 
>>>>>> * while pushing kernel args, keeps track if any are null and if so
>>>>>>   sets some new gpu_exception fields in thread structure which
>>>> gets
>>>>>>   used when thread returns to java mode
>>>>>> 
>>>>>> * on return from kernel checks if any deopt occurred.  If so,
>>>>>> 
>>>>>>    * runs any deopting workitems thru the trampoline deopt code
>>>>>>      which ends up running the kernel method thru the interpreter
>>>>>>      for that workitem.
>>>>>> 
>>>>>>    * runs any never-ran workitems using simple javaCall.
>>>>>> 
>>>>>> gpu_hsail_Frame.hpp
>>>>>> * new structure that defines the layout of a physical HSAIL frame
>>>>>> 
>>>>>> hsailArgumentsBase.*, hsailKernelArguments.hpp
>>>>> hsailJavaCallArguments.hpp
>>>>>> * refactored to share code between kernel argument setup and
>>>>>>   javaCall argument setup
>>>>>> 
>>>>>> javaClasses.cpp
>>>>>> 
>>>>>> * contains logic to check the new gpu_exception fields in thread
>>>>>>   structure and if detected, set as top frame on return
>>>>>> 
>>>>>> graalCompiler.cpp, hpp
>>>>>> * logic added by Gilles for external_deopt_i2c
>>>>>> 
>>>>>> javaCalls.cpp, hpp
>>>>>> * logic added by Gilles for external_deopt_i2c
>>>>>> 
>>>>>> sharedRuntime.cpp
>>>>>> * maybe Gilles can explain why the assert was removed in the patch
>>>>>>   he gave me (it asserts if I put it back in)
>>>>>> 
>>>>>> thread.cpp, hpp
>>>>>> * handle new gpu_exception fields
>>>>>> 
>>>>>> vmStructs.cpp
>>>>>> vmStructs_hsail.hpp
>>>>>> * handle new hsail deopt structs
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 



More information about the graal-dev mailing list