webrev to extend hsail allocation to allow gpu to refill tlab

Tue Jun 10 19:54:13 UTC 2014

On Jun 10, 2014, at 12:49 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Sorry formatting error, instead use
> 
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-tlab-gpu-v4

Yes, that looks better.

> 
> 
>> -----Original Message-----
>> From: graal-dev [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of
>> Deneau, Tom
>> Sent: Tuesday, June 10, 2014 2:39 PM
>> To: Christian Thalinger
>> Cc: graal-dev at openjdk.java.net
>> Subject: RE: webrev to extend hsail allocation to allow gpu to refill
>> tlab
>> 
>> Christian --
>> 
>> OK a new version is available at
>> 
>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-
>> tlab-gpu-v3
>> 
>> I reverted to c++ style multiline comments in the c++ files, since
>> that's what was being used in other files.  In the java code I did
>> change some  multi-line comments to be /* ... */ style.
>> 
>> -- Tom
>> 
>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> Sent: Tuesday, June 10, 2014 12:18 PM
>> To: Deneau, Tom
>> Cc: graal-dev at openjdk.java.net
>> Subject: Re: webrev to extend hsail allocation to allow gpu to refill
>> tlab
>> 
>> graal/com.oracle.graal.lir.hsail/src/com/oracle/graal/lir/hsail/HSAILMov
>> e.java
>> 
>> 
>> 
>> +    /**
>> 
>> +     * A LoadOp that uses the HSAIL ld_acq instruction
>> 
>> +     **/
>> 
>> +    public static class LoadAcquireOp extends LoadOp {
>> 
>> Please use the Java convention of:
>> 
>> /**
>> *
>> */
>> 
>> in this file and all the others (if it applies).
>> 
>> src/gpu/hsail/vm/gpu_hsail.cpp
>> 
>> -void * Hsail::_device_context = NULL;
>> 
>> +void* Hsail::_device_context = NULL;
>> 
>> jint   Hsail::_notice_safepoints = false;
>> If you started to have names aligned you should keep it or get rid of
>> it.
>> 
>> 
>> +  /**
>> 
>> +   * We avoid HSAILAllocationInfo logic if kernel does not allocate
>> 
>> +   * in which case the donor_thread array passed in will be null
>> 
>> +   */
>> 
>> +  HSAILAllocationInfo* allocInfo = (donor_threads == NULL ? NULL : new
>> HSAILAllocationInfo(donor_threads, dimX, allocBytesPerWorkitem));
>> 
>> Don't use Doxygen-style method comments in code; just regular /* ... */.
>> 
>> src/gpu/hsail/vm/gpu_hsail.hpp
>> There are some space left, e.g.:
>> 
>> 
>> +    HSAILTlabInfo ** _cur_tlab_info;   // copy of what was in the
>> HSAILAllocationInfo, to avoid an extra indirection
>> 
>> +    HSAILAllocationInfo * _alloc_info;
>> 
>> and a couple more.
>> 
>> src/gpu/hsail/vm/hsailJavaCallArguments.hpp
>> 
>> -  JavaCallArguments *_javaArgs;
>> 
>> +  JavaCallArguments*_javaArgs;
>> 
>> Here is a space missing.
>> 
>> Thanks for the changes.  It looks much better already.
>> 
>> The thing I mentioned about multi-line comments was mainly for Java code
>> and you still have them in e.g.
>> graal/com.oracle.graal.hotspot.hsail/src/com/oracle/graal/hotspot/hsail/
>> replacements/HSAILNewObjectSnippets.java
>> 
>> 
>> On Jun 9, 2014, at 2:11 PM, Deneau, Tom
>> <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:
>> 
>> 
>> I have tried to address Christian's comments in
>> 
>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-
>> tlab-gpu-v2
>> 
>> -- Tom
>> 
>> 
>> 
>> From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
>> Sent: Monday, June 09, 2014 11:41 AM
>> To: Deneau, Tom
>> Cc: graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
>> Subject: Re: webrev to extend hsail allocation to allow gpu to refill
>> tlab
>> 
>> graal/com.oracle.graal.hotspot/src/com/oracle/graal/hotspot/HotSpotVMCon
>> fig.java
>> Remove the space between the type and * for e.g. HSAILAllocationInfo *.
>> Same in other files like src/gpu/hsail/vm/vmStructs_hsail.hpp.
>> graal/com.oracle.graal.lir.hsail/src/com/oracle/graal/lir/hsail/HSAILMov
>> e.java
>> 
>> +        public LoadOp(Kind kind, AllocatableValue result,
>> HSAILAddressValue address, LIRFrameState state, boolean useLoadAcquire)
>> {
>> 
>> +        public StoreOp(Kind kind, HSAILAddressValue address,
>> AllocatableValue input, LIRFrameState state, boolean useRelease) {
>> I would prefer separate LoadAcquireOp/StoreReleaseOp classes.  It makes
>> the uses more readable.
>> 
>> src/gpu/hsail/vm/gpu_hsail_Tlab.hpp
>> 
>>  26 #define GPU_HSAIL_TLAB_HPP
>> To be consistent with other header defines this should be:
>> 
>>  26 #define GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP
>> It seems this needs to be changed in existing files too.
>> 
>> As a general comment, can we make multi-line comments C-style (/* ...
>> */) comments instead of C++ style (// ...)?  The C-style comments
>> reformat themselves automatically if changed while the C++ ones don't.
>> 
>> On Jun 9, 2014, at 9:28 AM, Christian Thalinger
>> <christian.thalinger at oracle.com<mailto:christian.thalinger at oracle.com>>
>> wrote:
>> 
>> 
>> 
>> I have the webrev still open ;-)
>> 
>> On Jun 9, 2014, at 8:25 AM, Deneau, Tom
>> <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:
>> 
>> 
>> 
>> Hi all --
>> 
>> Has anyone had a chance to look at this webrev posted last Tuesday?
>> 
>> -- Tom
>> 
>> 
>> 
>> -----Original Message-----
>> From: Deneau, Tom
>> Sent: Tuesday, June 03, 2014 4:27 PM
>> To: graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
>> Subject: webrev to extend hsail allocation to allow gpu to refill tlab
>> 
>> I have placed a webrev up at
>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-
>> tlab-gpu
>> which we would like to get checked into the graal trunk.
>> 
>> This webrev extends the existing hsail heap allocation logic.  In the
>> existing logic, when a workitem cannot allocate from the current tlab,
>> it just deoptimizes.  In this webrev, we add logic to allocate a new
>> tlab from the gpu.
>> 
>> The algorithm must deal with the fact that multiple hsa workitems can
>> share a single tlab, and so multiple workitems can "overflow".  A
>> workitem can tell if it is the "first overflower" and the first
>> overflower is charged with getting a new tlab while the other workitems
>> wait for the new tlab to be announced.
>> 
>> Workitems access a tlab thru a fixed register (sort of like a thread
>> register) which instead of pointing to a donor thread now points to a
>> HSAILTlabInfo structure, which is sort of a subset of a full tlab
>> struct, containing just the fields that we would actually use on the
>> gpu.
>> 
>> struct HSAILTlabInfo {
>>    HeapWord *  _start;                 // normal vm tlab fields,
>> start, top, end, etc.
>>    HeapWord *  _top;
>>    HeapWord *  _end;
>>    // additional data not in a normal tlab
>>    HeapWord * _lastGoodTop;            // first overflower records
>> this
>>    JavaThread * _donor_thread;         // donor thread associated
>> with this tlabInfo
>> }
>> 
>> The first overflower grabs a new tlabInfo structure and allocates a new
>> tlab (using edenAllocate) and "publishes" the new tlabInfo for other
>> workitems to start using.  See the routine called
>> allocateFromTlabSlowPath in HSAILNewObjectSnippets.
>> 
>> Eventually when hsail function calls are supported, this slow path will
>> not be inlined but will be called as a stub.
>> 
>> Other changes:
>> 
>> * the allocation-related logic was removed from gpu_hsail.cpp into
>>   gpu_hsail_tlab.hpp.  The HSAILHotSpotNmethod now keeps track of
>>   whether a kernel uses allocation and avoids this logic if it does
>>   not.
>> 
>>    * Before the kernel runs, the donor thread tlabs are used to set
>>      up the initial tlabInfo records, and a tlab allocation is done
>>      here if the donor thread tlab is empty.
>> 
>>    * When kernel is finished running, the cpu side will see a list
>>      of one or more HSAILTlabInfos and basically postprocesses
>>      these, fixing up any overflows and making them parsable and
>>      copying them back to the appropriate donor thread as needed.
>> 
>> * the inter-workitem communication required the use of the hsail
>>   instructions for load_acquire and store_release from the
>>   snippets.  The HSAILDirectLoadAcquireNode and
>>   HSAILDirectStoreReleaseNode with associated NodeIntrinsics were
>>   created to handle this.  A node for creating a workitemabsid
>>   instruction was also created, it is not used in the algorithm as
>>   such but was useful for adding debug traces.
>> 
>> * In HSAILHotSpotBackend, the logic to decide whether a kernel uses
>>   allocation or not was made more precise.  (This flag is also made
>>   available at execute time.)  There were several atomic_add
>>   related tests were falsely being marked as requiring
>>   HSAILAllocation and thus HSAILDeoptimization support. This
>>   marking was removed.
>> 
>> -- Tom Deneau