webrev to extend hsail allocation to allow gpu to refill tlab

Christian Thalinger christian.thalinger at oracle.com
Mon Jun 9 16:28:30 UTC 2014


I have the webrev still open ;-)

On Jun 9, 2014, at 8:25 AM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Hi all --
> 
> Has anyone had a chance to look at this webrev posted last Tuesday?
> 
> -- Tom
> 
>> -----Original Message-----
>> From: Deneau, Tom
>> Sent: Tuesday, June 03, 2014 4:27 PM
>> To: graal-dev at openjdk.java.net
>> Subject: webrev to extend hsail allocation to allow gpu to refill tlab
>> 
>> I have placed a webrev up at
>>  http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-
>> tlab-gpu
>> which we would like to get checked into the graal trunk.
>> 
>> This webrev extends the existing hsail heap allocation logic.  In the
>> existing logic, when a workitem cannot allocate from the current tlab,
>> it just deoptimizes.  In this webrev, we add logic to allocate a new
>> tlab from the gpu.
>> 
>> The algorithm must deal with the fact that multiple hsa workitems can
>> share a single tlab, so several workitems may "overflow" it at about
>> the same time.  A workitem can tell whether it is the "first
>> overflower"; the first overflower is charged with getting a new tlab
>> while the other workitems wait for the new tlab to be announced.
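>> 
>> To make that concrete, here is a minimal sketch of one way a workitem
>> could detect that it is the first overflower (the webrev's actual
>> check may differ).  The field names follow the HSAILTlabInfo struct
>> shown below, but the allocate() wrapper and the atomic_add_hsail()
>> helper are illustrative stand-ins, not the actual snippet code.
>> 
>>   // sketch only: detect the first overflower when bumping the tlab top
>>   HeapWord* allocate(HSAILTlabInfo* tlab, size_t size_in_words) {
>>     // atomically bump _top; returns the old value of _top
>>     HeapWord* old_top = atomic_add_hsail(&tlab->_top, size_in_words);
>>     HeapWord* new_top = old_top + size_in_words;
>>     if (new_top <= tlab->_end) {
>>       return old_top;              // fast path: fits in current tlab
>>     }
>>     if (old_top <= tlab->_end) {
>>       // this workitem's bump was the first to cross _end, so it is the
>>       // first overflower; record the last usable top for later fixup
>>       tlab->_lastGoodTop = old_top;
>>     }
>>     return NULL;                   // overflowed; take the slow path
>>   }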
>> 
>> Workitems access a tlab through a fixed register (sort of like a
>> thread register).  Instead of pointing to a donor thread, that
>> register now points to an HSAILTlabInfo structure, which is a subset
>> of a full tlab struct containing just the fields that we actually use
>> on the gpu.
>> 
>>   struct HSAILTlabInfo {
>>      HeapWord *   _start;          // normal vm tlab fields: start, top, end, etc.
>>      HeapWord *   _top;
>>      HeapWord *   _end;
>>      // additional data not in a normal tlab
>>      HeapWord *   _lastGoodTop;    // first overflower records this
>>      JavaThread * _donor_thread;   // donor thread associated with this tlabInfo
>>   };
>> 
>> The first overflower grabs a new tlabInfo structure, allocates a new
>> tlab (using edenAllocate), and "publishes" the new tlabInfo for the
>> other workitems to start using.  See the routine
>> allocateFromTlabSlowPath in HSAILNewObjectSnippets.
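>> 
>> One way the publish/wait handshake could look is sketched below.  It
>> assumes a shared cell holding the current tlabInfo pointer that every
>> workitem reads; grab_new_tlab_info(), eden_allocate(), store_release()
>> and load_acquire() are illustrative stand-ins for the real snippet
>> intrinsics, not the actual code.
>> 
>>   // sketch only: first overflower refills and announces, others wait
>>   HSAILTlabInfo* refill(HSAILTlabInfo** cur_tlab_info,  // shared cell
>>                         HSAILTlabInfo*  old_info,
>>                         bool            first_overflower) {
>>     if (first_overflower) {
>>       HSAILTlabInfo* fresh = grab_new_tlab_info(); // unused tlabInfo
>>       eden_allocate(fresh);                 // fills _start, _top, _end
>>       store_release(cur_tlab_info, fresh);  // announce the new tlab
>>       return fresh;
>>     }
>>     HSAILTlabInfo* seen = old_info;
>>     while (seen == old_info) {
>>       seen = load_acquire(cur_tlab_info);   // spin until announced
>>     }
>>     return seen;
>>   }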
>> 
>> Eventually when hsail function calls are supported, this slow path will
>> not be inlined but will be called as a stub.
>> 
>> Other changes:
>> 
>>   * the allocation-related logic was moved from gpu_hsail.cpp into
>>     gpu_hsail_tlab.hpp.  The HSAILHotSpotNmethod now keeps track of
>>     whether a kernel uses allocation and avoids this logic if it does
>>     not.
>> 
>>      * Before the kernel runs, the donor thread tlabs are used to set
>>        up the initial tlabInfo records, and a tlab allocation is done
>>        here if the donor thread tlab is empty.
>> 
>>      * When the kernel has finished running, the cpu side sees a list
>>        of one or more HSAILTlabInfos and postprocesses these, fixing
>>        up any overflows, making them parsable, and copying them back
>>        to the appropriate donor thread as needed (see the sketch
>>        after this list).
>> 
>>   * the inter-workitem communication required using the hsail
>>     load_acquire and store_release instructions from the snippets.
>>     The HSAILDirectLoadAcquireNode and HSAILDirectStoreReleaseNode,
>>     with associated NodeIntrinsics, were created to handle this.  A
>>     node for emitting the workitemabsid instruction was also created;
>>     it is not used in the algorithm as such but was useful for adding
>>     debug traces.
>> 
>>   * In HSAILHotSpotBackend, the logic that decides whether a kernel
>>     uses allocation was made more precise.  (This flag is also made
>>     available at execute time.)  Several atomic_add-related tests
>>     were falsely being marked as requiring HSAILAllocation and thus
>>     HSAILDeoptimization support; this marking was removed.
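>> 
>> The cpu-side postprocessing mentioned in the list above might, in
>> outline, look like the following sketch; fill_with_filler_object()
>> and copy_back_to_donor_tlab() are placeholders for whatever the
>> webrev actually calls, not its real routines.
>> 
>>   // sketch only: cpu-side cleanup of the tlabInfos after the kernel
>>   void postprocess_tlab_infos(HSAILTlabInfo* infos, int count) {
>>     for (int i = 0; i < count; i++) {
>>       HSAILTlabInfo* ti = &infos[i];
>>       if (ti->_lastGoodTop != NULL) {
>>         // the gpu bumped _top past _end; the first overflower saved
>>         // the last usable top, so clamp back to it
>>         ti->_top = ti->_lastGoodTop;
>>       }
>>       // make the tlab parsable by filling the unused tail, then hand
>>       // it back to the donor thread
>>       fill_with_filler_object(ti->_top, ti->_end);
>>       copy_back_to_donor_tlab(ti->_donor_thread,
>>                               ti->_start, ti->_top, ti->_end);
>>     }
>>   }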
>> 
>> -- Tom Deneau
> 


