webrev to extend hsail allocation to allow gpu to refill tlab

Mon Jun 9 15:25:44 UTC 2014

Hi all --

Has anyone had a chance to look at this webrev posted last Tuesday?

-- Tom

> -----Original Message-----
> From: Deneau, Tom
> Sent: Tuesday, June 03, 2014 4:27 PM
> To: graal-dev at openjdk.java.net
> Subject: webrev to extend hsail allocation to allow gpu to refill tlab
> 
> I have placed a webrev up at
>   http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-tlab-gpu
> which we would like to get checked into the graal trunk.
> 
> This webrev extends the existing hsail heap allocation logic.  In the
> existing logic, when a workitem cannot allocate from the current tlab,
> it just deoptimizes.  In this webrev, we add logic to allocate a new
> tlab from the gpu.
> 
> The algorithm must deal with the fact that multiple hsa workitems can
> share a single tlab, and so multiple workitems can "overflow".  A
> workitem can tell if it is the "first overflower" and the first
> overflower is charged with getting a new tlab while the other workitems
> wait for the new tlab to be announced.
> 
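As a rough illustration of how a workitem can tell that it is the first overflower, here is a minimal C++ sketch. It assumes allocation is an atomic add on a shared top pointer, and that every workitem advances top even when the result is past end. The names SharedTlab, AllocResult and tryAllocate are hypothetical and are not taken from the webrev; addresses are modeled as plain integers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a tlab shared by several workitems.
struct SharedTlab {
    std::atomic<std::uintptr_t> top;  // next free address
    std::uintptr_t end;               // one past the last usable address
    SharedTlab(std::uintptr_t start, std::uintptr_t e) : top(start), end(e) {}
};

enum class AllocResult { Success, FirstOverflower, LaterOverflower };

// Bump-allocate `size` bytes. Top is advanced unconditionally, so old tops
// grow monotonically and exactly one workitem sees an old top that is still
// in bounds while its new top is past end. That workitem is the first
// overflower (assumes the initial top is <= end).
inline AllocResult tryAllocate(SharedTlab& tlab, std::size_t size,
                               std::uintptr_t& result) {
    assert(size > 0);
    std::uintptr_t oldTop = tlab.top.fetch_add(size, std::memory_order_relaxed);
    std::uintptr_t newTop = oldTop + size;
    if (newTop <= tlab.end) {
        result = oldTop;              // allocation fits in the current tlab
        return AllocResult::Success;
    }
    result = 0;
    return (oldTop <= tlab.end) ? AllocResult::FirstOverflower
                                : AllocResult::LaterOverflower;
}
```

With a tlab covering [100, 132) and 16-byte requests, the first two calls succeed, the third is the first overflower and the fourth is a later overflower. In this sketch the first overflower's oldTop is also the value that would be recorded as the last good top.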
> Workitems access a tlab through a fixed register (sort of like a thread
> register) which, instead of pointing to a donor thread, now points to an
> HSAILTlabInfo structure.  This is roughly a subset of a full tlab
> struct, containing just the fields that we would actually use on the
> gpu.
> 
>    struct HSAILTlabInfo {
>       // normal vm tlab fields: start, top, end, etc.
>       HeapWord *  _start;
>       HeapWord *  _top;
>       HeapWord *  _end;
>       // additional data not in a normal tlab
>       HeapWord *  _lastGoodTop;           // first overflower records this
>       JavaThread * _donor_thread;         // donor thread associated with this tlabInfo
>    };
> 
> The first overflower grabs a new tlabInfo structure and allocates a new
> tlab (using edenAllocate) and "publishes" the new tlabInfo for other
> workitems to start using.  See the routine called
> allocateFromTlabSlowPath in HSAILNewObjectSnippets.
> 
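The publish-and-wait handshake between the first overflower and the other workitems might look roughly like the following C++ sketch. It uses std::atomic release/acquire ordering as an analogue of the hsail store_release and load_acquire instructions. TlabInfo, TlabSlot, publish and awaitNew are hypothetical names for illustration only, and the record is simplified to plain integer fields.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical tlabInfo record (addresses modeled as integers).
struct TlabInfo {
    std::uintptr_t start;
    std::uintptr_t top;
    std::uintptr_t end;
};

// Hypothetical shared slot holding the tlabInfo that workitems should use.
struct TlabSlot {
    std::atomic<TlabInfo*> current;
    explicit TlabSlot(TlabInfo* initial) : current(initial) {}
};

// First overflower: fill in `fresh` completely, then publish it. The release
// store makes the record's fields visible to any workitem that later
// observes the new pointer with an acquire load.
inline void publish(TlabSlot& slot, TlabInfo* fresh) {
    slot.current.store(fresh, std::memory_order_release);
}

// Other overflowers: spin until the slot no longer points at the tlabInfo
// that overflowed, then continue with the newly announced one.
inline TlabInfo* awaitNew(const TlabSlot& slot, const TlabInfo* overflowed) {
    TlabInfo* seen = slot.current.load(std::memory_order_acquire);
    while (seen == overflowed) {
        seen = slot.current.load(std::memory_order_acquire);
    }
    return seen;
}
```

The release/acquire pairing matters because a waiter must not see the new pointer before the fields behind it are initialized. A relaxed store would not give that guarantee.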
> Eventually when hsail function calls are supported, this slow path will
> not be inlined but will be called as a stub.
> 
> Other changes:
> 
>    * the allocation-related logic was moved from gpu_hsail.cpp into
>      gpu_hsail_tlab.hpp.  The HSAILHotSpotNmethod now keeps track of
>      whether a kernel uses allocation and avoids this logic if it does
>      not.
> 
>       * Before the kernel runs, the donor thread tlabs are used to set
>         up the initial tlabInfo records, and a tlab allocation is done
>         here if the donor thread tlab is empty.
> 
>       * When the kernel has finished running, the cpu side sees a
>         list of one or more HSAILTlabInfos and postprocesses these:
>         fixing up any overflows, making them parsable, and copying
>         them back to the appropriate donor thread as needed.
> 
>    * the inter-workitem communication required the use of the hsail
>      instructions for load_acquire and store_release from the
>      snippets.  The HSAILDirectLoadAcquireNode and
>      HSAILDirectStoreReleaseNode with associated NodeIntrinsics were
>      created to handle this.  A node for creating a workitemabsid
>      instruction was also created; it is not used in the algorithm as
>      such but was useful for adding debug traces.
> 
>    * In HSAILHotSpotBackend, the logic to decide whether a kernel uses
>      allocation or not was made more precise.  (This flag is also made
>      available at execute time.)  Several atomic_add-related tests
>      were falsely being marked as requiring HSAILAllocation and thus
>      HSAILDeoptimization support.  This marking was removed.
> 
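To make the cpu-side postprocessing step concrete, here is a small C++ sketch of one way the fix-up could work. It assumes an overflowed record is one whose top ran past end, that it is repaired by rolling top back to the recorded last good top, and that records are listed in the order they were handed out. DonorTlab, TlabInfoRecord and postprocess are hypothetical names, and the "make parsable" step (filling the unused tail) is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the donor thread's own tlab fields.
struct DonorTlab {
    std::uintptr_t start = 0;
    std::uintptr_t top = 0;
    std::uintptr_t end = 0;
};

// Simplified record mirroring the fields listed in HSAILTlabInfo.
struct TlabInfoRecord {
    std::uintptr_t start;
    std::uintptr_t top;
    std::uintptr_t end;
    std::uintptr_t lastGoodTop;  // recorded by the first overflower
    DonorTlab* donor;
};

// Walk the records left behind by the kernel. Returns how many overflowed.
inline int postprocess(std::vector<TlabInfoRecord>& records) {
    int overflowed = 0;
    for (TlabInfoRecord& r : records) {
        if (r.top > r.end) {
            // Overflowing workitems pushed top past end; restore the last
            // value that described a real allocation boundary.
            assert(r.lastGoodTop <= r.end);
            r.top = r.lastGoodTop;
            ++overflowed;
        }
        // Copy back; a later record for the same donor overwrites earlier
        // ones, so the donor ends up with its most recent tlab.
        r.donor->start = r.start;
        r.donor->top = r.top;
        r.donor->end = r.end;
    }
    return overflowed;
}
```

For example, a donor with an overflowed record followed by a fresh one ends up pointing at the fresh tlab, while the overflowed record has its top rolled back.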
> -- Tom Deneau