webrev to extend hsail allocation to allow gpu to refill tlab

Mon Jun 9 16:41:04 UTC 2014

graal/com.oracle.graal.hotspot/src/com/oracle/graal/hotspot/HotSpotVMConfig.java

Remove the space between the type and the *, e.g. write HSAILAllocationInfo* rather than HSAILAllocationInfo *. The same applies to other files such as src/gpu/hsail/vm/vmStructs_hsail.hpp.

graal/com.oracle.graal.lir.hsail/src/com/oracle/graal/lir/hsail/HSAILMove.java

+        public LoadOp(Kind kind, AllocatableValue result, HSAILAddressValue address, LIRFrameState state, boolean useLoadAcquire) {
+        public StoreOp(Kind kind, HSAILAddressValue address, AllocatableValue input, LIRFrameState state, boolean useRelease) {
I would prefer separate LoadAcquireOp/StoreReleaseOp classes instead of a boolean flag. That makes the call sites more readable.

src/gpu/hsail/vm/gpu_hsail_Tlab.hpp
  26 #define GPU_HSAIL_TLAB_HPP
To be consistent with the other header guards, this should be:
  26 #define GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP
It seems some existing files need the same change too.
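Judging from the requested name, the include-guard convention appears to be: take the header's path below src/, upper-case it, and turn '/' and '.' into '_'. Below is a minimal sketch of that mapping, assuming that rule holds; include_guard_for is a hypothetical helper written for illustration, not something in HotSpot.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: derive a HotSpot-style include-guard macro from a
// header path, e.g. "src/gpu/hsail/vm/gpu_hsail_Tlab.hpp" becomes
// "GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP".
std::string include_guard_for(const std::string& path) {
  std::string rel = path;
  const std::string prefix = "src/";
  // Drop the leading "src/" so the guard starts below the source root.
  if (rel.compare(0, prefix.size(), prefix) == 0) {
    rel = rel.substr(prefix.size());
  }
  std::string guard;
  for (char c : rel) {
    if (c == '/' || c == '.') {
      guard += '_';  // path separators and the extension dot become '_'
    } else {
      guard += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
  }
  return guard;
}
```

With that rule, src/gpu/hsail/vm/gpu_hsail_Tlab.hpp would open with #ifndef GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP and #define GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP, and close with the matching #endif.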

As a general comment, can we use C-style comments (/* … */) for multi-line comments instead of C++-style ones (// …)? Editors can reflow a C-style comment block automatically when its text changes, while C++-style comments have to be rewrapped by hand.

On Jun 9, 2014, at 9:28 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:

> I have the webrev still open ;-)
> 
> On Jun 9, 2014, at 8:25 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
>> Hi all --
>> 
>> Has anyone had a chance to look at this webrev posted last Tuesday?
>> 
>> -- Tom
>> 
>>> -----Original Message-----
>>> From: Deneau, Tom
>>> Sent: Tuesday, June 03, 2014 4:27 PM
>>> To: graal-dev at openjdk.java.net
>>> Subject: webrev to extend hsail allocation to allow gpu to refill tlab
>>> 
>>> I have placed a webrev up at
>>> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-tlab-gpu
>>> which we would like to get checked into the graal trunk.
>>> 
>>> This webrev extends the existing hsail heap allocation logic.  In the
>>> existing logic, when a workitem cannot allocate from the current tlab,
>>> it just deoptimizes.  In this webrev, we add logic so that the gpu
>>> itself can allocate a new tlab instead.
>>> 
>>> The algorithm must deal with the fact that multiple hsa workitems can
>>> share a single tlab, and so multiple workitems can "overflow".  A
>>> workitem can tell whether it is the "first overflower"; the first
>>> overflower is responsible for getting a new tlab while the other
>>> overflowing workitems wait for the new tlab to be announced.
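One way a workitem could detect that it is the first overflower, assuming workitems bump the shared top with a single atomic add: the workitem whose old top was still at or below end, but whose new top is past end, is unique, because every later overflower starts from a top that is already past end. This is a sketch, not the webrev's code; std::atomic fetch_add stands in for the hsail atomic instruction, word offsets stand in for HeapWord*, and SharedTlab, AllocResult and try_allocate are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of a tlab shared by many workitems. Word offsets
// stand in for HeapWord* so the example is self-contained.
struct SharedTlab {
  std::atomic<size_t> top;
  size_t end;
  size_t last_good_top;  // written only by the first overflower
};

enum class AllocResult { kOk, kFirstOverflower, kOtherOverflower };

// Every workitem bumps top unconditionally with one atomic add, so the
// old tops handed out are strictly increasing (words must be > 0).
AllocResult try_allocate(SharedTlab& tlab, size_t words, size_t* obj) {
  assert(words > 0);
  size_t old_top = tlab.top.fetch_add(words);
  size_t new_top = old_top + words;
  if (new_top <= tlab.end) {
    *obj = old_top;  // the object occupies [old_top, new_top)
    return AllocResult::kOk;
  }
  if (old_top <= tlab.end) {
    // Only one workitem can straddle end: it records where the valid
    // allocations stopped and becomes responsible for the new tlab.
    tlab.last_good_top = old_top;
    return AllocResult::kFirstOverflower;
  }
  // top was already past end: wait for the new tlab to be announced.
  return AllocResult::kOtherOverflower;
}
```

Because every workitem pays for the atomic add regardless of outcome, top can run past end; that is presumably why the first overflower records _lastGoodTop, so the true end of the allocated region can be recovered later.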
>>> 
>>> Workitems access a tlab through a fixed register (similar to a thread
>>> register), which now points to an HSAILTlabInfo structure instead of
>>> to a donor thread.  HSAILTlabInfo is essentially a subset of the full
>>> tlab struct, containing only the fields that we actually use on the
>>> gpu.
>>> 
>>>  struct HSAILTlabInfo {
>>>     HeapWord *  _start;          // normal vm tlab fields: start, top, end, etc.
>>>     HeapWord *  _top;
>>>     HeapWord *  _end;
>>>     // additional data not in a normal tlab
>>>     HeapWord *  _lastGoodTop;    // first overflower records this
>>>     JavaThread * _donor_thread;  // donor thread associated with this tlabInfo
>>>  };
>>> 
>>> The first overflower grabs a new tlabInfo structure, allocates a new
>>> tlab (using edenAllocate), and "publishes" the new tlabInfo for the
>>> other workitems to start using.  See the routine called
>>> allocateFromTlabSlowPath in HSAILNewObjectSnippets.
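The publish step is where an acquire/release pairing matters: the first overflower must finish initializing the new tlabInfo before any waiting workitem can observe the pointer to it. A minimal sketch of that handoff follows, using std::atomic with memory_order_release and memory_order_acquire as a C++ analog of the hsail store_release and load_acquire instructions. TlabInfo and publish_and_wait are hypothetical, threads stand in for workitems, and the edenAllocate call is not modeled (the new tlab's bounds are passed in).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for HSAILTlabInfo (word offsets instead of HeapWord*).
struct TlabInfo {
  size_t start;
  size_t top;
  size_t end;
};

// Runs one "first overflower" and num_waiters waiting workitems as threads.
// Returns how many waiters observed a fully initialized tlabInfo.
int publish_and_wait(int num_waiters, size_t new_start, size_t new_words) {
  std::atomic<TlabInfo*> published{nullptr};
  std::atomic<int> saw_valid{0};
  TlabInfo fresh{0, 0, 0};

  std::vector<std::thread> waiters;
  for (int i = 0; i < num_waiters; i++) {
    waiters.emplace_back([&] {
      TlabInfo* info;
      // load-acquire: once the pointer is visible, so are the field writes
      // made before the matching store-release.
      while ((info = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
      }
      // A real workitem would now retry its allocation in the new tlab.
      if (info->start == new_start && info->top == new_start &&
          info->end == new_start + new_words) {
        saw_valid.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }

  // First overflower: fill in the new tlabInfo (the real code would use the
  // result of an eden allocation here), then publish it with a release.
  std::thread overflower([&] {
    fresh.start = new_start;
    fresh.top = new_start;
    fresh.end = new_start + new_words;
    published.store(&fresh, std::memory_order_release);
  });

  overflower.join();
  for (std::thread& t : waiters) t.join();
  return saw_valid.load();
}
```

With relaxed operations instead, a waiter could in principle see the new pointer together with stale fields; preventing that is presumably the job of the dedicated load_acquire and store_release nodes.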
>>> 
>>> Eventually when hsail function calls are supported, this slow path will
>>> not be inlined but will be called as a stub.
>>> 
>>> Other changes:
>>> 
>>>  * the allocation-related logic was moved from gpu_hsail.cpp into
>>>    gpu_hsail_tlab.hpp.  The HSAILHotSpotNmethod now keeps track of
>>>    whether a kernel uses allocation and skips this logic for kernels
>>>    that do not.
>>> 
>>>     * Before the kernel runs, the donor thread tlabs are used to set
>>>       up the initial tlabInfo records, and a tlab allocation is done
>>>       here if the donor thread tlab is empty.
>>> 
>>>     * When the kernel finishes running, the cpu side sees a list of
>>>       one or more HSAILTlabInfos and postprocesses them: it fixes up
>>>       any overflows, makes the tlabs parsable, and copies them back to
>>>       the appropriate donor threads as needed.
>>> 
>>>  * the inter-workitem communication required using the hsail
>>>    load_acquire and store_release instructions from the snippets.
>>>    The HSAILDirectLoadAcquireNode and HSAILDirectStoreReleaseNode,
>>>    with associated NodeIntrinsics, were created to handle this.  A
>>>    node for emitting a workitemabsid instruction was also created;
>>>    it is not used by the algorithm itself but was useful for adding
>>>    debug traces.
>>> 
>>>  * In HSAILHotSpotBackend, the logic that decides whether a kernel
>>>    uses allocation was made more precise.  (This flag is also made
>>>    available at execute time.)  Several atomic_add-related tests were
>>>    being falsely marked as requiring HSAILAllocation, and thus
>>>    HSAILDeoptimization, support.  This marking was removed.
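The cpu-side postprocessing described above can be sketched as follows. This is a hedged, simplified model, not the webrev's code. It assumes that the tlabInfos arrive in creation order, that an overflowed top is rolled back to _lastGoodTop, and that every tlab except the newest one per donor is retired by covering its unused words with a filler object (only counted here, not written), while the newest one is copied back to its donor. TlabInfo, DonorTlab, fixed_top and postprocess are hypothetical names, and an int stands in for the JavaThread* donor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical simplified tlabInfo (word offsets instead of HeapWord*).
struct TlabInfo {
  size_t start;
  size_t top;
  size_t end;
  size_t last_good_top;  // set by the first overflower when top passed end
  int donor;             // stand-in for the JavaThread* donor thread
};

// Hypothetical cpu-side view of a donor thread's tlab.
struct DonorTlab {
  size_t start, top, end;
};

// Roll back an overflowed top: allocations past end never completed, so the
// tlab really ends at the top the first overflower recorded.
size_t fixed_top(const TlabInfo& info) {
  return info.top > info.end ? info.last_good_top : info.top;
}

// Retires all but the newest tlabInfo per donor (their unused words would be
// filled to keep the heap parsable) and copies the newest one back to its
// donor. Returns the total number of filler words.
size_t postprocess(const std::vector<TlabInfo>& infos,
                   std::map<int, DonorTlab>& donors) {
  std::map<int, size_t> newest;  // donor -> index of its newest tlabInfo
  for (size_t i = 0; i < infos.size(); i++) newest[infos[i].donor] = i;

  size_t filler_words = 0;
  for (size_t i = 0; i < infos.size(); i++) {
    const TlabInfo& info = infos[i];
    size_t top = fixed_top(info);
    assert(info.start <= top && top <= info.end);
    if (newest[info.donor] == i) {
      donors[info.donor] = DonorTlab{info.start, top, info.end};
    } else {
      filler_words += info.end - top;
    }
  }
  return filler_words;
}
```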
>>> 
>>> -- Tom Deneau
>> 
>