webrev to extend hsail allocation to allow gpu to refill tlab

Mon Jun 9 17:42:19 UTC 2014

OK, I will address these suggestions.

I was thinking in the future there could be other kinds of HSAIL loads we want to express thru load intrinsics, but I guess we could have separate Op classes for each one.

-- Tom

From: Christian Thalinger [mailto:christian.thalinger at oracle.com]
Sent: Monday, June 09, 2014 11:41 AM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net
Subject: Re: webrev to extend hsail allocation to allow gpu to refill tlab

graal/com.oracle.graal.hotspot/src/com/oracle/graal/hotspot/HotSpotVMConfig.java
Remove the space between the type and * for e.g. HSAILAllocationInfo *. Same in other files like src/gpu/hsail/vm/vmStructs_hsail.hpp.
graal/com.oracle.graal.lir.hsail/src/com/oracle/graal/lir/hsail/HSAILMove.java

+        public LoadOp(Kind kind, AllocatableValue result, HSAILAddressValue address, LIRFrameState state, boolean useLoadAcquire) {

+        public StoreOp(Kind kind, HSAILAddressValue address, AllocatableValue input, LIRFrameState state, boolean useRelease) {
I would prefer separate LoadAcquireOp/StoreReleaseOp classes.  It makes the uses more readable.

src/gpu/hsail/vm/gpu_hsail_Tlab.hpp

  26 #define GPU_HSAIL_TLAB_HPP
To be consistent with other header defines this should be:

  26 #define GPU_HSAIL_VM_GPU_HSAIL_TLAB_HPP
It seems this needs to be changed in existing files too.

As a general comment, can we make multi-line comments C-style (/* ... */) comments instead of C++ style (// ...)?  The C-style comments reformat themselves automatically if changed while the C++ ones don't.

On Jun 9, 2014, at 9:28 AM, Christian Thalinger <christian.thalinger at oracle.com> wrote:

I have the webrev still open ;-)

On Jun 9, 2014, at 8:25 AM, Deneau, Tom <tom.deneau at amd.com> wrote:

Hi all --

Has anyone had a chance to look at this webrev posted last Tuesday?

-- Tom

-----Original Message-----
From: Deneau, Tom
Sent: Tuesday, June 03, 2014 4:27 PM
To: graal-dev at openjdk.java.net
Subject: webrev to extend hsail allocation to allow gpu to refill tlab

I have placed a webrev up at
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-tlab-gpu
which we would like to get checked into the graal trunk.

This webrev extends the existing hsail heap allocation logic.  In the
existing logic, when a workitem cannot allocate from the current tlab,
it just deoptimizes.  In this webrev, we add logic to allocate a new
tlab from the gpu.

The algorithm must deal with the fact that multiple hsa workitems can
share a single tlab, and so multiple workitems can "overflow".  A
workitem can tell if it is the "first overflower" and the first
overflower is charged with getting a new tlab while the other workitems
wait for the new tlab to be announced.
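
One way a workitem could tell that it is the first overflower is sketched below in Java. This is only an illustration under assumptions, not the code in the webrev: it assumes allocation is an atomic bump of the shared top pointer, and the names SharedTlabSketch, Outcome and tryAllocate are hypothetical. Since each atomic add hands out a disjoint address range, exactly one workitem sees an old top that is still within the tlab together with a new top that is past the end.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a tlab shared by several workitems (not the webrev code). */
final class SharedTlabSketch {
    enum Outcome { ALLOCATED, FIRST_OVERFLOWER, LATER_OVERFLOWER }

    private final AtomicLong top;   // shared bump pointer
    private final long end;         // first address past the tlab

    SharedTlabSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Atomically bumps top by size (assumed > 0) and classifies the result. */
    Outcome tryAllocate(long size) {
        long oldTop = top.getAndAdd(size);
        long newTop = oldTop + size;
        if (newTop <= end) {
            return Outcome.ALLOCATED;           // object fits in [oldTop, newTop)
        }
        // The bump went past end. Only one caller can observe an oldTop that
        // was still in bounds; every later caller starts beyond end.
        return oldTop <= end ? Outcome.FIRST_OVERFLOWER : Outcome.LATER_OVERFLOWER;
    }
}
```

In this sketch the old top seen by the first overflower is the last address up to which the tlab contents are valid, which is the kind of value a field such as _lastGoodTop could record.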

Workitems access a tlab thru a fixed register (sort of like a thread
register) which instead of pointing to a donor thread now points to a
HSAILTlabInfo structure, which is sort of a subset of a full tlab
struct, containing just the fields that we would actually use on the
gpu.

 struct HSAILTlabInfo {
    // normal vm tlab fields: start, top, end, etc.
    HeapWord *  _start;
    HeapWord *  _top;
    HeapWord *  _end;
    // additional data not in a normal tlab
    HeapWord * _lastGoodTop;            // first overflower records this
    JavaThread * _donor_thread;         // donor thread associated with this tlabInfo
 };

The first overflower grabs a new tlabInfo structure and allocates a new
tlab (using edenAllocate) and "publishes" the new tlabInfo for other
workitems to start using.  See the routine called
allocateFromTlabSlowPath in HSAILNewObjectSnippets.

Eventually when hsail function calls are supported, this slow path will
not be inlined but will be called as a stub.

Other changes:

 * the allocation-related logic was moved out of gpu_hsail.cpp into
   gpu_hsail_tlab.hpp.  The HSAILHotSpotNmethod now keeps track of
   whether a kernel uses allocation and avoids this logic if it does
   not.

    * Before the kernel runs, the donor thread tlabs are used to set
      up the initial tlabInfo records, and a tlab allocation is done
      here if the donor thread tlab is empty.

    * When the kernel has finished running, the cpu side will see a list
      of one or more HSAILTlabInfos and basically postprocesses
      these, fixing up any overflows and making them parsable and
      copying them back to the appropriate donor thread as needed.

 * the inter-workitem communication required the use of the hsail
   instructions for load_acquire and store_release from the
   snippets.  The HSAILDirectLoadAcquireNode and
   HSAILDirectStoreReleaseNode with associated NodeIntrinsics were
   created to handle this.  A node for creating a workitemabsid
   instruction was also created; it is not used in the algorithm as
   such but was useful for adding debug traces.

 * In HSAILHotSpotBackend, the logic to decide whether a kernel uses
   allocation or not was made more precise.  (This flag is also made
   available at execute time.)  Several atomic_add-related tests
   were falsely being marked as requiring HSAILAllocation and thus
   HSAILDeoptimization support.  This marking was removed.
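
The load_acquire / store_release pairing used for inter-workitem communication can be illustrated with a small Java analogue. This is a sketch under assumptions, not the snippet code from the webrev: TlabPublishSketch, TlabInfo, publish and awaitReplacement are hypothetical names, and java.util.concurrent's AtomicReference (Java 9 or later) stands in for the HSAIL instructions. The publisher fills in the new record completely and only then stores the reference with release semantics; waiters poll with acquire semantics, so once they see the new reference they also see its fields.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical Java analogue of publishing a new tlabInfo to waiting workitems. */
final class TlabPublishSketch {
    /** Stand-in for a few HSAILTlabInfo fields; names are illustrative only. */
    static final class TlabInfo {
        final long start;
        final long end;
        TlabInfo(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    private final AtomicReference<TlabInfo> current;

    TlabPublishSketch(TlabInfo initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** First overflower: the record is fully built before this release store. */
    void publish(TlabInfo fresh) {
        current.setRelease(fresh);
    }

    /** Other overflowers: spin with acquire loads until a replacement appears. */
    TlabInfo awaitReplacement(TlabInfo exhausted) {
        TlabInfo seen;
        while ((seen = current.getAcquire()) == exhausted) {
            Thread.onSpinWait();    // hint that this is a busy-wait loop
        }
        return seen;
    }
}
```

A waiter would call awaitReplacement with the tlabInfo it overflowed and then retry its allocation against the returned record.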

-- Tom Deneau