webrev to extend hsail allocation to allow gpu to refill tlab
Deneau, Tom
tom.deneau at amd.com
Tue Jun 3 21:27:22 UTC 2014
I have placed a webrev up at
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-refill-tlab-gpu
which we would like to get checked into the graal trunk.
This webrev extends the existing hsail heap allocation logic. In the
existing logic, when a workitem cannot allocate from the current tlab,
it just deoptimizes. In this webrev, we add logic to allocate a new
tlab from the gpu.
The algorithm must deal with the fact that multiple hsa workitems can
share a single tlab, so multiple workitems can "overflow" at the same
time. A workitem can tell whether it is the "first overflower"; the
first overflower is charged with getting a new tlab while the other
workitems wait for the new tlab to be announced.
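The detection step can be sketched as follows. This is an illustrative CPU-side simulation, not the webrev's HSAIL snippet code: the names TlabInfoSim and try_allocate are hypothetical, and it assumes workitems bump a shared top pointer with an atomic add, so exactly one workitem (the one whose old top was still within bounds) observes that it pushed top past the end.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical simulation of shared-tlab overflow detection.
struct TlabInfoSim {
    std::atomic<uintptr_t> top;  // next free byte (bumped atomically)
    uintptr_t end;               // one past the last usable byte
};

// Returns 0 if the allocation fits, 1 if this workitem is the unique
// "first overflower", 2 if it is a later overflower.
int try_allocate(TlabInfoSim& t, uintptr_t size) {
    uintptr_t old_top = t.top.fetch_add(size);  // atomic bump
    uintptr_t new_top = old_top + size;
    if (new_top <= t.end) return 0;             // fast path: fits
    // Overflow: only the workitem whose old_top was still <= end was
    // the first to push top past the end of the tlab.
    return (old_top <= t.end) ? 1 : 2;
}
```

Only the workitem that gets result 1 goes on to allocate a replacement tlab; the others wait.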
Workitems access a tlab through a fixed register (similar to a thread
register) which, instead of pointing to a donor thread, now points to
a HSAILTlabInfo structure. This is a subset of a full tlab struct,
containing just the fields that we actually use on the gpu.
struct HSAILTlabInfo {
  HeapWord*   _start;         // normal vm tlab fields: start, top, end, etc.
  HeapWord*   _top;
  HeapWord*   _end;
  // additional data not in a normal tlab
  HeapWord*   _lastGoodTop;   // first overflower records this
  JavaThread* _donor_thread;  // donor thread associated with this tlabInfo
};
The first overflower grabs a new tlabInfo structure and allocates a
new tlab (using edenAllocate) and "publishes" the new tlabInfo for
other workitems to start using. See the routine called
allocateFromTlabSlowPath in HSAILNewObjectSnippets.
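The publish/wait handshake in that slow path might look roughly like the sketch below. This is a hedged CPU-side illustration, not the snippet itself: the names publish_new_tlab and wait_for_new_tlab are invented, and std::atomic release/acquire operations stand in for the hsail store_release and load_acquire instructions mentioned later in this post.

```cpp
#include <atomic>

// Hypothetical stand-in for the gpu-side tlabInfo record.
struct TlabInfo { int id; };

// The fixed register that workitems read the current tlabInfo from.
std::atomic<TlabInfo*> current_tlab_info{nullptr};

// First overflower: install the freshly allocated tlabInfo.
// store_release ensures the fields of *fresh are visible to other
// workitems before the pointer itself is.
void publish_new_tlab(TlabInfo* fresh) {
    current_tlab_info.store(fresh, std::memory_order_release);
}

// Other overflowers: spin with load_acquire until the published
// pointer differs from the tlabInfo they overflowed on.
TlabInfo* wait_for_new_tlab(TlabInfo* old_info) {
    TlabInfo* t;
    do {
        t = current_tlab_info.load(std::memory_order_acquire);
    } while (t == old_info);
    return t;
}
```

The release/acquire pairing is what makes it safe for waiting workitems to start bump-allocating from the new tlab as soon as they see the new pointer.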
Eventually when hsail function calls are supported, this slow path
will not be inlined but will be called as a stub.
Other changes:
* the allocation-related logic was moved out of gpu_hsail.cpp into
  gpu_hsail_tlab.hpp. HSAILHotSpotNmethod now keeps track of
  whether a kernel uses allocation and skips this logic if it does
  not.
* Before the kernel runs, the donor thread tlabs are used to set
up the initial tlabInfo records, and a tlab allocation is done
here if the donor thread tlab is empty.
* When the kernel finishes running, the cpu side sees a list
  of one or more HSAILTlabInfos and postprocesses them:
  fixing up any overflows, making the tlabs parsable, and
  copying them back to the appropriate donor thread as needed.
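The overflow fixup step can be sketched like this. Again a hypothetical illustration (HSAILTlabInfoSim and fix_overflow are invented names): if workitems pushed _top past _end, the _lastGoodTop recorded by the first overflower marks the last valid allocation boundary, so the cpu clamps _top back before handing the tlab to the donor thread.

```cpp
#include <cstdint>

// Hypothetical cpu-side view of the record the gpu filled in.
struct HSAILTlabInfoSim {
    uintptr_t start, top, end, last_good_top;
};

// Discard the failed bumps from overflowing workitems.
void fix_overflow(HSAILTlabInfoSim& t) {
    if (t.top > t.end) {
        t.top = t.last_good_top;  // last allocation that actually fit
    }
    // A real implementation would also fill [top, end) with a dummy
    // object here so the tlab remains parsable by the gc.
}
```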
* the inter-workitem communication required using the hsail
  load_acquire and store_release instructions from the
  snippets. The HSAILDirectLoadAcquireNode and
  HSAILDirectStoreReleaseNode, with associated NodeIntrinsics,
  were created to handle this. A node for the workitemabsid
  instruction was also created; it is not used in the algorithm
  itself but was useful for adding debug traces.
* In HSAILHotSpotBackend, the logic to decide whether a kernel uses
  allocation was made more precise. (This flag is also made
  available at execute time.) Several atomic_add-related tests
  were falsely being marked as requiring HSAILAllocation and thus
  HSAILDeoptimization support; this marking was removed.
-- Tom Deneau