webrev for object allocation support in hsail
Deneau, Tom
tom.deneau at amd.com
Tue Apr 8 20:59:11 UTC 2014
Hi all --
I have placed a webrev up at
http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-allocation
which we would like to get checked into the graal trunk.
This consists of at least the start of support for object allocation
in HSAIL kernels and builds off the hsail deoptimization support.
Below I have described
* an overview of the codepaths and data structures
* java and hotspot source changes
-- Tom
Hsail Allocation Overview, Data Structures and Graal Options
============================================================
The HSAIL allocation code uses TLABs but if we had a TLAB for each
workitem, the number of TLABs would be too large. Thus multiple
workitems can allocate from a single TLAB. To simplify TLAB
collection by regular GC procedures, the TLABs that the HSAIL kernels
use are still owned by regular Java threads called "donor threads".
The graal option -G:HsailDonorThreads controls how many such donor
threads (and TLABs) are created and passed to the GPU.
Since multiple workitems can allocate from a single TLAB, the HSAIL
instruction atomic_add is used to atomically get and add to the
tlab.top pointer. If tlab.top overflows past tlab.end, the first
overflower (which is detectable because its oldTop is still less than
end) saves the oldTop as the "last good top". This "last good top" is
then restored in the JVM code when the kernel dispatch finishes so
that the TLAB invariants are met.
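
To make the overflow handling concrete, here is a minimal sketch of
the idea in plain Java. It is not the snippet code from the webrev:
the class SharedTlabSketch, the FAILED marker and the method names are
hypothetical, and AtomicLong.getAndAdd merely stands in for the HSAIL
atomic_add instruction. The sketch uses oldTop <= end to detect the
first overflower so that an exactly full TLAB is also covered, which
is an assumption on top of the description above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative model of several workitems bump-allocating from one shared TLAB. */
final class SharedTlabSketch {
    static final long FAILED = -1L;              // hypothetical "take the slow path" marker

    private final AtomicLong top;                // models tlab.top
    private final long end;                      // models tlab.end
    private volatile long lastGoodTop = FAILED;  // set by the first overflower only

    SharedTlabSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the start address of the new object, or FAILED on overflow. */
    long allocate(long size) {
        long oldTop = top.getAndAdd(size);       // plays the role of atomic_add
        long newTop = oldTop + size;
        if (newTop <= end) {
            return oldTop;                       // fast path: [oldTop, newTop) is ours
        }
        if (oldTop <= end) {
            // Only the first overflower can see an oldTop that is still inside
            // the TLAB; every later caller observes a top already past end.
            lastGoodTop = oldTop;
        }
        return FAILED;
    }

    /** Models the JVM-side fixup after the kernel dispatch finishes. */
    void restoreAfterDispatch() {
        if (lastGoodTop != FAILED) {
            top.set(lastGoodTop);                // top <= end holds again
        }
    }

    long top() {
        return top.get();
    }

    public static void main(String[] args) {
        SharedTlabSketch tlab = new SharedTlabSketch(0, 100);
        System.out.println(tlab.allocate(40));
        System.out.println(tlab.allocate(40));
        System.out.println(tlab.allocate(40));   // overflows the 100-byte TLAB
        tlab.restoreAfterDispatch();
        System.out.println(tlab.top());
    }
}
```

Note that a failed allocation still bumps top, which is why the fixup
after dispatch is needed at all.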
This allocation logic is specified in HSAILNewObjectSnippets.java and
in HSAILHotSpotReplacementsUtil.java and is currently implemented for
NewInstanceNode and NewArrayNode. The dynamic flavors are not
supported yet.
Other than the special treatment of tlab.top mentioned above, the
rest of the fastpath allocation logic is inherited from the
superclass NewObjectSnippets (formatting the object, etc).
If the fastpath allocation from the workitem's shared tlab fails (or
if UseTLAB is false), by default we deoptimize to the interpreter
using the hsail deoptimization logic. There is an additional graal
option called HsailUseEdenAllocate which, if set to true, will first
attempt to allocate from Eden (this eden allocation uses the hsail
platform atomic instruction atomic_cas). While eden allocation was
functionally correct, we saw a performance degradation using eden
allocation compared to simply deoptimizing and so have turned it off
by default. We may explore eden allocation further in the future.
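
For comparison with the TLAB path, the eden path can be pictured as
the usual compare-and-swap retry loop. The sketch below is only an
assumed shape, not the code in the webrev: EdenCasSketch, FAILED and
the field names are hypothetical, and AtomicLong.compareAndSet stands
in for the HSAIL atomic_cas instruction.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative model of CAS-based bump allocation from a shared eden. */
final class EdenCasSketch {
    static final long FAILED = -1L;   // hypothetical "deoptimize instead" marker

    private final AtomicLong top;     // models the shared eden top pointer
    private final long end;           // models the eden end

    EdenCasSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the start address of the new object, or FAILED if eden is full. */
    long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > end) {
                return FAILED;        // no room: top is left untouched
            }
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;        // we won the race for [oldTop, newTop)
            }
            // Another allocator moved top first; reread it and retry.
        }
    }

    long top() {
        return top.get();
    }

    public static void main(String[] args) {
        EdenCasSketch eden = new EdenCasSketch(1000, 1064);
        System.out.println(eden.allocate(32));
        System.out.println(eden.allocate(32));
        System.out.println(eden.allocate(8));   // eden is exhausted here
    }
}
```

Unlike the atomic_add scheme, a failed allocation here never moves top
past end, at the cost of a retry loop when many workitems contend for
the same pointer.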
There is an additional graal hsail allocation option which can be used
for performance experiments. HsailAllocBytesPerWorkitem specifies how
many bytes each workitem expects to allocate. The JVM code before
invoking the kernel will look at the donor thread tlab free sizes and
attempt to "close" a tlab and try to allocate a new tlab if the
existing free space is not large enough. Behavior will be
functionally correct regardless of this option; there just might be
more deopts. We intend to explore other ways to reduce the
probability of deopts.
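
As a rough illustration of the check done before dispatch, the sketch
below estimates how much free space a donor TLAB would need. The
formula is an assumption made for this example (workitems spread
evenly over the donor TLABs); the description above does not spell
out how "large enough" is computed, and TlabPreDispatchSketch and its
methods are hypothetical names.

```java
/** Illustrative model of the pre-dispatch TLAB size check. */
final class TlabPreDispatchSketch {

    /**
     * Bytes a single donor TLAB should have free, ASSUMING the workitems
     * are spread evenly over the donor TLABs (not stated in the source).
     */
    static long requiredFreeBytes(long bytesPerWorkitem, int workitems, int donorThreads) {
        // Ceiling division: the fullest TLAB serves this many workitems.
        long workitemsPerTlab = ((long) workitems + donorThreads - 1) / donorThreads;
        return bytesPerWorkitem * workitemsPerTlab;
    }

    /** True if the TLAB should be closed and a fresh one requested. */
    static boolean shouldReplaceTlab(long freeBytes, long requiredBytes) {
        return freeBytes < requiredBytes;
    }

    public static void main(String[] args) {
        long required = requiredFreeBytes(64, 1000, 4);
        System.out.println(required);
        System.out.println(shouldReplaceTlab(8192, required));
    }
}
```

If the estimate is too low, the kernel still runs correctly; the
workitems that find no room simply deoptimize.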
Description of source changes in this webrev.
=============================================
graal source changes
====================
HSAILAssembler
* support for emitting hsail atomic_add instruction
HSAILLIRGenerator
* implement IntegerTestBranch (unrelated to allocation but happened to
show up in some of the junit tests used)
DonorThreadPool
* new file for creation of the array of donor threads.
HSAILHotSpotBackend
* if kernel uses allocation, emit code to set up thread register
HSAILHotSpotLoweringProvider
* lower NewInstanceNode and NewArrayNode to relevant HSAIL snippets
* lower AtomicGetAndAddNode
HSAILHotSpotNodeLIRBuilder
* AtomicGetAndAdd support
* DirectCompareAndSwap support (used by edenAllocate)
AtomicGetAndAddNode, LoweredAtomicGetAndAddNode, HSAILMove
* for generating hsail atomic_add instructions (modeled
after CompareAndSwapNode)
* at some point in the future, we should be able to use this node
for the j.u.c.atomic.getAndAdd, etc.
HSAILNewObjectSnippets, HSAILHotSpotReplacementsUtil
* new files for hsail snippets code
HSAIL.java
* threadRegister defined
hotspot source changes
======================
gpu_hsail.cpp
* logic for manipulating donor thread tlabs before and after dispatch