webrev for object allocation support in hsail
Christian Thalinger
christian.thalinger at oracle.com
Tue Apr 8 21:32:20 UTC 2014
On Apr 8, 2014, at 1:59 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> Hi all --
>
> I have placed a webrev up at
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-allocation
> which we would like to get checked into the graal trunk.
>
> This is at least the start of support for object allocation in HSAIL
> kernels, and it builds on the HSAIL deoptimization support.
>
> Below I have described
>
> * an overview of the codepaths and data structures
> * java and hotspot source changes
>
> -- Tom
>
>
> Hsail Allocation Overview, Data Structures and Graal Options
> ============================================================
>
> The HSAIL allocation code uses TLABs, but if we had a TLAB for each
> workitem, the number of TLABs would be too large. Thus multiple
> workitems can allocate from a single TLAB. To simplify TLAB
> collection by the regular GC procedures, the TLABs that the HSAIL
> kernels use are still owned by regular Java threads called "donor
> threads". The Graal option -G:HsailDonorThreads controls how many
> such donor threads (and TLABs) are created and passed to the GPU.
>
> Since multiple workitems can allocate from a single TLAB, the HSAIL
> instruction atomic_add is used to atomically read and advance the
> tlab.top pointer. If tlab.top overflows past tlab.end, the first
> workitem to overflow (detectable because its oldTop is still less
> than end) saves its oldTop as the "last good top". The JVM code
> restores this "last good top" into tlab.top when the kernel dispatch
> finishes, so the TLAB invariants are met.
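The shared-TLAB bump and the "last good top" fix-up can be illustrated with a small Java simulation. This is a minimal sketch, not the code in HSAILNewObjectSnippets or gpu_hsail.cpp: `SharedTlabSketch` and its methods are hypothetical, addresses are plain `long`s, `AtomicLong.getAndAdd` stands in for HSAIL atomic_add (both return the previous value), and Java threads stand in for workitems. The overflow test uses `oldTop <= end` rather than a strict less-than so that a TLAB filled exactly to its end is also covered; that choice is the sketch's, not necessarily the webrev's.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of one TLAB shared by many workitems.
// AtomicLong.getAndAdd models HSAIL atomic_add on tlab.top.
final class SharedTlabSketch {
    private final AtomicLong top;   // models tlab.top
    private final long end;         // models tlab.end
    private final AtomicLong lastGoodTop = new AtomicLong(-1); // -1 means no overflow seen

    SharedTlabSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the object's address, or -1 if the workitem would deoptimize. */
    long allocate(long size) {
        long oldTop = top.getAndAdd(size); // returns the value before the add
        long newTop = oldTop + size;
        if (newTop <= end) {
            return oldTop; // fast path: this workitem owns [oldTop, newTop)
        }
        if (oldTop <= end) {
            // Sizes are assumed positive, so top only grows: exactly one add
            // sees oldTop <= end < newTop, and that workitem records the last good top.
            lastGoodTop.set(oldTop);
        }
        return -1;
    }

    /** Models the JVM fix-up after the dispatch: put tlab.top back at or below end. */
    void restoreAfterDispatch() {
        long good = lastGoodTop.get();
        if (good >= 0) {
            top.set(good);
        }
    }

    long top() {
        return top.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedTlabSketch tlab = new SharedTlabSketch(0, 1000);
        Thread[] workitems = new Thread[64];
        for (int i = 0; i < workitems.length; i++) {
            workitems[i] = new Thread(() -> tlab.allocate(24));
            workitems[i].start();
        }
        for (Thread w : workitems) {
            w.join();
        }
        tlab.restoreAfterDispatch();
        System.out.println("top after restore = " + tlab.top());
    }
}
```

Because every simulated workitem requests the same 24 bytes, exactly 41 allocations fit below 1000 and the demo prints `top after restore = 984` regardless of thread scheduling; before the restore, tlab.top has run past the end to 1536.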
>
> This allocation logic is specified in HSAILNewObjectSnippets.java and
> in HSAILHotSpotReplacementsUtil.java and is currently implemented for
> NewInstanceNode and NewArrayNode. The dynamic flavors are not
> supported yet.
>
> Apart from the special treatment of tlab.top described above, the
> rest of the fast-path allocation logic (formatting the object, etc.)
> is inherited from the superclass NewObjectSnippets.
>
> If the fast-path allocation from the workitem's shared TLAB fails (or
> if UseTLAB is false), by default we deoptimize to the interpreter
> using the HSAIL deoptimization logic. An additional Graal option,
> HsailUseEdenAllocate, if set to true, makes the code first attempt to
> allocate from Eden (this Eden allocation uses the HSAIL platform
> atomic instruction atomic_cas). Although Eden allocation was
> functionally correct, it performed worse than simply deoptimizing,
> so we have turned it off by default. We may explore Eden allocation
> further in the future.
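For comparison, Eden allocation with a compare-and-swap never moves the shared top past the end, so it needs no "last good top" fix-up; the trade-off is that a workitem that loses a race must reload and retry, whereas atomic_add always completes in one step. The sketch below is hypothetical (`EdenCasSketch` is not a Graal or HotSpot class) and uses `AtomicLong.compareAndSet` as a stand-in for HSAIL atomic_cas.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of bump-pointer allocation from a shared Eden
// using a CAS retry loop (AtomicLong.compareAndSet models HSAIL atomic_cas).
final class EdenCasSketch {
    private final AtomicLong edenTop; // shared top of the simulated Eden
    private final long edenEnd;       // end of the simulated Eden

    EdenCasSketch(long start, long end) {
        this.edenTop = new AtomicLong(start);
        this.edenEnd = end;
    }

    /** Returns the object's address, or -1 if Eden is full (the workitem would deoptimize). */
    long allocate(long size) {
        while (true) {
            long oldTop = edenTop.get();
            long newTop = oldTop + size;
            if (newTop > edenEnd) {
                return -1; // top is never advanced past the end, so no fix-up is needed
            }
            if (edenTop.compareAndSet(oldTop, newTop)) {
                return oldTop; // this workitem owns [oldTop, newTop)
            }
            // Another workitem changed edenTop first: reload and retry.
        }
    }

    long top() {
        return edenTop.get();
    }

    public static void main(String[] args) throws InterruptedException {
        EdenCasSketch eden = new EdenCasSketch(0, 1000);
        Thread[] workitems = new Thread[64];
        for (int i = 0; i < workitems.length; i++) {
            workitems[i] = new Thread(() -> eden.allocate(24));
            workitems[i].start();
        }
        for (Thread w : workitems) {
            w.join();
        }
        System.out.println("eden top = " + eden.top());
    }
}
```

With 64 simulated workitems each requesting 24 bytes from a 1000-byte Eden, 41 allocations succeed and the program prints `eden top = 984`; unlike the atomic_add approach, the top never exceeds the end, even transiently.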
>
> One more Graal HSAIL allocation option can be used for performance
> experiments: HsailAllocBytesPerWorkitem specifies how many bytes each
> workitem expects to allocate. Before invoking the kernel, the JVM code
> looks at the free size of each donor thread's TLAB and, if the
> existing free space is not large enough, attempts to "close" that
> TLAB and allocate a new one. Behavior is functionally correct
> regardless of this option; there might just be more deopts. We intend
> to explore other ways to reduce the probability of deopts.
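The pre-dispatch check driven by HsailAllocBytesPerWorkitem might look roughly like the following. It is a hypothetical sketch, not the gpu_hsail.cpp logic: the names `DonorTlabPrepSketch`, `Tlab` and `prepare` are invented, and it assumes workitems are spread evenly across the donor TLABs, so that each TLAB needs about ceil(workitems / donors) × bytesPerWorkitem bytes of free space.

```java
import java.util.function.LongFunction;

// Hypothetical sketch of preparing donor-thread TLABs before a kernel dispatch.
final class DonorTlabPrepSketch {
    /** Minimal model of a TLAB: [top, end) is the free space. */
    static final class Tlab {
        final long top;
        final long end;

        Tlab(long top, long end) {
            this.top = top;
            this.end = end;
        }

        long free() {
            return end - top;
        }
    }

    /**
     * Replaces each donor TLAB whose free space is below the expected demand.
     * newTlab returns a fresh TLAB of the requested size, or null if none is available.
     */
    static void prepare(Tlab[] donorTlabs, long workitems, long bytesPerWorkitem,
                        LongFunction<Tlab> newTlab) {
        // Assumption: workitems are spread evenly over the donor TLABs (ceiling division).
        long workitemsPerTlab = (workitems + donorTlabs.length - 1) / donorTlabs.length;
        long needed = workitemsPerTlab * bytesPerWorkitem;
        for (int i = 0; i < donorTlabs.length; i++) {
            if (donorTlabs[i].free() < needed) {
                Tlab fresh = newTlab.apply(needed);
                if (fresh != null) {
                    // A real VM would first retire ("close") the old TLAB;
                    // this sketch simply drops the reference.
                    donorTlabs[i] = fresh;
                }
                // No fresh TLAB available: keep the old one (still correct, just more deopts).
            }
        }
    }

    public static void main(String[] args) {
        Tlab[] donors = { new Tlab(0, 100), new Tlab(0, 1000) };
        long[] nextBase = { 10_000 };
        prepare(donors, 10, 50, size -> {
            long base = nextBase[0];
            nextBase[0] += size;
            return new Tlab(base, base + size);
        });
        for (Tlab t : donors) {
            System.out.println("free = " + t.free());
        }
    }
}
```

With 10 workitems, 2 donor TLABs and 50 bytes per workitem, each TLAB should have 250 bytes free, so the first TLAB (100 bytes free) is replaced and the second (1000 bytes free) is kept; the program prints `free = 250` followed by `free = 1000`.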
>
>
>
> Description of source changes in this webrev.
> =============================================
> graal source changes
> ====================
>
> HSAILAssembler
> * support for emitting hsail atomic_add instruction
>
> HSAILLIRGenerator
> * implement IntegerTestBranch (unrelated to allocation, but it happened
> to show up in some of the JUnit tests used)
>
> DonorThreadPool
> * new file that creates the array of donor threads
>
> HSAILHotSpotBackend
> * if the kernel uses allocation, emit code to set up the thread register
>
> HSAILHotSpotLoweringProvider
> * lower NewInstanceNode and NewArrayNode to relevant HSAIL snippets
> * lower AtomicGetAndAddNode
>
> HSAILHotSpotNodeLIRBuilder
> * AtomicGetAndAdd support
> * DirectCompareAndSwap support (used by edenAllocate)
>
> AtomicGetAndAddNode, LoweredAtomicGetAndAddNode, HSAILMove
> * for generating hsail atomic_add instructions (modeled
> after CompareAndSwapNode)
> * at some point in the future, we should be able to use this node
> for the j.u.c.atomic.getAndAdd, etc.
I have a patch in my pipeline for this but a recent push broke it. Will fix it and push this week.
>
> HSAILNewObjectSnippets, HSAILHotSpotReplacementsUtil
> * new files for hsail snippets code
>
> HSAIL.java
> * threadRegister defined
>
>
> hotspot source changes
> ======================
> gpu_hsail.cpp
> * logic for manipulating donor thread tlabs before and after dispatch
>
>
>
More information about the graal-dev mailing list