webrev for object allocation support in hsail

Christian Thalinger christian.thalinger at oracle.com
Tue Apr 8 21:32:20 UTC 2014


On Apr 8, 2014, at 1:59 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Hi all --
> 
> I have placed a webrev up at 
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-allocation 
> which we would like to get checked into the graal trunk.
> 
> This consists of at least the start of support for object allocation
> in HSAIL kernels and builds off the hsail deoptimization support.
> 
> Below I have described
> 
>   * an overview of the codepaths and data structures
>   * java and hotspot source changes
> 
> -- Tom
> 
> 
> Hsail Allocation Overview, Data Structures and Graal Options
> ============================================================
> 
> The HSAIL allocation code uses TLABs but if we had a TLAB for each
> workitem, the number of TLABs would be too large.  Thus multiple
> workitems can allocate from a single TLAB.  To simplify TLAB
> collection by regular GC procedures, the TLABs that the HSAIL kernels
> use are still owned by regular Java threads called "donor threads".
> The graal option -G:HsailDonorThreads controls how many such donor
> threads (and TLABs) are created and passed to the gpu.
> 
> Since multiple workitems can allocate from a single tlab, the hsail
> instruction atomic_add is used to atomically get and add to the
> tlab.top pointer.  If tlab.top overflows past tlab.end, the first
> overflower (who is detectable because his oldTop is still less than
> end) saves the oldTop as the "last good top".  This "last good top" is
> then restored in the JVM code when the kernel dispatch finishes so the
> tlab invariants are met.
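
The shared-TLAB scheme described above can be sketched in plain Java. This is only an illustrative model, not the snippet code from the webrev: the class SharedTlab, its fields, and the -1 sentinel are hypothetical, and AtomicLong.getAndAdd stands in for the hsail atomic_add instruction. The sketch treats oldTop == end as still detectable (a TLAB that was filled exactly), which is an assumption on top of the "less than end" wording.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of one TLAB shared by several workitems.
final class SharedTlab {
    static final long NONE = -1L;       // sentinel: no address / no overflow recorded

    private final long end;             // models tlab.end
    private final AtomicLong top;       // models tlab.top
    private final AtomicLong lastGoodTop = new AtomicLong(NONE);

    SharedTlab(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the address of the new object, or NONE if the fast path failed. */
    long allocate(long size) {
        long oldTop = top.getAndAdd(size);   // stands in for hsail atomic_add
        if (oldTop + size <= end) {
            return oldTop;                   // fast path: the object fits
        }
        if (oldTop <= end) {
            // First overflower: top was still valid before this add,
            // so remember it as the "last good top".
            lastGoodTop.set(oldTop);
        }
        return NONE;                         // caller takes the slow path
    }

    /** Models the JVM-side fixup after the kernel dispatch finishes. */
    void restoreAfterDispatch() {
        long good = lastGoodTop.getAndSet(NONE);
        if (good != NONE) {
            top.set(good);                   // re-establish top <= end
        }
    }

    long top() {
        return top.get();
    }
}
```

Only one workitem can observe a valid oldTop together with an overflowing new top, so lastGoodTop is written at most once per dispatch in this model.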
> 
> This allocation logic is specified in HSAILNewObjectSnippets.java and
> in HSAILHotSpotReplacementsUtil.java and is currently implemented for
> NewInstanceNode and NewArrayNode.  The dynamic flavors are not
> supported yet.
> 
> Other than the special treatment of tlab.top mentioned above, the
> rest of the fastpath allocation logic (formatting the object, etc.)
> is inherited from the superclass NewObjectSnippets.
> 
> If the fastpath allocation from the workitem's shared tlab fails (or
> if UseTLAB is false), by default we deoptimize to the interpreter
> using the hsail deoptimization logic.  There is an additional graal
> option called HsailUseEdenAllocate which, if set to true, will first
> attempt to allocate from Eden (this eden allocation uses the hsail
> platform atomic instruction atomic_cas).  While eden allocation was
> functionally correct, we saw a performance degradation using eden
> allocation compared to simply deoptimizing and so have turned it off
> by default.  We may explore eden allocation further in the future.
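
The fallback order described above (TLAB fast path, then optionally eden, then deoptimization) can be modeled roughly as follows. This is a sketch under stated assumptions: EdenModel, afterTlabFailure, and the DEOPT sentinel are hypothetical names, and AtomicLong.compareAndSet stands in for the hsail atomic_cas instruction. It does not reflect the actual snippet or HotSpot code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a shared eden region.
final class EdenModel {
    static final long DEOPT = -1L;      // sentinel: caller must deoptimize

    private final AtomicLong top;       // models the shared eden top pointer
    private final long end;             // models the eden end

    EdenModel(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** CAS retry loop: returns the object address, or DEOPT if eden is full. */
    long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > end) {
                return DEOPT;                        // top is left untouched
            }
            if (top.compareAndSet(oldTop, newTop)) { // stands in for atomic_cas
                return oldTop;
            }
            // Another allocator won the race; reload top and retry.
        }
    }

    /** What happens once the shared-TLAB fast path has already failed. */
    static long afterTlabFailure(EdenModel eden, long size, boolean useEdenAllocate) {
        return useEdenAllocate ? eden.allocate(size) : DEOPT;
    }

    long top() {
        return top.get();
    }
}
```

Unlike the atomic_add path, a failed CAS attempt never moves the pointer past the end, so no "last good top" fixup is needed here; the cost is the retry loop under contention.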
> 
> There is an additional graal hsail allocation option which can be used
> for performance experiments.  HsailAllocBytesPerWorkitem specifies how
> many bytes each workitem expects to allocate.  The JVM code before
> invoking the kernel will look at the donor thread tlab free sizes and
> attempt to "close" a tlab and try to allocate a new tlab if the
> existing free space is not large enough.  Behavior will be
> functionally correct regardless of this option; there just might be
> more deopts.  We intend to explore other ways to reduce the
> probability of deopts.
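
One way to picture the pre-dispatch check is the small sketch below. The text above does not say how "large enough" is computed, so the formula here is purely an assumption: workitems are taken to be spread evenly over the donor TLABs, and each TLAB needs room for its share of workitems times HsailAllocBytesPerWorkitem. The class and method names are hypothetical.

```java
// Hypothetical model of the check made before a kernel dispatch.
final class DonorTlabCheck {
    /**
     * Assumed estimate of the bytes one donor TLAB must hold when the
     * workitems are divided evenly (rounding up) among the donor threads.
     */
    static long bytesNeededPerTlab(int workitems, int donorThreads, long allocBytesPerWorkitem) {
        long workitemsPerTlab = ((long) workitems + donorThreads - 1) / donorThreads;
        return workitemsPerTlab * allocBytesPerWorkitem;
    }

    /** True if the TLAB should be "closed" and a fresh one requested. */
    static boolean shouldRefill(long tlabFreeBytes, long bytesNeeded) {
        return tlabFreeBytes < bytesNeeded;
    }
}
```

For example, under this assumption 1000 workitems over 4 donor threads at 64 bytes each would call for 16000 free bytes per TLAB. Getting the estimate wrong only changes how often the kernel deoptimizes, not correctness.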
> 
> 
> 
> Description of source changes in this webrev.
> =============================================
> 
> graal source changes
> ====================
> 
> HSAILAssembler
>   * support for emitting hsail atomic_add instruction
> 
> HSAILLIRGenerator
>   * implement IntegerTestBranch (unrelated to allocation but happened to
>     show up in some of the junit tests used)
> 
> DonorThreadPool
>   * new file for creating the array of donor threads
> 
> HSAILHotSpotBackend
>   * if the kernel uses allocation, emit code to set up the thread register
> 
> HSAILHotSpotLoweringProvider
>   * lower NewInstanceNode and NewArrayNode to relevant HSAIL snippets
>   * lower AtomicGetAndAddNode
> 
> HSAILHotSpotNodeLIRBuilder
>   * AtomicGetAndAdd support
>   * DirectCompareAndSwap support (used by edenAllocate)
> 
> AtomicGetAndAddNode, LoweredAtomicGetAndAddNode, HSAILMove
>   * for generating hsail atomic_add instructions (modeled
>     after CompareAndSwapNode)
>   * at some point in the future, we should be able to use this node
>     for the j.u.c.atomic.getAndAdd, etc.

I have a patch in my pipeline for this but a recent push broke it.  Will fix it and push this week.

> 
> HSAILNewObjectSnippets, HSAILHotSpotReplacementsUtil
>   * new files for hsail snippets code
> 
> HSAIL.java
>   * threadRegister defined
> 
> 
> hotspot source changes
> ======================
> gpu_hsail.cpp
>   * logic for manipulating donor thread tlabs before and after dispatch
> 
> 
> 