webrev for object allocation support in hsail

Doug Simon doug.simon at oracle.com
Fri Apr 11 15:18:22 UTC 2014


I’ve integrated this patch after making the following changes to it:

o Donor thread pool creation uses the standard mechanism in java.lang.ThreadLocal for
  lazy initialization of the value.

o The Java code passes the donor threads to C++ as java.lang.Thread[]. The C++ code
  extracts a JavaThread* array from this and the HSAIL kernel prologue accesses the latter.
  This removes the confusing code in emitCode() to deal with array base offsets.

o Replaced all uses of Kind.Object with the proper word kind in HSAILHotSpotBackend.emitCode().
  I know GC is currently disabled during HSAIL kernel execution but we should still avoid
  using Kind.Object for anything that isn’t really an Object. Are there other parts of the
  HSAIL backend that need cleaning up in this respect?

o Introduced use of symbolic aliases for registers in parts of HSAILHotSpotBackend.emitCode()
  to make the code easier to follow. I recommend doing this for other parts of the code.

o Added copyright headers to the new source files. If you run ‘mx gate -n -j’ or even just
  ‘mx checkheaders’, you can catch this yourself in the future. Once Checkstyle supports
  JDK8, this will once again be caught as you create new files.

o Added a guarantee in Hsail::execute_kernel_void_1d_internal that numDonorThreads > 0 since
  code after that will segfault otherwise.
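The first item above relies on the java.lang.ThreadLocal idiom for lazy initialization. As a minimal sketch of that idiom, assuming a hypothetical DonorThreadPool class and an arbitrary pool size of 4 (neither is the integrated code), ThreadLocal.withInitial defers construction until the first get() on each thread and then caches the value:

```java
// Sketch only: DonorThreadPool is a hypothetical stand-in for the patch's class.
public class DonorThreadPoolSketch {

    static final class DonorThreadPool {
        final Thread[] threads;

        DonorThreadPool(int count) {
            threads = new Thread[count];
            for (int i = 0; i < count; i++) {
                // Threads are only created here, not started; a real pool would
                // keep them alive so that they own TLABs.
                threads[i] = new Thread(() -> { }, "donor-" + i);
            }
        }
    }

    static int creations = 0;

    // The supplier runs lazily, on the first get() from each thread.
    static final ThreadLocal<DonorThreadPool> POOL = ThreadLocal.withInitial(() -> {
        creations++;
        return new DonorThreadPool(4);
    });

    public static void main(String[] args) {
        DonorThreadPool first = POOL.get();
        DonorThreadPool second = POOL.get();
        System.out.println(first == second);      // prints "true"
        System.out.println(first.threads.length); // prints "4"
        System.out.println(creations);            // prints "1"
    }
}
```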

BTW, if -XX:-UseHSAILDeoptimization is specified, a number of the HSAIL tests fail. Is this
intended? That is, does HSAIL allocation support require HSAIL deopt support?

The changes I’m pushing are reflected at http://cr.openjdk.java.net/~dnsimon/hsail-allocation-v2/.


On Apr 8, 2014, at 10:59 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Hi all --
> I have placed a webrev at
> http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-allocation
> which we would like to get checked into the Graal trunk.
> This is at least the start of support for object allocation
> in HSAIL kernels, and it builds on the HSAIL deoptimization support.
> Below I have described
>   * an overview of the codepaths and data structures
>   * Java and HotSpot source changes
> -- Tom
>
> Hsail Allocation Overview, Data Structures and Graal Options
> ============================================================
> The HSAIL allocation code uses TLABs, but if we had a TLAB for each
> workitem, the number of TLABs would be too large.  Thus multiple
> workitems can allocate from a single TLAB.  To simplify TLAB
> collection by regular GC procedures, the TLABs that the HSAIL kernels
> use are still owned by regular Java threads called "donor threads".
> The Graal option -G:HsailDonorThreads controls how many such donor
> threads (and TLABs) are created and passed to the GPU.
>
> Since multiple workitems can allocate from a single TLAB, the HSAIL
> instruction atomic_add is used to atomically fetch and increment the
> tlab.top pointer.  If tlab.top overflows past tlab.end, the first
> overflower (detectable because its oldTop is still less than
> tlab.end) saves its oldTop as the "last good top".  The JVM code then
> restores this "last good top" when the kernel dispatch finishes, so
> the TLAB invariants are met.
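A minimal sketch of the shared-TLAB scheme just described, modeling addresses as plain longs and HSAIL atomic_add as AtomicLong.getAndAdd. SharedTlab, allocate and restoreAfterDispatch are hypothetical names; the real logic lives in the HSAIL snippets and gpu_hsail.cpp. The sketch also treats an oldTop exactly equal to end as the first overflower, so a TLAB filled exactly to its end still records a last good top.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a simplified model of one TLAB shared by many workitems.
public class SharedTlabSketch {

    static final class SharedTlab {
        final AtomicLong top;                               // models tlab.top
        final long end;                                     // models tlab.end
        final AtomicLong lastGoodTop = new AtomicLong(-1);  // -1: no overflow seen

        SharedTlab(long start, long end) {
            this.top = new AtomicLong(start);
            this.end = end;
        }

        /** Returns the object address, or -1 if the workitem must take the slow path. */
        long allocate(long size) {
            long oldTop = top.getAndAdd(size); // stands in for HSAIL atomic_add
            long newTop = oldTop + size;
            if (newTop <= end) {
                return oldTop;                 // [oldTop, newTop) belongs to this workitem
            }
            if (oldTop <= end) {
                // Exactly one caller straddles end; every later caller sees oldTop > end.
                lastGoodTop.set(oldTop);
            }
            return -1;
        }

        /** Run after the dispatch completes, restoring top <= end. */
        void restoreAfterDispatch() {
            long good = lastGoodTop.get();
            if (good >= 0) {
                top.set(good);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedTlab tlab = new SharedTlab(0, 1000);
        AtomicLong successes = new AtomicLong();
        Thread[] workitems = new Thread[8];
        for (int t = 0; t < workitems.length; t++) {
            workitems[t] = new Thread(() -> {
                for (int i = 0; i < 20; i++) {
                    if (tlab.allocate(16) >= 0) {
                        successes.incrementAndGet();
                    }
                }
            });
            workitems[t].start();
        }
        for (Thread w : workitems) {
            w.join();
        }
        tlab.restoreAfterDispatch();
        System.out.println(successes.get() + " allocations, top restored to " + tlab.top.get());
    }
}
```

Whatever the thread interleaving, 62 of the 160 attempted 16-byte allocations fit in [0, 1000), and top is restored to 992.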
>
> This allocation logic is specified in HSAILNewObjectSnippets.java and
> HSAILHotSpotReplacementsUtil.java and is currently implemented for
> NewInstanceNode and NewArrayNode.  The dynamic flavors are not
> supported yet.
>
> Apart from the special treatment of tlab.top mentioned above, the
> rest of the fastpath allocation logic (formatting the object, etc.)
> is inherited from the superclass NewObjectSnippets.
>
> If the fastpath allocation from the workitem's shared TLAB fails (or
> if UseTLAB is false), by default we deoptimize to the interpreter
> using the HSAIL deoptimization logic.  An additional Graal option,
> HsailUseEdenAllocate, if set to true, makes the code first attempt to
> allocate from Eden (this Eden allocation uses the HSAIL atomic_cas
> instruction).  While Eden allocation was functionally correct, we saw
> a performance degradation with it compared to simply deoptimizing, so
> it is turned off by default.  We may explore Eden allocation further
> in the future.
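The fallback order described above (shared TLAB first, then optionally Eden via compare-and-swap, otherwise deoptimize) can be sketched as follows. This is a simplified model, not the snippet code: AtomicLong.compareAndSet stands in for HSAIL atomic_cas, the boolean parameter stands in for the HsailUseEdenAllocate option, Heap and Outcome are hypothetical, and the last-good-top bookkeeping is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: names are hypothetical; addresses are modeled as longs.
public class AllocationFallbackSketch {

    enum Outcome { TLAB, EDEN, DEOPT }

    static final class Heap {
        final AtomicLong tlabTop;
        final long tlabEnd;
        final AtomicLong edenTop;
        final long edenEnd;

        Heap(long tlabStart, long tlabEnd, long edenStart, long edenEnd) {
            this.tlabTop = new AtomicLong(tlabStart);
            this.tlabEnd = tlabEnd;
            this.edenTop = new AtomicLong(edenStart);
            this.edenEnd = edenEnd;
        }
    }

    static Outcome allocate(Heap heap, long size, boolean useEdenAllocate) {
        // Fast path: bump the shared TLAB top.
        long oldTop = heap.tlabTop.getAndAdd(size);
        if (oldTop + size <= heap.tlabEnd) {
            return Outcome.TLAB;
        }
        if (useEdenAllocate) {
            // CAS loop on the Eden top, retried when another workitem wins the race.
            while (true) {
                long edenTop = heap.edenTop.get();
                long newEdenTop = edenTop + size;
                if (newEdenTop > heap.edenEnd) {
                    break; // Eden is exhausted as well
                }
                if (heap.edenTop.compareAndSet(edenTop, newEdenTop)) {
                    return Outcome.EDEN;
                }
            }
        }
        // Default slow path: deoptimize to the interpreter.
        return Outcome.DEOPT;
    }

    public static void main(String[] args) {
        Heap heap = new Heap(0, 32, 1000, 1040);
        System.out.println(allocate(heap, 32, true)); // prints TLAB
        System.out.println(allocate(heap, 32, true)); // prints EDEN
        System.out.println(allocate(heap, 32, true)); // prints DEOPT
    }
}
```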
>
> Another Graal HSAIL allocation option, HsailAllocBytesPerWorkitem,
> can be used for performance experiments.  It specifies how many bytes
> each workitem expects to allocate.  Before invoking the kernel, the
> JVM code looks at the free sizes of the donor thread TLABs and, if the
> existing free space is not large enough, attempts to "close" that TLAB
> and allocate a new one.  Behavior is functionally correct regardless
> of this option; there just might be more deopts.  We intend to
> explore other ways to reduce the probability of deopts.
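A rough sketch of the pre-dispatch check driven by HsailAllocBytesPerWorkitem. It assumes workitems are spread evenly over the donor TLABs and models "closing" a TLAB as simply replacing it; a real JVM would also retire the old TLAB (e.g. fill its remainder) so the heap stays parsable. Tlab, Eden and prepareDonorTlabs are hypothetical names.

```java
// Sketch only: Tlab, Eden and prepareDonorTlabs are hypothetical; addresses are longs.
public class DonorTlabSizingSketch {

    static final class Tlab {
        final long top;
        final long end;

        Tlab(long top, long end) {
            this.top = top;
            this.end = end;
        }

        long free() {
            return end - top;
        }
    }

    // Simplified Eden from which fresh TLABs are carved.
    static final class Eden {
        long top;
        final long end;

        Eden(long top, long end) {
            this.top = top;
            this.end = end;
        }

        Tlab newTlab(long size) {
            if (end - top < size) {
                return null; // not enough room for a fresh TLAB
            }
            Tlab tlab = new Tlab(top, top + size);
            top += size;
            return tlab;
        }
    }

    /** Replaces donor TLABs that look too small; returns how many were replaced. */
    static int prepareDonorTlabs(Tlab[] donors, Eden eden, int workitems, long bytesPerWorkitem) {
        // Assumption: workitems are spread evenly over the donors (rounded up).
        int workitemsPerDonor = (workitems + donors.length - 1) / donors.length;
        long needed = workitemsPerDonor * bytesPerWorkitem;
        int replaced = 0;
        for (int i = 0; i < donors.length; i++) {
            if (donors[i].free() >= needed) {
                continue;
            }
            Tlab fresh = eden.newTlab(needed);
            if (fresh != null) {
                donors[i] = fresh; // "close" the old TLAB and hand out the new one
                replaced++;
            }
            // If no new TLAB is available, keep the old one: still correct, just more deopts.
        }
        return replaced;
    }

    public static void main(String[] args) {
        Tlab[] donors = { new Tlab(0, 100), new Tlab(200, 1000) };
        Eden eden = new Eden(10_000, 10_500);
        int replaced = prepareDonorTlabs(donors, eden, 10, 64);
        System.out.println(replaced + " donor TLAB(s) replaced"); // prints "1 donor TLAB(s) replaced"
    }
}
```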
>
> Description of source changes in this webrev
> ============================================
> graal source changes
> ====================
> HSAILAssembler
>   * support for emitting hsail atomic_add instruction
> HSAILLIRGenerator
>   * implement IntegerTestBranch (unrelated to allocation but happened to
>     show up in some of the junit tests used)
> DonorThreadPool
>   * new file for creating the array of donor threads.
> HSAILHotSpotBackend
>   * if kernel uses allocation, emit code to set up the thread register
> HSAILHotSpotLoweringProvider
>   * lower NewInstanceNode and NewArrayNode to relevant HSAIL snippets
>   * lower AtomicGetAndAddNode
> HSAILHotSpotNodeLIRBuilder
>   * AtomicGetAndAdd support
>   * DirectCompareAndSwap support (used by edenAllocate)
> AtomicGetAndAddNode, LoweredAtomicGetAndAddNode, HSAILMove
>   * for generating hsail atomic_add instructions (modeled
>     after CompareAndSwapNode)
>   * at some point in the future, we should be able to use this node
>     for the j.u.c.atomic.getAndAdd, etc.
> HSAILNewObjectSnippets, HSAILHotSpotReplacementsUtil
>   * new files for hsail snippets code
> HSAIL.java
>   * threadRegister defined
> hotspot source changes
> ======================
> gpu_hsail.cpp
>   * logic for manipulating donor thread tlabs before and after dispatch
