webrev for object allocation support in hsail

Deneau, Tom tom.deneau at amd.com
Fri Apr 11 21:04:26 UTC 2014


Doug --

Thanks for the feedback.  These sound like good improvements.

As for running the junits with -XX:-UseHSAILDeoptimization, I have not tried that, but I am not surprised that some junits would then fail.  And yes, some of the allocation tests that have larger ranges can hit deoptimization.  We will go through and define an appropriate supportsRequiredCapabilities() for the ones that do depend on deoptimization.

-- Tom



> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Friday, April 11, 2014 10:18 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net
> Subject: Re: webrev for object allocation support in hsail
> 
> Tom,
> 
> I've integrated this patch after making the following changes to it:
> 
> o Donor thread pool creation uses the standard mechanism in
>   java.lang.ThreadLocal for lazy initialization of the value.
> 
> o The Java code passes the donor threads to C++ as java.lang.Thread[].
>   The C++ code extracts a JavaThread* array from this and the HSAIL
>   kernel prologue accesses the latter. This removes the confusing code
>   in emitCode() to deal with array base offsets.
> 
> o Replaced all uses of Kind.Object with the proper word kind in
>   HSAILHotSpotBackend.emitCode(). I know GC is currently disabled
>   during HSAIL kernel execution but we should still avoid using
>   Kind.Object for anything that isn't really an Object. Are there
>   other parts of the HSAIL backend that need cleaning up in this
>   respect?
> 
> o Introduced use of symbolic aliases for registers in parts of
>   HSAILHotSpotBackend.emitCode() to make the code easier to follow.
>   I recommend doing this for other parts of the code.
> 
> o Added copyright headers to the new source files. If you run
>   'mx gate -n -j' or even just 'mx checkheaders' you can catch this
>   yourself in the future. Once Checkstyle supports JDK8, this will
>   once again be caught as you create new files.
> 
> o Added a guarantee in Hsail::execute_kernel_void_1d_internal that
>   numDonorThreads > 0 since code after that will segfault otherwise.
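
For illustration, the lazy initialization described in the first item above can be sketched with ThreadLocal.initialValue(), which runs only on a thread's first get(). This is a minimal sketch and not the integrated code: the class LazyDonorPoolSketch, its nested Pool class, the pool size of 4 and the poolsCreated counter are all hypothetical, and the donor threads are created but never started.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a per-thread donor pool that is created on first use. */
final class LazyDonorPoolSketch {
    /** Counts how many pools were built, only to make the laziness observable. */
    static final AtomicInteger poolsCreated = new AtomicInteger();

    /** Stand-in for a donor thread pool; the threads are never started here. */
    static final class Pool {
        final Thread[] donors;

        Pool(int numDonorThreads) {
            donors = new Thread[numDonorThreads];
            for (int i = 0; i < numDonorThreads; i++) {
                donors[i] = new Thread(() -> { }, "donor-" + i);
            }
            poolsCreated.incrementAndGet();
        }
    }

    /** initialValue() is invoked lazily, once per thread, on the first get(). */
    static final ThreadLocal<Pool> POOL = new ThreadLocal<Pool>() {
        @Override
        protected Pool initialValue() {
            return new Pool(4); // assumed pool size, purely illustrative
        }
    };
}
```

Repeated calls to LazyDonorPoolSketch.POOL.get() on the same thread return the same Pool, so the Thread[] it holds is the kind of value that could then be handed to native code.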
> 
> BTW, if -XX:-UseHSAILDeoptimization is specified, a number of the HSAIL
> tests fail. Is this intended? That is, does HSAIL allocation support
> require HSAIL deopt support?
> 
> The changes I'm pushing are reflected at
> http://cr.openjdk.java.net/~dnsimon/hsail-allocation-v2/.
> 
> -Doug
> 
> On Apr 8, 2014, at 10:59 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > Hi all --
> >
> > I have placed a webrev up at
> > http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-allocation
> > which we would like to get checked into the graal trunk.
> >
> > This consists of at least the start of support for object allocation
> > in HSAIL kernels and builds off the hsail deoptimization support.
> >
> > Below I have described
> >
> >   * an overview of the codepaths and data structures
> >   * java and hotspot source changes
> >
> > -- Tom
> >
> >
> > Hsail Allocation Overview, Data Structures and Graal Options
> > ============================================================
> >
> > The HSAIL allocation code uses TLABs but if we had a TLAB for each
> > workitem, the number of TLABs would be too large.  Thus multiple
> > workitems can allocate from a single TLAB.  To simplify TLAB
> > collection by regular GC procedures, the TLABs that the HSAIL kernels
> > use are still owned by regular Java threads called "donor threads".
> > The graal option -G:HsailDonorThreads controls how many such donor
> > threads (and TLABs) are created and passed to the gpu.
> >
> > Since multiple workitems can allocate from a single tlab, the hsail
> > instruction atomic_add is used to atomically get and add to the
> > tlab.top pointer.  If tlab.top overflows past tlab.end, the first
> > overflower (who is detectable because his oldTop is still less than
> > end) saves the oldTop as the "last good top".  This "last good top" is
> > then restored in the JVM code when the kernel dispatch finishes so the
> > tlab invariants are met.
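
The shared-TLAB scheme described above can be modeled in plain Java, with AtomicLong.getAndAdd standing in for the HSAIL atomic_add instruction. This is a minimal sketch under stated assumptions, not the snippet code from the webrev: SharedTlabSketch, allocate, restoreAfterDispatch and the FAILED sentinel are hypothetical names, and addresses are modeled as non-negative longs. The first-overflower test uses oldTop <= end so that a TLAB filled exactly to its end is also handled.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of one donor thread's TLAB shared by many workitems. */
final class SharedTlabSketch {
    static final long FAILED = -1L;          // caller would deoptimize or take a slower path
    private static final long NO_OVERFLOW = -1L;

    private final AtomicLong top;            // models tlab.top
    private final long end;                  // models tlab.end
    private volatile long lastGoodTop = NO_OVERFLOW;

    SharedTlabSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Fast path: atomically bump top; returns the object's start address or FAILED. */
    long allocate(long size) {
        long oldTop = top.getAndAdd(size);   // stands in for HSAIL atomic_add
        if (oldTop + size <= end) {
            return oldTop;                   // the allocation fits
        }
        if (oldTop <= end) {
            // Only the first overflower sees an oldTop that is still within
            // the TLAB, so exactly one workitem records the last good top.
            lastGoodTop = oldTop;
        }
        return FAILED;
    }

    /** Models the JVM-side fixup after the kernel dispatch finishes. */
    long restoreAfterDispatch() {
        if (lastGoodTop != NO_OVERFLOW) {
            top.set(lastGoodTop);            // re-establish top <= end
            lastGoodTop = NO_OVERFLOW;
        }
        return top.get();
    }

    public static void main(String[] args) {
        SharedTlabSketch tlab = new SharedTlabSketch(0, 100);
        System.out.println(tlab.allocate(40));
        System.out.println(tlab.allocate(40));
        System.out.println(tlab.allocate(40)); // overflows and records the last good top
        System.out.println(tlab.restoreAfterDispatch());
    }
}
```

Note that once top has overflowed, every later allocation in the same dispatch fails, even a small one that would have fit at the last good top. That is the price of a single fetch-and-add on the fast path.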
> >
> > This allocation logic is specified in HSAILNewObjectSnippets.java and
> > in HSAILHotSpotReplacementsUtil.java and is currently implemented for
> > NewInstanceNode and NewArrayNode.  The dynamic flavors are not
> > supported yet.
> >
> > Other than the special treatment of tlab.top mentioned above, the
> > other logic in the fastpath allocation path inherits from its
> > superclass NewObjectSnippets (formatting object, etc).
> >
> > If the fastpath allocation from the workitem's shared tlab fails (or
> > if UseTLAB is false), by default we deoptimize to the interpreter
> > using the hsail deoptimization logic.  There is an additional graal
> > option called HsailUseEdenAllocate which, if set to true, will first
> > attempt to allocate from Eden (this eden allocation uses the hsail
> > platform atomic instruction atomic_cas).  While eden allocation was
> > functionally correct, we saw a performance degradation using eden
> > allocation compared to simply deoptimizing and so have turned it off
> > by default.  We may explore eden_allocation further in the future.
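
For comparison with the atomic_add fast path, an eden-style allocation built on compare-and-swap might look like the sketch below. The text above only states that eden allocation uses atomic_cas; the retry loop is the usual CAS bump-pointer pattern and is an assumption here, as are the names EdenCasSketch, allocate, top and FAILED.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of bump-pointer allocation from a shared eden using CAS. */
final class EdenCasSketch {
    static final long FAILED = -1L;      // caller would fall back, e.g. by deoptimizing

    private final AtomicLong edenTop;    // models the shared eden top pointer
    private final long edenEnd;          // models the eden end address

    EdenCasSketch(long start, long end) {
        this.edenTop = new AtomicLong(start);
        this.edenEnd = end;
    }

    /** Retries the compare-and-swap until it wins or eden is exhausted. */
    long allocate(long size) {
        while (true) {
            long oldTop = edenTop.get();
            long newTop = oldTop + size;
            if (newTop > edenEnd) {
                return FAILED;           // top is left untouched on failure
            }
            if (edenTop.compareAndSet(oldTop, newTop)) {
                return oldTop;           // this caller owns [oldTop, newTop)
            }
            // Another allocator moved top first; reload and try again.
        }
    }

    long top() {
        return edenTop.get();
    }
}
```

Unlike the fetch-and-add path, a failed attempt never pushes the pointer past the end, so no "last good top" fixup is needed. The cost is a retry loop under contention, which may be one contributor to the slowdown reported above, though the text does not say so.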
> >
> > There is an additional graal hsail allocation option which can be used
> > for performance experiments.  HsailAllocBytesPerWorkitem specifies how
> > many bytes each workitem expects to allocate.  The JVM code before
> > invoking the kernel will look at the donor thread tlab free sizes and
> > attempt to "close" a tlab and try to allocate a new tlab if the
> > existing free space is not large enough.  Behavior will be
> > functionally correct regardless of this option; there just might
> > be more deopts.  We intend to explore other ways to reduce the
> > probability of deopts.
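
The pre-dispatch check described above could be sketched as follows. How the required free space per TLAB is derived is not stated in this message, so the formula here (workitems split evenly across donor TLABs, rounded up, times HsailAllocBytesPerWorkitem) is purely an assumption, and TlabPresizeSketch, requiredBytesPerTlab and needsNewTlab are hypothetical names.

```java
/** Hypothetical sketch of the free-space check done before a kernel dispatch. */
final class TlabPresizeSketch {
    private TlabPresizeSketch() { }

    /**
     * Assumed sizing rule: workitems are spread evenly over the donor TLABs
     * (rounding up), and each workitem allocates bytesPerWorkitem bytes.
     */
    static long requiredBytesPerTlab(long bytesPerWorkitem, int numWorkitems, int numDonorThreads) {
        if (numDonorThreads <= 0) {
            throw new IllegalArgumentException("numDonorThreads must be > 0");
        }
        long workitemsPerTlab = ((long) numWorkitems + numDonorThreads - 1) / numDonorThreads;
        return Math.multiplyExact(bytesPerWorkitem, workitemsPerTlab);
    }

    /** True if the TLAB should be "closed" and a fresh one requested. */
    static boolean needsNewTlab(long freeBytes, long requiredBytes) {
        return freeBytes < requiredBytes;
    }

    public static void main(String[] args) {
        long required = requiredBytesPerTlab(64, 1000, 4);
        System.out.println(required);
        System.out.println(needsNewTlab(8192, required));
    }
}
```

Getting this estimate wrong only changes how often the fast path fails, which matches the statement above that behavior stays functionally correct regardless of the option.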
> >
> >
> >
> > Description of source changes in this webrev.
> > =============================================
> > graal source changes
> > ====================
> >
> > HSAILAssembler
> >   * support for emitting hsail atomic_add instruction
> >
> > HSAILLIRGenerator
> >   * implement IntegerTestBranch (unrelated to allocation but happened
> >     to show up in some of the junit tests used)
> >
> > DonorThreadPool
> >   * new file for creation of the array of donor threads.
> >
> > HSAILHotSpotBackend
> >   * if kernel uses allocation, emit code to setup thread register
> >
> > HSAILHotSpotLoweringProvider
> >   * lower NewInstanceNode and NewArrayNode to relevant HSAIL snippets
> >   * lower AtomicGetAndAddNode
> >
> > HSAILHotSpotNodeLIRBuilder
> >   * AtomicGetAndAdd support
> >   * DirectCompareAndSwap support (used by edenAllocate)
> >
> > AtomicGetAndAddNode, LoweredAtomicGetAndAddNode, HSAILMove
> >   * for generating hsail atomic_add instructions (modeled
> >     after CompareAndSwapNode)
> >   * at some point in the future, we should be able to use this node
> >     for the j.u.c.atomic.getAndAdd, etc.
> >
> > HSAILNewObjectSnippets, HSAILHotSpotReplacementsUtil
> >   * new files for hsail snippets code
> >
> > HSAIL.java
> >   * threadRegister defined
> >
> >
> > hotspot source changes
> > ======================
> > gpu_hsail.cpp
> >   * logic for manipulating donor thread tlabs before and after
> >     dispatch
> >
> >
> >


