question on TLABs

Tue Jan 21 14:39:15 PST 2014

On Jan 21, 2014, at 1:51 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> A hotspot-related question about java heap buffers...
> 
> A few months ago we had experimented with allowing the hsail kernel to do java heap allocations.
> At that time, we had one or more inactive "donor threads" which were normal Java threads
> who didn't do any allocations but their TLABs were used by the hsail kernel.  (A single
> TLAB can be shared among multiple workitems).
> 
> Since that time our hsail runtime routines have been more tightly integrated into the JVM
> (as opposed to going through JNI as they used to), so that gives us more flexibility in this area.
> 
> My question is, what is the preferred way to get some heap memory that the hsail kernel
> can allocate into?  Do we need to have a do-nothing java thread backing up each TLAB?
> Can the GC be made aware of TLABs that are not associated with threads?

It should be relatively easy to create a list of TLABs which aren’t associated with Java threads.  ThreadLocalAllocBuffer contains most of the machinery for managing them so you could create a C++ object containing one of those that can be used from the GPU.  Presumably you’d need more than one so you can just have a global list of those.  CollectedHeap::ensure_parsability would need to be updated to walk over them as is done for the threads before a GC so they behave like regular TLABs.  You’ll also need to refactor CollectedHeap::allocate_from_tlab_slow to build the refill machinery for them.  Presumably you have some mechanism for calling from the GPU to get them refilled already?
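
To make the shape of that concrete, here is a minimal stand-alone sketch of the idea, not HotSpot code.  DetachedTlab, ToyHeap, Word and kNoSpace are all hypothetical names invented for illustration; the real ThreadLocalAllocBuffer, CollectedHeap::ensure_parsability and CollectedHeap::allocate_from_tlab_slow have different interfaces.  The sketch assumes a toy heap made of tagged words, a global list of buffers that no thread owns, a refill slow path, and a pre-GC walk that fills the unused tail of every buffer so the heap can be parsed linearly.

```cpp
// Simplified stand-alone model. None of these types are HotSpot classes.
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Tag for each heap word so we can check whether the heap is parsable.
enum class Word : unsigned char { Free, Object, Filler };

// Returned by ToyHeap::allocate when the heap is exhausted (hypothetical).
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical bump-pointer buffer that is not tied to any Java thread.
struct DetachedTlab {
    std::size_t top = 0;  // next free word
    std::size_t end = 0;  // one past the last usable word
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t words) : words_(words, Word::Free) {}

    // Add a buffer to the global list; std::deque keeps references stable.
    DetachedTlab& register_tlab() {
        std::lock_guard<std::mutex> guard(lock_);
        tlabs_.emplace_back();
        return tlabs_.back();
    }

    // Fast path bumps `top`; on failure the slow path refills the buffer.
    // Returns the index of the first allocated word, or kNoSpace.
    std::size_t allocate(DetachedTlab& tlab, std::size_t words,
                         std::size_t chunk_words) {
        assert(words > 0 && words <= chunk_words);
        if (words > tlab.end - tlab.top && !refill(tlab, chunk_words)) {
            return kNoSpace;
        }
        const std::size_t result = tlab.top;
        tlab.top += words;
        for (std::size_t i = result; i < tlab.top; ++i) words_[i] = Word::Object;
        return result;
    }

    // Pre-GC walk over the detached buffers, analogous to what is done
    // for per-thread TLABs: fill each unused tail and retire the buffer.
    void ensure_parsability() {
        std::lock_guard<std::mutex> guard(lock_);
        for (DetachedTlab& tlab : tlabs_) retire(tlab);
    }

    // True when no handed-out word is still untagged. Intended to be
    // called only while no allocation is in progress.
    bool is_parsable() const {
        for (std::size_t i = 0; i < heap_top_; ++i) {
            if (words_[i] == Word::Free) return false;
        }
        return true;
    }

private:
    // Fill [top, end) with filler words and mark the buffer as full.
    void retire(DetachedTlab& tlab) {
        for (std::size_t i = tlab.top; i < tlab.end; ++i) words_[i] = Word::Filler;
        tlab.top = tlab.end;
    }

    // Slow path: retire the old chunk, then carve a fresh one off the heap.
    bool refill(DetachedTlab& tlab, std::size_t chunk_words) {
        std::lock_guard<std::mutex> guard(lock_);
        retire(tlab);
        if (chunk_words > words_.size() - heap_top_) return false;
        tlab.top = heap_top_;
        heap_top_ += chunk_words;
        tlab.end = heap_top_;
        return true;
    }

    std::vector<Word> words_;
    std::deque<DetachedTlab> tlabs_;  // global list of thread-less buffers
    std::size_t heap_top_ = 0;        // next word not yet handed to a buffer
    std::mutex lock_;
};
```

In this model a GPU-side caller would hold a DetachedTlab obtained from register_tlab() and call allocate() on it, while the collector calls ensure_parsability() before walking the heap.  The chunk size is a per-call parameter here only to keep the sketch short; it is also the natural place to plug in GPU-specific sizing.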

The real question is whether there’s something to be gained from doing this.  Apart from addressing the vague sense that it’s not very clean to use donor threads, I’m not sure there is, and disassociating TLABs from threads may make things inside the VM harder to follow.  There’s some logging and tracing machinery that knows about TLABs, and a new source of TLABs would have to be accounted for there.  Using a donor thread nicely segregates GPU allocation from the regular Java threads, which may make sense.  You might also want different TLAB sizing for GPU allocation, so reusing TLABs as they are may not be the best way to go.  Much of that could probably be handled with a refactoring of the TLAB code to generalize it a bit.

My inclination would be to hold off on changing how TLABs are managed until the needs of GPU Java allocation are understood a little better, but maybe things are already clear enough?

tom

> 
> -- Tom
>