RFC: TLAB size flapping

Tue Dec 6 17:26:27 UTC 2016

On Tuesday, 06.12.2016, at 18:17 +0100, Aleksey Shipilev wrote:
> Hi,
> 
> So, if you run allocation tests under -Xlog:gc+tlab, a funny story
> unfolds. The interesting piece of code is below. The TLAB allocation
> machinery polls it to figure out the maximum TLAB size that can be
> allocated without hassle:
> 
> size_t ShenandoahHeap::unsafe_max_tlab_alloc(Thread *thread) const {
>   size_t idx = _free_regions->current_index();
>   ShenandoahHeapRegion* current = _free_regions->get(idx);
>   if (current == NULL) {
>     return 0;
>   } else if (current->free() > MinTLABSize) {
>     return current->free();
>   } else {
>     return MinTLABSize;
>   }
> }
> 
> This is what happens next:
> 
> // Step 1: TLAB request for allocating, polling Shenandoah about the next
> // free region. Shenandoah replies there is a current free region with 256
> // words busy (hm!). Okay, we claim the rest of the region for a TLAB then.
> [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
> [2.328s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1019, capacity = 524288, used = 256, free = 524032
> [2.328s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
> [2.328s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bec00800
> 
> // Step 2: Another TLAB request. No more space in current region. But yeah,
> // we return MinTLABSize (those 256 words!), and shared infra moves on,
> // asking us to allocate a new TLAB of 256 words. Now, the current region is
> // depleted, so we allocate those 256 words in the *next* region.
> [2.328s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
> [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: (failing) region = 1019, capacity = 524288, used = 524288, free = 0
> [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 256
> [2.329s][trace][gc,tlab] allocating new tlab of size 256 at addr 0x00000006bf000000
> 
> // Step 1 again. The cycle continues. Another TLAB request, current region
> // has 256 words used, claim the rest... goes on and on.
> [2.329s][trace][gc,tlab] TLAB: fill thread: 0x00007ffb54594800 ...
> [2.329s][trace][gc,tlab] ShenandoahHeap::unsafe_max_tlab_alloc: region = 1020, capacity = 524288, used = 256, free = 524032
> [2.329s][trace][gc,tlab] ThreadLocalAllocBuffer::compute_size(3) returns 524032
> [2.329s][trace][gc,tlab] allocating new tlab of size 524032 at addr 0x00000006bf000800
> 
> So, this flaps TLAB allocations between the region size and
> MinTLABSize. Oops!

Oops indeed! :-)

> We enter the slow path *twice* per region, instead of doing it once. I
> think returning MinTLABSize is wrong in the code above, and we have two
> options:
>   a) Return 0 on the MinTLABSize branch. If I read the code right, this
>      will bail us out of the TLAB allocation path, which is undesirable;
>   b) Advance to the next free region, and try to poll its free().

Hmm, (a) seems undesirable. Do we really need to advance to the next
region? Can't we simply return the region size here? I mean, it is
inherently racy, and it doesn't matter whether we advance right now or a
little later when trying to allocate. Returning X here doesn't somehow
magically guarantee that we can later allocate X without skipping to the
next region, unless it's somehow done atomically, which we don't do.
(Shenandoah does lock-free allocations; maybe other GCs are better off
because they allocate under Heap_lock?)
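
To make the difference concrete, here is a rough single-threaded toy model
(not HotSpot code) that counts TLAB refills under the current policy and
under the "return region size" idea. ToyHeap, count_refills and the
constants are made up for illustration; the sizes are taken from the trace
above. The model ignores races and the other inputs that
ThreadLocalAllocBuffer::compute_size() takes into account.

```cpp
#include <cstddef>
#include <vector>

// Toy model, not HotSpot code. Sizes are in words, as in the trace above.
static const size_t kRegionWords  = 524288;  // region capacity from the trace
static const size_t kMinTlabWords = 256;     // stand-in for MinTLABSize

struct ToyHeap {
  std::vector<size_t> used;   // words used in each region
  size_t current;             // index of the current free region
  bool return_region_size;    // false: current policy, true: suggested policy

  ToyHeap(size_t regions, bool suggested)
      : used(regions, 0), current(0), return_region_size(suggested) {}

  size_t free_words(size_t idx) const { return kRegionWords - used[idx]; }

  // Mirrors the shape of the quoted unsafe_max_tlab_alloc().
  size_t max_tlab_alloc() const {
    if (current >= used.size()) return 0;
    size_t f = free_words(current);
    if (f > kMinTlabWords) return f;
    return return_region_size ? kRegionWords : kMinTlabWords;
  }

  // Allocates a TLAB, skipping to the next region if the current one
  // cannot fit the request. Returns false when the heap is exhausted.
  bool allocate_tlab(size_t words) {
    while (current < used.size() && free_words(current) < words) current++;
    if (current >= used.size()) return false;
    used[current] += words;
    return true;
  }
};

// Counts TLAB refills (slow-path entries) needed to exhaust the toy heap.
size_t count_refills(size_t regions, bool suggested) {
  ToyHeap heap(regions, suggested);
  size_t refills = 0;
  for (;;) {
    size_t size = heap.max_tlab_alloc();
    if (size == 0 || !heap.allocate_tlab(size)) break;
    refills++;
  }
  return refills;
}
```

With 4 regions, count_refills(4, false) gives 7 (one refill for the first
region, then a 256-word TLAB plus a 524032-word TLAB for each later
region), while count_refills(4, true) gives 4, i.e. one refill per region.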

Roman