RFC: TLAB allocation and garbage-first policy

Christine Flood cflood at redhat.com
Wed Sep 20 12:09:44 UTC 2017


Changing our heuristics to compact mostly-empty regions, just to solve a
one-off sparse region caused by a race condition, doesn't make sense to me.
In the common case we would end up compacting live data out of a bunch of
perfectly usable regions for no reason.

Christine

On Wed, Sep 20, 2017 at 7:04 AM, Roman Kennke <roman at kennke.org> wrote:

> On 20.09.2017 at 12:10, Aleksey Shipilev wrote:
>
>> Hi,
>>
>> We have a problem on the edge of the TLAB allocation machinery and the
>> garbage-first collection policy.
>> The current TLAB machinery in HotSpot polls the GC about the next TLAB size
>> with CollectedHeap::unsafe_max_tlab_alloc. The trouble is that this call is
>> inherently racy, and is only expected to provide the best guess the GC can
>> make under the circumstances.
>>
>> The race unfolds like this:
>>   1. <Thread 1> GC looks around, sees a fully-empty region in the
>> allocation list, and happily reports $region_size to the VM.
>>   2. <Thread 2> Another allocation comes in and makes a
>> smaller-than-region-size allocation in that fully-empty region.
>>   3. <Thread 1> Resumes and tries to allocate a full TLAB, but the region
>> that was deemed empty at step 1 is already fragmented, so it retires the
>> region from the free set (it does so to optimize the allocation path), and
>> proceeds to allocate in the next region.
>>
>> End result: we have a fragmented region that gets immediately retired and
>> becomes unavailable for any further allocations. Granted, the actual issue
>> is the raciness in the TLAB machinery.
>> Unfortunately, that would be hard to fix without redoing the CollectedHeap
>> API.
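>>
>> A minimal standalone sketch of that interleaving (toy code, not the actual
>> HotSpot paths; all names below are made up):
>>
>>   #include <cstddef>
>>   #include <cstdio>
>>
>>   // Toy region: a bump pointer over a fixed capacity.
>>   struct Region {
>>     size_t capacity = 16 * 1024 * 1024;
>>     size_t top = 0;                                   // bytes handed out so far
>>     size_t free_bytes() const { return capacity - top; }
>>     bool allocate(size_t size) {
>>       if (size > free_bytes()) return false;
>>       top += size;
>>       return true;
>>     }
>>   };
>>
>>   int main() {
>>     Region r;
>>     size_t promised = r.free_bytes();                 // step 1: region empty, GC reports 16M
>>     r.allocate(128 * 1024);                           // step 2: small allocation sneaks in
>>     bool fits = r.allocate(promised);                 // step 3: full TLAB no longer fits
>>     printf("promised %zu bytes, TLAB %s, stranded %zu bytes\n",
>>            promised, fits ? "fits" : "does not fit -> region retired",
>>            r.free_bytes());
>>     return 0;
>>   }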
>>
>> Now to the fun part about our collection policy. Our collector policy
>> selects regions by how much garbage they contain, where garbage = used -
>> live. So, if you have that fragmented 16M region with used=128K and
>> live=128K, it has exactly 0K of garbage -- the least probable candidate. So
>> the region that became fragmented due to the race in the TLAB machinery is
>> also never considered for collection, because it is below the
>> ShenandoahGarbageThreshold!
>>
>> This race gets wider when we bias the TLAB and GCLAB allocations to
>> different sides of the heap, and GCLABs take the biggest hit. You can
>> clearly see the anomaly in the Visualizer after 10+ minutes of an
>> LRUFragger run with 50 GB LDS on a 100 GB heap (...and it drives into Full
>> GC shortly afterwards, because the free set gets depleted due to
>> fragmentation!):
>>    http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/baseline-1.png
>>
>> Therefore, I propose we choose the regions by *live size*, not by
>> *garbage*, so that we can recover by collecting (and evacuating) the
>> regions with little live data, not only those with a lot of garbage. This
>> should help us recover from TLAB losses better. For full regions, both
>> metrics yield the same result.
>> For half-full regions, we would have a chance to compact them into
>> mostly-full regions, leaving more fully-empty regions around.
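>>
>> To put numbers on the 16M/128K example above (standalone arithmetic; the
>> 60% threshold below is only an illustrative value, not necessarily the
>> ShenandoahGarbageThreshold default):
>>
>>   #include <cstddef>
>>   #include <cstdio>
>>
>>   int main() {
>>     const size_t K = 1024, M = 1024 * K;
>>     size_t region_size = 16 * M;
>>     size_t used = 128 * K, live = 128 * K;          // the fragmented region above
>>
>>     size_t garbage     = used - live;               // garbage-based metric: 0
>>     size_t reclaimable = region_size - live;        // live-based metric: ~15.9M
>>     size_t threshold   = region_size * 60 / 100;    // say the threshold is 60%
>>
>>     printf("garbage     = %zu -> candidate: %s\n",
>>            garbage, garbage > threshold ? "yes" : "no");
>>     printf("reclaimable = %zu -> candidate: %s\n",
>>            reclaimable, reclaimable > threshold ? "yes" : "no");
>>     return 0;
>>   }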
>>
>> I mused about this on IRC yesterday, and today I see that G1 does the
>> same; see CollectionSetChooser::should_add:
>>
>>    bool should_add(HeapRegion* hr) {
>>      assert(hr->is_marked(), "pre-condition");
>>      assert(!hr->is_young(), "should never consider young regions");
>>      return !hr->is_pinned() &&
>>             hr->live_bytes() < _region_live_threshold_bytes;  // <----- here
>>    }
>>
>> ...and probably with the same rationale? Found these bugs:
>>    https://bugs.openjdk.java.net/browse/JDK-7132029
>>    https://bugs.openjdk.java.net/browse/JDK-7146242
>>
>> Prototype fix:
>>
>>    virtual bool region_in_collection_set(ShenandoahHeapRegion* r, size_t immediate_garbage) {
>>      size_t threshold = ShenandoahHeapRegion::region_size_bytes() * ShenandoahGarbageThreshold / 100;
>> -    return r->garbage() > threshold;
>> +    if (UseNewCode) {
>> +      return (ShenandoahHeapRegion::region_size_bytes() - r->get_live_data_bytes()) > threshold;
>> +    } else {
>> +      return r->garbage() > threshold;
>> +    }
>>    }
>>
>> ...makes the issue disappear on the same workload running for 30+ minutes
>> (and no Full GCs!):
>>   http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/patched-1.png
>>
>> Thoughts?
>>
>> Thanks,
>> -Aleksey
>>
>
> Sounds reasonable.
>

