RFC: TLAB allocation and garbage-first policy

Roman Kennke roman at kennke.org
Wed Sep 20 11:04:12 UTC 2017


On 20.09.2017 12:10, Aleksey Shipilev wrote:
> Hi,
>
> We have a problem at the boundary between the TLAB allocation machinery and the garbage-first
> collection policy. The current TLAB machinery in HotSpot polls the GC for the next TLAB size with
> CollectedHeap::unsafe_max_tlab_alloc. The trouble is that this call is inherently racy: it is only
> expected to provide the best guess the GC can make under the circumstances.
>
> The race unfolds like this:
>   1. <Thread 1> The GC looks around, sees a fully-empty region in the allocation list, and happily
> reports $region_size to the VM.
>   2. <Thread 2> Another allocation comes in and does a smaller-than-region-size allocation in that
> fully-empty region.
>   3. <Thread 1> Resumes and tries to allocate a full TLAB, but the region that was deemed empty at
> step 1 is already fragmented, so it retires the region from the free set (it does so to optimize
> the allocation path) and proceeds to allocate in the next one.
>
> End result: we have a fragmented region that gets immediately retired and becomes unavailable for
> any further allocations. Granted, the actual issue is the raciness in the TLAB machinery itself;
> unfortunately, that would be hard to fix without redoing the CollectedHeap API.
>
> Now to the fun part about our collection policy. Our collection policy selects regions by garbage,
> where garbage = used - live, preferring the regions with the most garbage. So, if you have a
> fragmented 16M region with used=128K and live=128K, it has exactly 0K of garbage -- the least likely
> candidate. Thus the region that became fragmented due to the race in the TLAB machinery is also
> never considered for collection, because it sits below ShenandoahGarbageThreshold!
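>
> In code, the selection check looks roughly like this (a sketch built from the definitions above,
> not a verbatim quote of the heuristics):
>
>    // garbage = used - live, so a region that is fragmented but fully live
>    // scores zero garbage and can never exceed the threshold:
>    size_t garbage   = r->used() - r->get_live_data_bytes();  // 128K - 128K = 0K
>    size_t threshold = ShenandoahHeapRegion::region_size_bytes()
>                       * ShenandoahGarbageThreshold / 100;    // some fraction of 16M
>    bool candidate   = garbage > threshold;                   // always false here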
>
> This race widens further when we bias the TLAB and GCLAB allocations to different sides of the
> heap, and GCLABs take the biggest hit. You can clearly see the anomaly in the Visualizer after 10+
> minutes of an LRUFragger run with 50 GB LDS on a 100 GB heap (...and it drives into a Full GC
> shortly afterwards, because the free set gets depleted by fragmentation!):
>    http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/baseline-1.png
>
> Therefore, I propose we choose regions by *live size*, not by *garbage*, so that we recover by
> collecting (and evacuating) the regions with low live data, not necessarily those with high
> garbage. This should help us recuperate from TLAB losses better. For fully-used regions, both
> metrics yield the same result; for half-full regions, we would get a chance to compact them into
> mostly-full ones, leaving more fully-empty regions around.
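>
> To spell out the difference on the example above (assuming 16M regions):
>
>    // Fragmented region from the race: used = 128K, live = 128K
>    //   garbage            = used - live = 0K      -> never selected
>    //   region_size - live = 16M - 128K  = ~15.9M  -> prime candidate
>    //
>    // Fully-used region: used = 16M, live = L
>    //   garbage            = 16M - L
>    //   region_size - live = 16M - L               -> identical, as claimed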
>
> I mused about this on IRC yesterday, and today I see that G1 does the same; see
> CollectionSetChooser::should_add:
>
>    bool should_add(HeapRegion* hr) {
>      assert(hr->is_marked(), "pre-condition");
>      assert(!hr->is_young(), "should never consider young regions");
>      return !hr->is_pinned() &&
>              hr->live_bytes() < _region_live_threshold_bytes;  // <----- here
>    }
>
> ...and probably with the same rationale? Found these bugs:
>    https://bugs.openjdk.java.net/browse/JDK-7132029
>    https://bugs.openjdk.java.net/browse/JDK-7146242
>
> Prototype fix:
>
>    virtual bool region_in_collection_set(ShenandoahHeapRegion* r, size_t immediate_garbage) {
>      size_t threshold = ShenandoahHeapRegion::region_size_bytes() * ShenandoahGarbageThreshold / 100;
> -    return r->garbage() > threshold;
> +    if (UseNewCode) {
> +      return (ShenandoahHeapRegion::region_size_bytes() - r->get_live_data_bytes()) > threshold;
> +    } else {
> +      return r->garbage() > threshold;
> +    }
>    }
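>
> (Note that the new predicate is equivalent to "live < region_size * (100 - ShenandoahGarbageThreshold) / 100",
> i.e. a maximum-live cut-off, which is exactly the shape of G1's _region_live_threshold_bytes check
> above. UseNewCode is just the stock HotSpot testing flag, used here so both paths can be compared
> side by side.)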
>
> ...makes the issue disappear on the same workload running for 30+ minutes (and no Full GCs!):
>   http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/patched-1.png
>
> Thoughts?
>
> Thanks,
> -Aleksey
>
Sounds reasonable.


