RFC: TLAB allocation and garbage-first policy
Zhengyu Gu
zgu at redhat.com
Wed Sep 20 12:18:11 UTC 2017
> Now to the fun part: our collection policy. Our collector policy selects regions by the amount of
> garbage, where garbage = used - live. So if you have a fragmented 16M region with used=128K and
> live=128K, it has exactly 0K garbage, which makes it the least likely candidate. So a region that becomes
> fragmented due to the race in the TLAB machinery is also never considered for collection, because it is
> below the ShenandoahGarbageThreshold!
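A minimal C++ sketch of the arithmetic above, assuming a 16M region and a hypothetical ShenandoahGarbageThreshold of 60 (percent of region size). Region and in_cset_by_garbage() are illustrative stand-ins, not Shenandoah's actual classes. The static_asserts check at compile time that used == live yields zero garbage, so the garbage criterion rejects the region.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the example; not Shenandoah's actual code.
constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;
constexpr std::size_t kRegionSizeBytes = 16 * M;  // 16M region, as in the example
constexpr std::size_t kGarbageThresholdPct = 60;  // assumed ShenandoahGarbageThreshold value

// Illustrative region model: only the two quantities the policy looks at.
struct Region {
  std::size_t used;  // bytes allocated into the region
  std::size_t live;  // bytes found live by marking
  constexpr std::size_t garbage() const { return used - live; }  // garbage = used - live
};

// Current policy: select the region only if its garbage exceeds the threshold.
constexpr bool in_cset_by_garbage(Region r) {
  return r.garbage() > kRegionSizeBytes * kGarbageThresholdPct / 100;
}

// The fragmented region from the example: used = live = 128K.
constexpr Region kFragmented{128 * K, 128 * K};
static_assert(kFragmented.garbage() == 0, "used == live means zero garbage");
static_assert(!in_cset_by_garbage(kFragmented), "never selected by the garbage metric");
```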
Should this region be added back to the free set after GC and reused?
-Zhengyu
>
> This race widens further when we bias TLAB and GCLAB allocations to different sides of the heap,
> and GCLABs take the biggest hit. You can clearly see the anomaly in the Visualizer after 10+ minutes of
> an LRUFragger run with 50 GB LDS on a 100 GB heap (...and it degrades into Full GC shortly afterwards,
> because the free set gets depleted by fragmentation!):
> http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/baseline-1.png
>
> Therefore, I propose we choose regions by *live size*, not by *garbage*, so that we can recover
> by collecting (and evacuating) the regions with little live data, not strictly those with high garbage. This should
> help us recover from TLAB losses better. For full regions, both metrics yield the same result.
> For half-full regions, we would get a chance to compact them into mostly-full ones, leaving more
> completely empty regions around.
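To compare the two metrics side by side, here is a sketch under the same assumptions (16M regions, a hypothetical 60% ShenandoahGarbageThreshold, illustrative names that are not Shenandoah's API). select_by_live() models the proposed rule: everything that is not live, including the never-allocated tail of a fragmented region, counts as reclaimable.

```cpp
#include <cstddef>

// Hypothetical constants; the threshold percentage is an assumed value.
constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;
constexpr std::size_t kRegionSizeBytes = 16 * M;
constexpr std::size_t kGarbageThresholdPct = 60;
constexpr std::size_t kThresholdBytes = kRegionSizeBytes * kGarbageThresholdPct / 100;

// Illustrative region model.
struct Region {
  std::size_t used;  // bytes allocated into the region
  std::size_t live;  // bytes found live by marking
};

// Current metric: garbage = used - live must exceed the threshold.
constexpr bool select_by_garbage(Region r) { return r.used - r.live > kThresholdBytes; }

// Proposed metric: region_size - live must exceed the threshold, so unused
// space left behind by fragmentation also counts toward selection.
constexpr bool select_by_live(Region r) { return kRegionSizeBytes - r.live > kThresholdBytes; }

// Full region: used == region size, so both metrics compute the same quantity.
static_assert(select_by_garbage(Region{16 * M, 2 * M}) == select_by_live(Region{16 * M, 2 * M}),
              "full regions: same decision");
// Half-full region with little live data: only the live-based metric picks it up.
static_assert(!select_by_garbage(Region{8 * M, 1 * M}), "half-full: skipped by garbage");
static_assert(select_by_live(Region{8 * M, 1 * M}), "half-full: selected by live size");
// The fragmented region (used = live = 128K) becomes a candidate too.
static_assert(select_by_live(Region{128 * K, 128 * K}), "fragmented region selected");
```

For a full region, used equals the region size, so used - live and region_size - live are the same number; the two rules only diverge for partially filled regions.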
>
> I mused about this on IRC yesterday, and today I noticed that G1 does the same; see
> CollectionSetChooser::should_add:
>
> bool should_add(HeapRegion* hr) {
>   assert(hr->is_marked(), "pre-condition");
>   assert(!hr->is_young(), "should never consider young regions");
>   return !hr->is_pinned() &&
>          hr->live_bytes() < _region_live_threshold_bytes; // <----- here
> }
>
> ...and probably with the same rationale? I found these bugs:
> https://bugs.openjdk.java.net/browse/JDK-7132029
> https://bugs.openjdk.java.net/browse/JDK-7146242
>
> Prototype fix:
>
>  virtual bool region_in_collection_set(ShenandoahHeapRegion* r, size_t immediate_garbage) {
>    size_t threshold = ShenandoahHeapRegion::region_size_bytes() * ShenandoahGarbageThreshold / 100;
> -  return r->garbage() > threshold;
> +  if (UseNewCode) {
> +    return (ShenandoahHeapRegion::region_size_bytes() - r->get_live_data_bytes()) > threshold;
> +  } else {
> +    return r->garbage() > threshold;
> +  }
>  }
>
> ...makes the issue disappear on the same workload running for 30+ minutes (and no Full GCs!):
> http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/patched-1.png
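The prototype's rule, region_size - live > threshold, can also be written in G1's form, live < live_threshold, with live_threshold = region_size - threshold. A small sketch checking that the two predicates agree, assuming 16M regions and a hypothetical 60% ShenandoahGarbageThreshold; the function names and the mapping to a live threshold are illustrative, not taken from Shenandoah or G1 code.

```cpp
#include <cstddef>

// Hypothetical constants: 16M regions and an assumed 60% ShenandoahGarbageThreshold.
constexpr std::size_t kRegionSizeBytes = 16 * 1024 * 1024;
constexpr std::size_t kGarbageThresholdPct = 60;
constexpr std::size_t kThresholdBytes = kRegionSizeBytes * kGarbageThresholdPct / 100;

// Prototype (UseNewCode branch): non-live bytes must exceed the garbage threshold.
constexpr bool shenandoah_live_criterion(std::size_t live) {
  return kRegionSizeBytes - live > kThresholdBytes;
}

// G1-style criterion: live bytes must stay below a live-bytes threshold,
// here derived from the garbage threshold (hypothetical mapping).
constexpr std::size_t kLiveThresholdBytes = kRegionSizeBytes - kThresholdBytes;
constexpr bool g1_style_criterion(std::size_t live) {
  return live < kLiveThresholdBytes;
}

// Returns true if both predicates agree for every live size from 0 to the
// region size, sampled every `step` bytes (requires a C++14 constexpr loop).
constexpr bool criteria_agree(std::size_t step) {
  for (std::size_t live = 0; live <= kRegionSizeBytes; live += step) {
    if (shenandoah_live_criterion(live) != g1_style_criterion(live)) return false;
  }
  return true;
}

static_assert(criteria_agree(4 * 1024), "both formulations select the same regions");
```

Since live never exceeds the region size and the threshold is at most the region size, neither subtraction underflows, and the two forms are equivalent for every integer live size, not just the sampled ones.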
>
> Thoughts?
>
> Thanks,
> -Aleksey
>
>