RFC: TLAB allocation and garbage-first policy

Aleksey Shipilev shade at redhat.com
Wed Sep 20 10:10:39 UTC 2017


Hi,

We have a problem at the intersection of the TLAB allocation machinery and the garbage-first
collection policy. The current TLAB machinery in Hotspot polls the GC about the next TLAB size with
CollectedHeap::unsafe_max_tlab_alloc. The trouble is that this call is inherently racy: it can only
provide the best guess the GC can make under the circumstances.

The race unfolds like this:
 1. <Thread 1> GC looks around, sees a fully-empty region in the allocation list, and happily
reports $region_size to the VM.
 2. <Thread 2> Another allocation comes in and makes a smaller-than-region-size allocation in that
fully-empty region.
 3. <Thread 1> Resumes and tries to allocate a full TLAB, but the region that was deemed empty at
step 1 is already fragmented, so it retires the region from the free set (it does so to optimize
the allocation path) and proceeds to allocate in the next one.

End result: we have a fragmented region that gets immediately retired and becomes unavailable for
any further allocations. Granted, the actual issue is the raciness in the TLAB machinery;
unfortunately, that would be hard to fix without redoing the CollectedHeap API.

Now to the fun part about our collection policy. Our collector policy selects regions by garbage,
where garbage = used - live. So if you have a fragmented 16M region with used=128K and live=128K,
it has exactly 0K of garbage -- the least probable candidate. The region that became fragmented due
to the race in the TLAB machinery is therefore never considered for collection, because it is below
ShenandoahGarbageThreshold!

This race widens further when we bias TLAB and GCLAB allocations to different sides of the heap,
and GCLABs take the biggest hit. You can clearly see the anomaly in the Visualizer after 10+
minutes of an LRUFragger run with 50 GB LDS on a 100 GB heap (...and it drives into Full GC shortly
afterwards, because the free set gets depleted due to fragmentation!):
  http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/baseline-1.png

Therefore, I propose we choose regions by *live size*, not by *garbage*, so that we recover by
collecting (and evacuating) the regions with low live data, not necessarily those with high
garbage. This should help us recuperate from TLAB losses better. For full regions, both metrics
yield the same result. For half-full regions, we would have a chance to compact them into
mostly-full ones, leaving more fully-empty regions around.

I mused about this on IRC yesterday, and today I see G1 does the same, see
CollectionSetChooser::should_add:

  bool should_add(HeapRegion* hr) {
    assert(hr->is_marked(), "pre-condition");
    assert(!hr->is_young(), "should never consider young regions");
    return !hr->is_pinned() &&
            hr->live_bytes() < _region_live_threshold_bytes;  // <----- here
  }

...and probably with the same rationale? I found these related bugs:
  https://bugs.openjdk.java.net/browse/JDK-7132029
  https://bugs.openjdk.java.net/browse/JDK-7146242

Prototype fix:

  virtual bool region_in_collection_set(ShenandoahHeapRegion* r, size_t immediate_garbage) {
    size_t threshold = ShenandoahHeapRegion::region_size_bytes() * ShenandoahGarbageThreshold / 100;
-   return r->garbage() > threshold;
+   if (UseNewCode) {
+     return (ShenandoahHeapRegion::region_size_bytes() - r->get_live_data_bytes()) > threshold;
+   } else {
+     return r->garbage() > threshold;
+   }
  }

...makes the issue disappear on the same workload running for 30+ minutes (and no Full GCs!):
 http://cr.openjdk.java.net/~shade/shenandoah/wip-tlab-race/patched-1.png

Thoughts?

Thanks,
-Aleksey

More information about the shenandoah-dev mailing list