RFR (M): 8027959: Investigate early reclamation of large objects in G1
Mikael Gerdin
mikael.gerdin at oracle.com
Thu Jul 17 14:52:22 UTC 2014
Hi Thomas,
On Tuesday 15 July 2014 11.10.53 Thomas Schatzl wrote:
> Hi all,
>
> could I have reviews for the following change that allows G1 to
> eagerly/early reclaim humongous objects on every GC?
>
> Problem:
>
> In G1 large objects are always allocated in the old generation,
> currently requiring a complete heap liveness analysis (full gc, marking)
> to reclaim them.
>
> This is far from ideal for many transaction-based enterprise
> applications that create large objects which are live only until a
> (typically short-lived) transaction completes (e.g. the ResultSet of a
> JDBC query that generates a large result, byte output streams, etc.).
> This causes the heap to fill up relatively quickly, typically
> leading to unnecessary marking cycles just to reclaim these objects.
>
> The solution implemented here is to directly target these types of
> objects by using remembered set and reachability information from any GC
> to make (conservatively) sure that we can reclaim the space.
>
> You can quickly determine this if there are no references from the roots
> or young gen to that object, and if there are no remembered set entries
> for that object. This is sufficient because:
> - G1 root processing always walks all roots and the entire young gen,
> which are the only sources of potential references outside the
> remembered set.
> - the remembered set contains all potential locations that reference
> this object: humongous objects are always allocated into their own
> regions, so there can be no intra-region references.
>
> These are all potential reference locations during GC pause because GC
> pause makes sure that the remembered set is current at pause time.
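For reference, the reclaimability test described above can be sketched as follows. This is a simplified, hypothetical model (HumongousRegion and is_eagerly_reclaimable are illustrative names, not the actual HotSpot API):

```cpp
#include <cstddef>

// Hypothetical, simplified model of the test described above.
struct HumongousRegion {
  size_t remset_entries;       // remembered-set entries pointing into the region
  bool referenced_from_roots;  // reference found during root/young-gen scan this pause
};

// A humongous object can be reclaimed at a young GC only if no location can
// still reach it: the remembered set covers all old-to-humongous references
// (humongous objects occupy whole regions, so there are no intra-region
// references), and root/young-gen scanning covers everything else.
inline bool is_eagerly_reclaimable(const HumongousRegion& r) {
  return r.remset_entries == 0 && !r.referenced_from_roots;
}
```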
>
> We can also reclaim a region that the marking considers live because it
> was allocated during marking (provided the other conditions hold). At
> reclaim time, if that object were actually live, there must either have
> been a remembered set entry or a reference from the roots or young gen,
> so nobody can install a reference to it any more.
> (If there had ever been a reference from another old region, there must
> have been a remembered set entry for it.)
> When marking continues after the GC, it will simply notice that the
> region has been freed and skip over it.
>
> After putting the humongous region into the collection set, liveness
> detection occurs by intercepting the slow path for allocation of space
> for that humongous object. As it is humongous, we always end up there.
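The interception idea can be sketched roughly like this (illustrative names and structure, not the actual webrev code): because humongous objects are never copied, any reference to one found during evacuation lands in the slow path, where the region can be recorded as live and dropped from the candidate set.

```cpp
#include <set>

typedef unsigned RegionIdx;

// Hypothetical sketch: humongous regions with empty remembered sets are
// recorded as reclaim candidates at the start of the pause; the (simulated)
// copy slow path removes a region from the set when a reference to it proves
// the object live.
struct ReclaimCandidates {
  std::set<RegionIdx> candidates;

  // Called from the copy slow path when a reference to a humongous object
  // in this region is encountered: the object is provably live.
  void note_live(RegionIdx r) { candidates.erase(r); }

  bool is_candidate(RegionIdx r) const {
    return candidates.count(r) != 0;
  }
};
```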
>
> The change includes some minor optimizations:
> - while registering the humongous regions for inclusion in the
> collection set, we already check whether each humongous object is
> one we can potentially remove, e.g. whether it has an empty remembered
> set. This makes it a "humongous candidate" (note there is no actual
> state for this, just a name for these regions)
> - after finding out that a region is live, remove that humongous
> region from the collection set so that further references to it do not
> cause us to go into the slow path. This avoids taking the slow
> path too often if that object is referenced a lot. (Most likely, if the
> object had many references it would not be a "humongous candidate"
> anyway.)
> - if there were no candidates at the start of the GC, then do not
> bother trying to reclaim later.
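The registration pass and the "no candidates, skip reclaim" shortcut could look something like this sketch (hypothetical names; the real pass also adds the candidate regions to the collection set):

```cpp
#include <cstddef>
#include <vector>

// Simplified, illustrative model of a heap region for this sketch only.
struct Region {
  bool humongous_start;   // first region of a humongous object
  size_t remset_entries;  // remembered-set entries for the region
};

// At GC start, count humongous regions with an empty remembered set as
// reclaim candidates (the real code would also register them with the
// collection set here).
inline size_t register_candidates(const std::vector<Region>& heap) {
  size_t candidates = 0;
  for (size_t i = 0; i < heap.size(); ++i) {
    if (heap[i].humongous_start && heap[i].remset_entries == 0) {
      ++candidates;
    }
  }
  return candidates;
}

// If nothing qualified, the reclaim phase at the end of the pause is skipped
// entirely, so the feature adds no work when there is nothing to reclaim.
inline bool should_attempt_reclaim(size_t candidates) {
  return candidates > 0;
}
```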
>
> In total I found no particular slowdown when enabling this feature by
> default. I.e. if there are no humongous candidate objects, there will be
> no change to the current code path at all because none will be added to
> the collection set.
>
> The feature can be disabled completely by disabling
> G1ReclaimDeadHumongousObjectsAtYoungGC.
>
> There is a new log line "Humongous Reclaim" measuring reclaim time, and
> if G1LogLevel=finest is set it prints some statistics about total,
> candidate and reclaimed humongous objects on the heap.
>
> The CR contains a graph showing large improvements on average humongous
> object reclamation delay. In total we have seen some benchmarks
> reclaiming GBs of heap space over time using this functionality (instead
> of waiting for the marking/full GC). This improves throughput
> significantly as there is more space available for the young gen on
> average now.
>
> It might also save users from manually increasing the heap region size
> just to avoid humongous object troubles.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8027959
>
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8027959/webrev/
g1CollectedHeap.hpp:
Most of the new functions you add for humongous regions operate on region
indices, except for
+ bool humongous_region_is_always_live(HeapRegion* region);
Is there any reason for it not operating on a region index as well? I think
that would make the API nice and consistent.
g1CollectedHeap.cpp:
This comment is pretty confusing to me, can you try to rephrase it?
3798   if (is_candidate) {
3799     // Do not even try to reclaim a humongous object that we already know will
3800     // not be treated as live later. A young collection will not decrease the
3801     // amount of remembered set entries for that region.
forwardee can only be null here for humongous objects, correct?
Would it make sense to add an else-clause asserting that to keep some of the
old assert functionality?
4674 if (forwardee != NULL) {
4675 oopDesc::encode_store_heap_oop(p, forwardee);
4676 if (do_mark_object != G1MarkNone && forwardee != obj) {
4677 // If the object is self-forwarded we don't need to explicitly
4678 // mark it, the evacuation failure protocol will do so.
4679 mark_forwarded_object(obj, forwardee);
4680 }
4681
4682 if (barrier == G1BarrierKlass) {
4683 do_klass_barrier(p, forwardee);
4684 }
4685 needs_marking = false;
4686 }
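The invariant such an else-clause would assert can be written down as a tiny sketch (illustrative only, not the webrev code): with this change, a NULL forwardee should be possible only for humongous objects, which are never moved.

```cpp
// Hypothetical helper expressing the suggested assertion: every non-humongous
// object taken through this path must have been forwarded; only humongous
// objects may legitimately end up without a forwardee.
inline bool forwardee_invariant_holds(bool has_forwardee, bool is_humongous) {
  return has_forwardee || is_humongous;
}
```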
g1OopClosures.inline.hpp
Are you sure that the change to is_in_cset_or_humongous is correct here?
I think that when FilterIntoCSClosure is used for card refinement we are not
actually interested in references to humongous objects since they don't move.
So this could cause us to needlessly add cards to the into_cset_dcq.
However, in the case of scanning remembered sets, the new check is probably
exactly what you are looking for.
 43 template <class T>
 44 inline void FilterIntoCSClosure::do_oop_nv(T* p) {
 45   T heap_oop = oopDesc::load_heap_oop(p);
 46   if (!oopDesc::is_null(heap_oop) &&
 47       _g1->is_in_cset_or_humongous(oopDesc::decode_heap_oop_not_null(heap_oop))) {
 48     _oc->do_oop(p);
 49   }
 50 }
Perhaps this should be left for a further cleanup; one approach could be to
fold the logic using is_in_cset_or_humongous into HeapRegion_DCTOC, since
scanCard seems to be the only place where that closure is used.
Previously the else-clause here would cause a remembered set entry to be
added for the region containing "obj", to remember the reference "p".
Now that you're removing humongous objects from this consideration, can we
miss remembered set entries for humongous regions?
 65 inline void G1ParScanClosure::do_oop_nv(T* p) {
 66   T heap_oop = oopDesc::load_heap_oop(p);
 67
 68   if (!oopDesc::is_null(heap_oop)) {
 69     oop obj = oopDesc::decode_heap_oop_not_null(heap_oop);
 70     if (_g1->is_in_cset_or_humongous(obj)) {
 71       // We're not going to even bother checking whether the object is
 72       // already forwarded or not, as this usually causes an immediate
 73       // stall. We'll try to prefetch the object (for write, given that
 74       // we might need to install the forwarding reference) and we'll
 75       // get back to it when pop it from the queue
 76       Prefetch::write(obj->mark_addr(), 0);
 77       Prefetch::read(obj->mark_addr(), (HeapWordSize*2));
 78
 79       // slightly paranoid test; I'm trying to catch potential
 80       // problems before we go into push_on_queue to know where the
 81       // problem is coming from
 82       assert((obj == oopDesc::load_decode_heap_oop(p)) ||
 83              (obj->is_forwarded() &&
 84               obj->forwardee() == oopDesc::load_decode_heap_oop(p)),
 85              "p should still be pointing to obj or to its forwardee");
 86
 87       _par_scan_state->push_on_queue(p);
 88     } else {
 89       _par_scan_state->update_rs(_from, p, _worker_id);
 90     }
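To make the concern concrete, the branch structure can be reduced to this sketch (illustrative names, not the actual closure code): with the change, a reference to a humongous object takes the push_on_queue path rather than the update_rs path, so no remembered set entry is recorded for it here.

```cpp
// Hypothetical reduction of the closure's decision above.
enum ScanAction { PUSH_ON_QUEUE, UPDATE_RS };

inline ScanAction scan_action(bool in_cset, bool humongous) {
  // New behavior: humongous objects are treated like collection-set objects
  // and pushed on the queue. Previously a reference to a (non-cset)
  // humongous object would have fallen through to the update_rs branch.
  if (in_cset || humongous) return PUSH_ON_QUEUE;
  return UPDATE_RS;
}
```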
I also somewhat agree with Bengt's opinion about explicitly checking for
humongous objects.
/Mikael
>
> Testing:
> jprt, aurora adhoc, various internal benchmarks
>
> Thanks,
> Thomas