RFC: Adding generational support to ShenandoahHeapRegion

Nilsen, Kelvin kdnilsen at amazon.com
Wed Aug 19 00:11:24 UTC 2020


Thanks for the continued conversation.  I'll remove points of "established agreement" and respond below to areas that we are still "exploring together".

On 8/17/20, 1:53 PM, "Roman Kennke" <rkennke at redhat.com> wrote:


    >     d) Our experimentation with Shenandoah shows that it does not
    > perform as well when the heap is highly utilized (in fact, Shenandoah
    > seems to perform best when less than 30% of the heap is live).

    That is because marking (especially through 'static' old-space stuff)
    takes too long and eats up CPU resources, is that right?

Actually, there's more going on here than just the overhead of marking through static old-space stuff.  We don't yet fully understand all the causes of the performance degradation.  Some specific behaviors we have observed:

 i) When a large amount of critical data has to be evacuated, the threads that need that data are essentially forced to wait for all of it to be copied before they can access it.  When many threads are waiting for the same large body of data to be copied, Shenandoah appears much less efficient than, for example, Parallel GC: multiple mutator threads redundantly copy the same objects in parallel, and then, for each object copied, all but one thread abandons its copy after the copy-commit race is won by a different thread (a sketch of this race appears after item ii).

 ii) When the GC effort is large because a large fraction of the heap needs to be marked and evacuated, there is a higher probability that allocating mutators will exhaust the free pool before the GC threads replenish it.  When this happens, Shenandoah falls back to a "degenerated" GC, which is stop-the-world.  While a degenerated GC may actually collect garbage more efficiently than a concurrent cycle, the long pauses it introduces are highly undesirable.  Delivering on the Shenandoah promise of pause-free operation requires that we avoid degenerated GC.
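For concreteness, the copy-commit race from item i looks roughly like the following minimal C++ sketch; the names are illustrative, not the actual HotSpot code:

  #include <atomic>
  #include <cstring>

  struct ObjHeader {
    std::atomic<void*> fwd;  // forwarding pointer, nullptr until evacuated
    size_t size;             // object size in bytes, header included
  };

  // Each thread that touches a to-be-evacuated object copies it into its
  // own thread-local buffer, then races to publish a forwarding pointer.
  // Exactly one copy wins; every losing thread abandons its copy, so all
  // of its copying effort was wasted.
  void* evacuate(ObjHeader* obj, void* (*tlab_alloc)(size_t)) {
    void* fwd = obj->fwd.load(std::memory_order_acquire);
    if (fwd != nullptr) return fwd;        // someone already evacuated it

    void* copy = tlab_alloc(obj->size);    // speculative private copy
    std::memcpy(copy, obj, obj->size);     // (header fix-up elided)

    void* expected = nullptr;
    if (obj->fwd.compare_exchange_strong(expected, copy,
                                         std::memory_order_acq_rel)) {
      return copy;                         // this thread won the race
    }
    return expected;                       // lost the race: abandon our copy
  }

And the degenerated-GC fallback from item ii, equally roughly (the helper functions are hypothetical placeholders; the real control logic is considerably more involved):

  // Hypothetical placeholders, assumed provided elsewhere.
  void* try_allocate_from_free_pool(size_t size);
  void  stop_the_world();
  void  finish_current_cycle_at_safepoint();
  void  resume_the_world();

  void* allocate(size_t size) {
    void* mem = try_allocate_from_free_pool(size);  // normal fast path
    if (mem != nullptr) return mem;

    // Mutators outpaced the concurrent GC threads: the free pool emptied
    // before the in-flight cycle could replenish it.  Pause all mutators
    // and finish the cycle from wherever it left off.
    stop_the_world();
    finish_current_cycle_at_safepoint();
    resume_the_world();

    return try_allocate_from_free_pool(size);       // retry after reclaim
  }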

    >     e) Stable, highly utilized, long-term memory derives less
    > benefit from repeated defragmentation copying.  In general, we plan
    > to use concurrent mark and sweep for old-gen memory.

    I don't really agree with that conclusion ;-)

I should have said that concurrent mark and sweep for old-gen memory is one of the options that we are planning to explore.  We are also open to alternative approaches.

    >     f)  Our planned region-based approach to old-gen memory allows
    > the old-gen collector to identify certain regions as ready to benefit
    > from defragmentation.  When this is detected, the old-gen collector
    > will re-introduce the region to Shenandoah, with all objects already
    > tagged as having max age.  Shenandoah will evacuate the live objects
    > from this region, immediately promoting them back into more
    > efficiently packed regions of old-gen memory.

    My idea to solve the problem of having to coordinate 2 relocating
    collectors is to not do that, but instead take advantage of the fact
    that all collections are concurrent anyway, and so it doesn't matter
    all that much if we piggy-back some extra work on a normal young-gen
    cycle and defragment one or more old-gen regions too - depending on
    some heuristic, so that it eventually defragments old-gen regularly.
    (Failing that, we'd run a full-concurrent Shenandoah cycle over all of
    the heap.) This requires card-marking to also track old->old and
    young->old references, though. It's basically G1's mixed collections, but
    fully concurrent. (Or, put differently, kinda what you suggested with
    adding-back old regions to young-gen so that it gets defragmented.) The
    advantage is that old- and young- collections do not interleave and
    thus don't require extra coordination.

    WDYT?

Maybe we are already in agreement on this.  Perhaps it all depends on how we define the "some heuristic" that tells us when to defragment which portions of old-gen memory.  Some questions to be answered through experimentation with real-world workloads (one hypothetical shape for such a heuristic is sketched after the questions):
  If concurrent marking discovers an old-gen region to be 80% utilized, will we choose to ignore its garbage, sweep the garbage in place, or evacuate the region?
  What if it is only 60% utilized?
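To make that concrete, here is one purely illustrative form such a heuristic might take (the thresholds and names are invented for illustration and would need tuning against real workloads):

  enum class OldRegionAction { Ignore, SweepInPlace, Evacuate };

  // Illustrative heuristic: classify an old-gen region by its live fraction.
  OldRegionAction classify_old_region(double live_fraction) {
    if (live_fraction >= 0.80) {
      // Mostly live: copying buys little; leave the garbage alone until
      // the region decays further.
      return OldRegionAction::Ignore;
    }
    if (live_fraction >= 0.60) {
      // Moderate garbage: reclaim the free runs in place rather than pay
      // to copy the 60%+ of data that is still live.
      return OldRegionAction::SweepInPlace;
    }
    // Mostly garbage: hand the region back to the young collector so the
    // survivors get evacuated into densely packed old-gen regions.
    return OldRegionAction::Evacuate;
  }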

Some other "considerations" that influence our design choices:

 1. It is critical that the concurrent GC replenish the free pool at least as fast as mutators allocate; a back-of-the-envelope version of this constraint is sketched after this list.  Generally, this requires that all garbage in young-gen be found with high urgency.

 2. Borrowing from real-time scheduling discipline, we want to avoid creating priority inversion between the lower-priority activity of reclaiming the slowly accumulating garbage in old-gen memory and the much higher-priority activity of reclaiming the rapidly accumulating garbage in young-gen memory.
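The pacing constraint from consideration 1, as a back-of-the-envelope check (the formula and the names are assumptions for illustration, not the actual Shenandoah pacer):

  // Illustrative only: the free pool must cover all allocation that occurs
  // while one full concurrent cycle runs; otherwise mutators hit the
  // degenerated-GC fallback described earlier.
  bool gc_keeps_pace(double alloc_bytes_per_sec,  // observed mutator rate
                     double free_bytes,           // current free pool
                     double est_cycle_seconds) {  // predicted cycle length
    return free_bytes > alloc_bytes_per_sec * est_cycle_seconds;
  }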



