RFR: JDK-8307314: Implementation: Generational Shenandoah (Experimental) [v5]

Tue Jun 6 13:48:18 UTC 2023

On Sun, 4 Jun 2023 21:39:58 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> OpenJDK Colleagues:
>> 
>> Please review this proposed integration of Generational mode for Shenandoah GC under https://bugs.openjdk.org/browse/JDK-8307314.
>> 
>> Generational mode of Shenandoah is enabled by adding `-XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational` to a command line that already specifies `-XX:+UseShenandoahGC`.  The implementation automatically adjusts the sizes of old generation and young generation to efficiently utilize the entire heap capacity.  Generational mode of Shenandoah resembles G1 in the following regards:
>> 
>> 1. Old-generation marking runs concurrently during the time that multiple young generation collections run to completion.
>> 2. After old-generation marking completes, we perform a sequence of mixed collections.  Each mixed collection combines collection of young generation with evacuation of a portion of the old-generation regions identified for collection based on old-generation marking information.
>> 3. Unlike G1, young-generation collections and evacuations are entirely concurrent, as with single-generation Shenandoah.
>> 4. As with single-generation Shenandoah, there is no explicit notion of eden and survivor space within the young generation.  In practice, regions that were most recently allocated tend to have large amounts of garbage and these regions tend to be collected with very little effort.  Young-generation objects that survive garbage collection tend to accumulate in regions that hold survivor objects.  These regions tend to have smaller amounts of garbage, and are less likely to be collected.  If they survive a sufficient number of young-generation collections, the “survivor” regions are promoted into the old generation.
>> 
>> We expect to refine heuristics as we gain experience with more production workloads.  In the future, we plan to remove the “experimental” qualifier from generational mode, at which time we expect that generational mode will become the default mode for Shenandoah.
>> 
>> **Testing**: We continuously run jtreg tiers 1-4 + hotspot_gc_shenandoah, gcstress, jck compiler, jck runtime, Dacapo, SpecJBB, SpecVM, Extremem, HyperAlloc, and multiple AWS production workload simulators. We test on Linux x64 and aarch64, Alpine x64 and aarch64, macOS x64 and aarch64, and Windows x64.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove three asserts making comparisons between atomic volatile variables
>   
>   Though changes to the volatile variables are individually protected by
>   Atomic load and store operations, these asserts were not assuring
>   atomic access to multiple volatile variables, each of which could be
>   modified independently of the others.  The asserts were therefore not
>   trustworthy, as has been confirmed by more extensive testing.

Thanks Thomas for the feedback:

These proposed changes represent improvements to both Generational and Non-generational modes of operation.  We can revert if that is desired, or we can specialize Generational versions of these parameters so that they can have different values in different modes, but here is a bit of background.  We've done considerable testing on a variety of synthetic workloads and some limited testing on production workloads.  As we move towards upstream integration, we expect this will help us gain exposure to more production workloads.  The following changes were based on results of this testing:

* Decrease ShenandoahLearningSteps to 5 (from 10): For some workloads, we observed that there were "way too many" learning cycles being triggered.  We also observed that the learning achieved during learning cycles was not as trustworthy as the learning achieved during actual operation, because these learning cycles typically trigger during initialization phases which are not representative of real-world operation and because they usually trigger so prematurely that there has not been enough time for allocated objects to die before we garbage collect.
* Change ShenandoahImmediateThreshold to 70 from 90: We discovered during experiments with settings on certain real production workloads that reducing the threshold for abbreviated cycles significantly improved throughput, reduced degenerated cycles, and reduced high percentile end-to-end latency on the relevant services.  These experiments were based on single-generation Shenandoah.  We saw no negative impact of making this change on our various workloads.
* I'll let @earthling-amzn comment on the change to ShenandoahAdaptiveDecayFactor.  My recollection is that this change was also motivated by experience with single-generation Shenandoah on a real production workload.
* The change of ShenandoahFullGCThreshold from 3 to 64 was motivated by some observations with specjbb performance as it ratchets up the workload to determine MaxJOPS.  We observed that for both single-generation Shenandoah and generational Shenandoah, the typical behavior was that a single Full GC trigger causes an "infinite" sequence of Full GCs, even though we may have only lost the concurrent GC race by a small amount.  This is because (1) Full GC discards all the incremental work of the concurrent GC that was just interrupted, (2) STW Full GC creates a situation in which pent-up demand for execution and allocation accumulates during the STW pause, so there's a huge demand for allocation immediately following the end of Full GC, and (3) the concurrent GC that triggers immediately after Full GC completes is "destined" to fail, because no garbage has been introduced since Full GC finished, because SATB does not collect floating garbage that accumulates after the start of concurrent GC, and because the allocation spike is so high immediately following the Full GC (e.g. 11GB/s instead of 3GB/s normally).  This change allows a sequence of degenerated GCs to manage slow evolution and sudden bursts of allocation rate much more effectively than the original code.  This is accompanied by a change in how we detect and throw OOM.  We wait for at least one Full GC but we don't force ShenandoahFullGCThreshold allocation failures before throwing OOM.
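
For anyone who wants to compare against the earlier behavior without reverting the patch, the previous values can be pinned on the command line.  The following is only a sketch: it assumes these three parameters remain settable as experimental `-XX` options in the build under test, and `app.jar` is a placeholder rather than a real workload.  ShenandoahAdaptiveDecayFactor is left out because its values are not discussed here.

```shell
#!/bin/sh
# Sketch only: assemble a JVM command line that pins the earlier defaults
# quoted in this message. app.jar is a placeholder, and the availability of
# each flag in a given build is an assumption.

# Enable Shenandoah in generational mode (flags as given in the PR description).
GEN_FLAGS="-XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational"

# Earlier values mentioned above: learning steps 10, immediate threshold 90,
# Full GC threshold 3.
OLD_TUNING="-XX:ShenandoahLearningSteps=10 -XX:ShenandoahImmediateThreshold=90 -XX:ShenandoahFullGCThreshold=3"

CMD="java $GEN_FLAGS $OLD_TUNING -jar app.jar"

# Print rather than execute, so the line can be inspected first.
echo "$CMD"
```

Dropping `-XX:ShenandoahGCMode=generational` from `GEN_FLAGS` gives the equivalent comparison for single-generation Shenandoah.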

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14185#issuecomment-1578800487