RFR: 8327000: GenShen: Integrate updated Shenandoah implementation of FreeSet into GenShen [v8]

Wed Jun 19 01:01:30 UTC 2024

On Thu, 13 Jun 2024 22:22:01 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> The mainline implementation of ShenandoahFreeSet was recently updated.  This PR integrates the upstream changes
>> into Generational Shenandoah.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor refinements to test programs
>   
>   TestAllocIntArrays: comments to explain behavior.
>   TestOldGrowthTriggers: reduce the number of loop iterations so this test
>   will not time out on less powerful test platforms.

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 757:

> 755:     heap->free_set()->prepare_to_rebuild(young_cset_regions, old_cset_regions, first_old, last_old, num_old);
> 756:     // Free set construction uses reserve quantities, because they are known to be valid here
> 757:     heap->free_set()->finish_rebuild(young_cset_regions, old_cset_regions, num_old, true);

is `num_old` different than the size of the `old_cset_regions` set?

src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 150:

> 148:   void increase_allocated(size_t bytes);
> 149: 
> 150:   // These methods change the capacity of the region by adding or subtracting the given number of bytes from the current

I would say `generation`, instead of `region` here to avoid a little bit of unnecessary terminology confusion with the `heap region` object.

Also may be add something like: `Return the new capacity of the generation following the change.`

src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 155:

> 153:   size_t decrease_capacity(size_t decrement);
> 154: 
> 155:   size_t establish_capacity(size_t byte_size);

Why not simply `set_capacity` ? Also a 1-line comment.

// set the capacity of the generation, returning the value set.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 992:

> 990:     // gc_no_progress_count is incremented following each degen or full GC that fails to achieve is_good_progress().
> 991:     // Note that Generational Shenandoah may increment no_progress_count faster than traditional Shenandoah because young
> 992:     // GCs, which may degenerate, typically occur more frequently than single-generation Global GCs.

Should this documentation comment about the expected difference in rates of incrementation occur in the vicinity of any differences (or otherwise) in the setting of the default value of `ShenandoahNoProgressThreshold` in each case -- or if the default values are the same, then where the option is documented?

I also wonder if `is_good_progress()` and `gc_no_progress_count` should apply uniformly in both cases to whole heap collections rather than to young collections (in the generational case). Otherwise, don't we run the risk of declaring no progress prematurely without a global collection in the case of generational, thus breaking in spirit the requirement of "best efforts attempt to reclaim unused memory"?

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1029:

> 1027:     //           the better remediation than repeated attempts to allocate following repeated GC cycles.
> 1028: 
> 1029:     while ((result == nullptr) && (original_count == shenandoah_policy()->full_gc_count())) {

I'd go a bot farther in the direction of being conservative and declare best effort only after a full/global GC (whether concurrent or stop-world, although stop-world would be better). I realize this may be much worse latency-wise than customers would care for, but I think the notion of latency being poor for a concurrent collector should probably be separated from the notion of cpu overhead for GC and space remaining after a best effort to reclaim unused memory. I realize that in the concurrent collector case where low latency is a design requirement, this would make the notion of OOM for exceeding gc overhead unlikely to ever engage, with latency already being very poor.

If we want to control that separately, I'd have some notion of frequency (or time density if you will) of latency-violating events that would trigger an OOM, although that would be new behaviour. Most JVM users typically use JMX/introspection to self-destruct or to raise an alarm about malfunction. It can of course be argued that that would cover the existing OOM upon gc overhead too, because ultimately a very slow JVM because of GC overhead would translate to worsening latency for user code.

Still, I would keep the notion of gc overhead and space available separate from the _latency_ of a stop-world or concurrent event and always ensure that we look at available space only following a global/full gc to decide if an OOM due to GC overhead should be thrown. Can discuss this more offline in person.

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 121:

> 119: 
> 120: // Change affiliation to YOUNG_GENERATION if _state is not _pinned_cset, _regular, or _pinned.  This implements
> 121: // behavior previously performed as a side effect of make_regular_bypass().  This is used by Full GC

For this and other `make_*` methods of `ShenandoahHeapRegion`, I wonder if it makes sense to provide documentation in the header file. (Replicating it in the implementation is fine too, but not necessary.) What for example is the semantics of the `_bypass` suffix in the names of some of the methods? I was not able to easily tell from reading the code in the methods, so thought a documentation comment might be useful for at least the non-obvious cases.

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 132:

> 130: 
> 131: 
> 132: 

delete. The 3 blank lines seem to be a mismerge?

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 78:

> 76:     assert(new_top <= end(), "PLAB cannot span end of heap region");
> 77:     set_top(new_top);
> 78:     // We do not req.set_actual_size() here.  The caller sets it.

Not needed now, but I would be much happier with some regular structure in the allocation paths of when the actual size of an allocation request is populated. I am guessing it is in the method in the call chain where the allocation first succeeds, or some other natural structure to the allocation paths. I'll try and see about getting a clearer mental map of those paths in the future.

src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 429:

> 427:     size_t generation_capacity = generation->max_capacity();
> 428:     guarantee(stats.span() <= generation_capacity,
> 429:               "%s: generation (%s) size spanned by regions (" SIZE_FORMAT ") * region size: " PROPERFMT

For consistency, place the region size value also in circular braces `( ... )` like for the other values?

test/hotspot/jtreg/gc/shenandoah/oom/TestThreadFailure.java line 93:

> 91:             ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(
> 92:                     "-Xmx32m",
> 93:                     "-XX:+UnlockExperimentalVMOptions", "-XX:ShenandoahNoProgressThreshold=24",

See my previous comment about when `ShenandoahNoProgressThreshold` should increment earlier in code.

-------------

PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643630442
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643625733
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643624102
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643584901
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643592499
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643614533
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643604825
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643616071
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643617762
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1643619067