RFR: Enforce max regions [v3]

Y. Srinivas Ramakrishna ysr at openjdk.org
Wed Dec 7 21:15:06 UTC 2022


On Wed, 7 Dec 2022 16:14:18 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation.  Prior to this change, enforcement of generation sizes was by usage alone.  This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests.  This was resulting in excessive TLAB allocation failures and degenerated collections.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix white space and add an assertion

Overall looks great to the extent that I understood it. 

I left a few questions/comments, typically because I may lack the complete picture of the design and its rationale in a few places.

Are there performance numbers to share with these changes? Those could be added either here in the pull request, or in the associated JBS ticket, which can be linked to the PR.

Thanks!

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 103:

> 101:   switch (req.affiliation()) {
> 102:     case ShenandoahRegionAffiliation::OLD_GENERATION:
> 103:       if (_heap->old_generation()->adjusted_unaffiliated_regions() <= 0) {

Re "<=": I am guessing this is because adjusted_unaffiliated_regions() can go negative for periods of time while GC is in progress in a tight heap situation?

Unfortunately, the return type here is a size_t (unsigned), so a "<=" comparison with 0 should have been flagged by the compiler. Or does the compiler silently treat it as "==", without issuing a warning about the comparison? In any case, it's worth thinking about the related question in the definition of adjusted_unaffiliated_regions(), and adjusting accordingly.
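
To make the concern concrete, here is a stand-alone illustration (the helper names are mine, not from the PR) of how a "<= 0" comparison behaves on an unsigned type: it silently degenerates to an equality test, and a "negative" intermediate result wraps to a huge positive value.

```cpp
#include <cstddef>

// For an unsigned n, `n <= 0` can only be true when n == 0, so the
// comparison is effectively `n == 0` (some compilers warn, some don't).
inline bool looks_negative(std::size_t n) {
    return n <= 0;
}

// If used > capacity, unsigned subtraction wraps around to a huge value
// instead of producing a negative number that the `<=` test would catch.
inline std::size_t unaffiliated(std::size_t capacity_regions,
                                std::size_t used_regions) {
    return capacity_regions - used_regions;
}
```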

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 126:

> 124:       for (size_t idx = _mutator_leftmost; idx <= _mutator_rightmost; idx++) {
> 125:         ShenandoahHeapRegion* r = _heap->get_region(idx);
> 126:         if (is_mutator_free(idx) && (allow_new_region || r->affiliation() != ShenandoahRegionAffiliation::FREE)) {

Aside: Does Shenandoah have a concept of a per-mutator allocation cursor in shared space, independent of its TLAB? Firstly, it might make first-fit searches more efficient; secondly, we might get spatial locality for allocations from the same mutator that are close in time, which could help reduce fragmentation and potentially evacuation costs.

One might consider resetting the cursors following each minor gc.
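
FWIW, a very rough, purely hypothetical sketch of what such a cursor might look like (none of these names exist in ShenandoahFreeSet; this is only to make the idea concrete):

```cpp
#include <cstddef>

// Hypothetical sketch: each thread remembers where its last first-fit
// search ended and resumes there, falling back to the left edge when the
// cursor drifts out of the current mutator range.
struct FreeSetCursorSketch {
    std::size_t _mutator_leftmost;
    std::size_t _mutator_rightmost;

    static thread_local std::size_t _cursor;

    std::size_t search_start() const {
        if (_cursor < _mutator_leftmost || _cursor > _mutator_rightmost) {
            return _mutator_leftmost;   // e.g. just after a reset
        }
        return _cursor;
    }

    // Called after a successful shared-space allocation at region `idx`.
    static void remember(std::size_t idx) { _cursor = idx; }

    // Called at the end of each minor gc.
    static void reset() { _cursor = 0; }
};
thread_local std::size_t FreeSetCursorSketch::_cursor = 0;
```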

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 171:

> 169:             ShenandoahHeapRegion* r = _heap->get_region(idx);
> 170:             if (can_allocate_from(r)) {
> 171:               flip_to_gc(r);

Does the flipping have to strictly precede the allocation attempt? Otherwise the flip is futile and we steal space from mutators to no advantage.

I also notice the asymmetry in the existence of `flip_to_gc()` but no corresponding `flip_to_mutator()`. I suppose that's because regions freed by GC as a result of evacuation will be available to mutators, so the flipping to GC may be considered temporary in that sense. However, I suspect futile flipping may strand space in GC territory for no good reason.

In any case, take my comments here with the right grain of salt, because I lack the philosophical foundations of the need for this mutator/collector view dichotomy. It would be good if the `.hpp` file spent a few sentences on the rationale for that design choice; e.g. the allocate-from-left and allocate-from-right policies could still hold without necessarily having strict collector/mutator affiliations (as indicated by the `flip` above)?

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 198:

> 196: 
> 197:   if (heap->mode()->is_generational()) {
> 198:     // Since we probably have not yet reclaimed the most recently selected collection set, we have to defer

I'd make the comment less tentative, and state:

// Since the most recently selected collection set may not have been reclaimed at this stage,
// we'll defer unadjust_available() until after the full gc is completed.

Question: is the adjusted available value (modulo the loaned size) used by full gc for any purpose, or is it to satisfy assertion checks / verification in some of the methods invoked during full gc work below?

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 924:

> 922: size_t ShenandoahGeneration::decrement_affiliated_region_count() {
> 923:   _affiliated_region_count--;
> 924:   return _affiliated_region_count;

Both of these seem fine and probably more readable, but you'd save a line by returning the result of the pre-{in,de}crement expression directly, e.g.:

`return --_affiliated_region_count;`

Would it be useful to assert that the region count is always non-zero?
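
For illustration, a stand-alone sketch of the suggested shape (the struct and field names only mirror the PR; this is not the actual Shenandoah code):

```cpp
#include <cassert>
#include <cstddef>

struct GenerationCounterSketch {
    std::size_t _affiliated_region_count = 0;

    std::size_t increment_affiliated_region_count() {
        // Pre-increment returns the updated count in a single expression.
        return ++_affiliated_region_count;
    }

    std::size_t decrement_affiliated_region_count() {
        // Guard against underflow of the unsigned counter.
        assert(_affiliated_region_count > 0);
        return --_affiliated_region_count;
    }
};
```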

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 986:

> 984: }
> 985: 
> 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() {

You can const this method too.
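
E.g., a minimal hypothetical sketch of the const-qualified form (names and fields are placeholders, not the real class layout):

```cpp
#include <cstddef>

struct GenerationSketch {
    std::size_t _adjusted_capacity;
    std::size_t _used_regions_size;
    std::size_t _region_size_bytes;

    // const-qualifying the read-only accessor lets it be called through
    // const pointers/references to the generation.
    std::size_t adjusted_unaffiliated_regions() const {
        return (_adjusted_capacity - _used_regions_size) / _region_size_bytes;
    }
};
```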

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 988:

> 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() {
> 987:   assert(adjusted_capacity() > used_regions_size(), "adjusted_unaffiliated_regions() cannot return negative");
> 988:   return (adjusted_capacity() - used_regions_size()) / ShenandoahHeapRegion::region_size_bytes();

So, just to be clear, this is the number of unaffiliated regions that can _potentially_ be affiliated with this generation. I assume it isn't necessarily the case that that many unaffiliated free regions actually exist?

If the answer is "no, that many unaffiliated free regions do exist", would it be worth asserting that invariant here? (Or maybe, because this is all concurrent with allocations, no such guarantee will ever hold, so it's futile to assert such invariants?)

Indeed this question ties in with my comment further up where you do a "<=" comparison with 0 on the return value from here.

src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 169:

> 167:   void scan_remembered_set(bool is_concurrent);
> 168: 
> 169:   size_t increment_affiliated_region_count();

Add a single line comment in the header file describing what a method returns:

// Returns the affiliated region count following the operation.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1491:

> 1489:         // doing this work during a safepoint.  We cannot put humongous regions into the collection set because that
> 1490:         // triggers the load-reference barrier (LRB) to copy on reference fetch.
> 1491:         if (r->promote_humongous() == 0) {

See my comment in ::promote_humongous().

I think that method could directly call the requisite expansion code under those circumstances, so this code could move there. As I noted there, promotion would then always succeed for humongous object arrays at least, and in general for all humongous objects that are deemed eligible for promotion by other criteria (see my note in ::promote_humongous() on potentially treating humongous primitive type arrays differently from humongous object arrays).

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1040:

> 1038:       // Then fall through to finish the promotion after releasing the heap lock.
> 1039:     } else {
> 1040:       return 0;

This is interesting. Doing some thinking out loud here.

I realize we want to strictly enforce the generation sizes (indicated by the affiliation of regions to generations, in a formal sense of generation sizes), but I do wonder if humongous regions should enter into that calculus at all. The reason we would typically want to designate a humongous object as old (via promotion via this method) is that we don't want to spend effort scanning its contents; after all, we never spend any time copying it when it survives a minor collection. Under the circumstances, it appears we would always want humongous objects that are primitive type arrays to stay in young and never be promoted (although I admit it might make sense to not pay even the cost of marking one that has been around forever, per the generational hypothesis), whereas a humongous object that has references, and ages into the old generation, is affiliated with old and "promoted" even if there aren't any available regions in old. In other words, because humongous objects are never copied, their affiliations need not affect the promotion calculus in a strict manner.

For these reasons, I'd think that humongous object promotions should be treated specially and old generation size should not be a criterion for determining generational affiliation of humongous regions.

src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 343:

> 341: };
> 342: 
> 343: class ShenandoahCalculateRegionStatsClosure : public ShenandoahHeapRegionClosure {

A one-line documentation spec here would be useful:

// A closure used to accumulate the net used, committed, and garbage bytes, and number of regions;
// typically associated with a generation in generational mode.

-------------

Marked as reviewed by ysr (Author).

PR: https://git.openjdk.org/shenandoah/pull/179

