RFR: 8324649: Shenandoah: refactor implementation of free set [v6]

Y. Srinivas Ramakrishna ysr at openjdk.org
Thu Feb 1 07:21:11 UTC 2024


On Wed, 31 Jan 2024 16:28:15 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
>> 
>> On internal performance pipelines, this change shows:
>> 
>>  1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions.
>>  2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%).  This system CPU time increase appears to correlate with increased page faults and/or rss.
>>  3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood)
>>  4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs)
>> 
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000
>>   Control: 819938.875   (+/-5724.56  )         40
>>   Test:    2185552.625   (+/-26378.64  )         20
>> 
>> +166.16% scimark.sparse.large/rss_max p=0.00000
>>   Control: 3285226.375   (+/-22812.93  )         40
>>   Test:    8743881.500   (+/-104906.69  )         20
>> 
>> +164.78% sunflow/cpu_system p=0.00000
>>   Control:      1.280s  (+/-  0.10s )         40
>>   Test:         3.390s  (+/-  0.13s )         20
>> 
>> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000
>>   Control:      3.259   (+/-  1.46  )         33
>>   Test:         8.125   (+/-  2.05  )         20
>> 
>> +143.75% pmd/major_page_fault_count p=0.03622
>>   Control:      1.000   (+/-  0.00  )         40
>>   Test:         2.438   (+/-  2.59  )         20
>> 
>> +80.22% lusearch/minor_page_fault_count p=0.00000
>>   Control: 2043930.938   (+/-4777.14  )         40
>>   Test:    3683477.625   (+/-5650.29  )         20
>> 
>> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000
>>   Control: 697899.156   (+/-3457.82  )         40
>>   Test:    1050363.812   (+/-175...
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename and comments for _capacity_of and _used_by

A few more comments. I expect my next round to be the last one. I think we are almost there. Sorry for the delay and for the length of the review comments.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1001:

> 999: 
> 1000:   if (VerifyAfterGC) {
> 1001:     Universe::verify();

This line deletion seems to be the only change now in this file. So this file can be removed from the diffs.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 114:

> 112: }
> 113: 
> 114: inline void ShenandoahRegionPartition::shrink_interval_if_boundary_modified(ShenandoahFreeSetPartitionId partition, size_t idx) {

const both the parameters.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 36:

> 34: enum ShenandoahFreeSetPartitionId : uint8_t {
> 35:   NotFree,                      // Region has been retired and is not in any free set: there is no available memory.
> 36:   Mutator,                      // Region is in the Mutator free set: available memory is available to mutators.

Just want to make sure: "available to mutators" -- is this both for object allocation as well as for possible evacuation as part of the mutator LRB?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 37:

> 35:   NotFree,                      // Region has been retired and is not in any free set: there is no available memory.
> 36:   Mutator,                      // Region is in the Mutator free set: available memory is available to mutators.
> 37:   Collector,                    // Region is in the Collector free set: available memory is reserved for evacuations.

When mutators evacuate the target of an LRB, do they use `Mutator` or `Collector`. I assume the former? In that case, I'd say for Collector: `available memory is reserved for collector threads for evacuation`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 44:

> 42: 
> 43: // This class implements partitioning of regions into distinct sets.  Each ShenandoahHeapRegion is either in the Mutator free set,
> 44: // the Collector free set, or in neither free set (NotFree).

I noticed that you use the term "free partition" quite a lot later, I'd just start using that term early on when talking about these sets. You could, for example, say:

// Whenever we say "free partition", we mean any partition other than the "NotFree" partition.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 50:

> 48:   const size_t _max;            // The maximum number of heap regions
> 49:   const size_t _region_size_bytes;
> 50:   const ShenandoahFreeSet* _free_set;

Interesting: why does the partitioning need a reference to its containing free set?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 54:

> 52: 
> 53:   // For each type, we track an interval outside of which a region affiliated with that partition is guaranteed
> 54:   // not to be found. This makes searches for free space more efficient.  For each partition p, _leftmosts[p]

I am being a bit pedantic here.
Partition is usually identified with the _set of equivalence classes_. Thus a partition is an equivalence relation, and each equivalence class in the partition has, in this case, a distinct partition id (i.e. each region is either in the Mutator equivalence class aka Mutator free set, the Collector equivalence class aka Collector free set, or the NotFree equivalence class aka NotFree set). In your terminology, each equivalence class is a "free set".

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 60:

> 58:   size_t _rightmosts[NumPartitions];
> 59: 
> 60:   // Allocation for humongous objects needs to find regions that are entirely empty.  For each partion p, _leftmosts[p]

`_leftmosts_empty` and, similarly, `_rightmosts_empty`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 63:

> 61:   // represents the first region belonging to this partition that is completely empty and _rightmosts[p] represents the
> 62:   // last region that is completely empty.  If there are no completely empty regions in this partition, this is represented
> 63:   // by canonical [_max, 0].

... is no completely empty region in this partition id, ...



... the canonical ...

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 68:

> 66: 
> 67:   // For each partition p, _capacity[p] represents the total amount of memory within the partition at the time
> 68:   // of the most recent rebuild, _used[p] represents the total amount of memory that has been consumed within this

instead of consumed, can we just say used (or allocated)?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 74:

> 72:   // and _used[p], even though the region may have been removed from the free set.
> 73:   size_t _capacity[NumPartitions];
> 74:   size_t _used[NumPartitions];

In light of your earlier documentation of leftmost/righmost/empty/available etc. then, would it be fair to say that the following statement is always true:

for p = NotFree:
1. leftmosts[p] = leftmosts_empty[p] = _max
2. rightmosts_empty[p] = rightmosts_empty[p] = 0
3. capacity[p] = used[p] = region_size

Are the "NotFree" entries for these arrays ever used?

If not, is there any point in keeping them in a product build? Is there any point in keeping them in a non-product build? Does it have some other role that makes it important to keep it, anyway?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 75:

> 73:   size_t _capacity[NumPartitions];
> 74:   size_t _used[NumPartitions];
> 75:   size_t _region_counts[NumPartitions];

If tracked, is this an invariant of these fields?

- region_counts[NotFree] == _max - (region_counts[Mutator] + region_counts[Collector])

(This would also make the region_counts[NotFree] unnecessary? See my previous comment.)

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 95:

> 93:   void make_free(size_t idx, ShenandoahFreeSetPartitionId which_partition, size_t region_capacity);
> 94: 
> 95:   // Place region idx into free partition new_partition.  Requires that idx is currently not NotFree.

Include semantics of region_capacity in comment, e.g.:


// Move region idx, with region_capacity bytes of available free space,
// from the NotFree partition to the free partition new_partition.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 99:

> 97: 
> 98:   // Returns the ShenandoahFreeSetPartitionId affiliation of region idx, NotFree if this region is not currently free.
> 99:   // This does not enforce that free_set membership implies allocation capacity.

I think "NotFree if this region is not currently free" is unnecessary and frankly confusing (why are we mentioning membership in the NotFree partition specially?)

I also do not understand (am confused by) the second sentence in the comment.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 103:

> 101: 
> 102:   // Returns true iff region idx is in the test_set free_set.  Before returning true, asserts that the free
> 103:   // set is not empty.  Requires that test_set != NotFree or NumPartitions.

This comment probably needs to be updated. Something simple like:

// Is the region, idx, part of which_partition?

As it stands, the comment is pretty confusing. In general concise statements of specification are best for documenting APIs. If the presence of APIs needs to be motivated, that should all be done early on in a block comment that motivates the class and why it is what it is. It makes for much terser, clearer, and maintainable  documentation.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 112:

> 110:   // In other words:
> 111:   //   if the requested which_partition is empty:
> 112:   //     leftmost() and leftmost_empty() return _max, rightmost() and rightmost_empty() return 0

There are mutually contradictory statements in the highlighted portion of the documentation above. I suspect the earlier reference to -1 is obsolete and needs to be deleted.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 132:

> 130:     assert (which_partition > NotFree && which_partition < NumPartitions, "selected free set must be valid");
> 131:     return _used[which_partition];
> 132:   }

The assertions here indicate to me that it is likely my earlier suspicion that many of these fields are not needed for NotFree is true. (See my earlier comment about many of these fields for NotFree.) I feel it may be best to let the enum type system enforce this, rather than use these assertions. NotFree then becomes a sentinel value that is not part of the legal index set that can be passed in here.

May be that can be done later, as I realize the type contagion might necessitate more changes (although I think it will conceptually simplify this) by maintaining the disctinction between the tags in the _membership[] array, and the types used for the so-called free partitions.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 142:

> 140:     assert (which_partition > NotFree && which_partition < NumPartitions, "selected free set must be valid");
> 141:     _used[which_partition] = value;
> 142:   }

Same for these assertions.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 171:

> 169: };
> 170: 
> 171: class ShenandoahFreeSet : public CHeapObj<mtGC> {

It would be good to have a block comment here motivating this class.
It seems (from looking at some of its public APIs) as if it publicly exports only the "mutator view", which I find interesting.

The other partitions in `ShenandoahRegionPartition` appears to be for efficiency of the implementation in service of the public APIs for ShenandoahFreeSet.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 174:

> 172: private:
> 173:   ShenandoahHeap* const _heap;
> 174:   ShenandoahRegionPartition _partitions;

I think the use of a plural for the field illustrates the English language interpretation of partition. To be consistent, I'd rename the class name also to the plural `ShenandoahRegionPartitions` as remarked earlier.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 184:

> 182:   HeapWord* allocate_single(ShenandoahAllocRequest& req, bool& in_new_region);
> 183: 
> 184:   // While holding the heap lock, allocate memory for a humongous object which will span multiple contiguous heap

`which will` or `which may`? (Is a humongous object allowed to span just a single region as well?)

Or are objects humongous only if they won't fit in a region? In which case the "will" is correct.

I was confused by tests that use `ShenandoahHumongousThreshold=50` , `=90`, etc.

May be in those cases, we go through the `allocate_single()` despite allocating an object (or block) bigger than `ShenandoahHeapRegion::humongous_threshold_words()` ? (That would make the pre-condition of the previous method suspect, though.)

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 187:

> 185:   // regions.
> 186:   //
> 187:   // Precondition: req.size() > ShenandoahHeapRegion::humongous_threshold_words().

`>` or `>=` ?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 221:

> 219:   //
> 220:   // Note that we plan to replenish the Collector reserve at the end of update refs, at which time all
> 221:   // of the regions recycled from the collection set will be available.

I see that you are trying to motivate this API. I feel that these comments belong in the caller. The API should not need to motivate where the caller must call this from. The API came about because there was a need for this in its clients. A good API spec should state its actions.

Motivating its uses drags in context that detracts from clarity of the class and method.

I realize this is a somewhat subjective stance but from experience it makes for better documentation and more maintainable/readable code.

The place for such documentation is usually in a block comment motivating the general design of the class and why it offers the APIs that it does, and who its clients are.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 231:

> 229:   inline size_t available() const {
> 230:     assert(used() <= capacity(), "must use less than capacity");
> 231:     return capacity() - used();

So `ShenandoahFreeSet` publicly exports only the mutator view?

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1069:

> 1067:     heap->collection_set()->clear();
> 1068: 
> 1069:     // Since Full GC directly manipulates top of certain regions, certain ShenandoahFreeSet abstractions may have been corrupted.

Instead of "may have been corrupted", which can be alarming and confusing, I'd state this as:

// Full GC doesn't use or maintain the ShenandoahFreeSet abstractions,
// so we rebuild the free set from scratch following a Full GC.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1855229569
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473702305
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473831169
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473809906
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473809223
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473835542
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473810650
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473709592
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473710297
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473712934
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473721984
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473826640
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473828523
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473842682
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473845614
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473848754
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473851167
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473859958
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473860619
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473739467
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473725085
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473726454
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473728080
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473883437
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473738298
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1473899033


More information about the shenandoah-dev mailing list