RFR: 8324649: Shenandoah: refactor implementation of free set [v4]

Mon Jan 29 23:05:28 UTC 2024

On Sat, 27 Jan 2024 00:42:46 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
>> 
>> On internal performance pipelines, this change shows:
>> 
>>  1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions.
>>  2. An increase in System CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%).  This system CPU time increase appears to correlate with increased page faults and/or rss.
>>  3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood)
>>  4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs)
>> 
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000
>>   Control: 819938.875   (+/-5724.56  )         40
>>   Test:    2185552.625   (+/-26378.64  )         20
>> 
>> +166.16% scimark.sparse.large/rss_max p=0.00000
>>   Control: 3285226.375   (+/-22812.93  )         40
>>   Test:    8743881.500   (+/-104906.69  )         20
>> 
>> +164.78% sunflow/cpu_system p=0.00000
>>   Control:      1.280s  (+/-  0.10s )         40
>>   Test:         3.390s  (+/-  0.13s )         20
>> 
>> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000
>>   Control:      3.259   (+/-  1.46  )         33
>>   Test:         8.125   (+/-  2.05  )         20
>> 
>> +143.75% pmd/major_page_fault_count p=0.03622
>>   Control:      1.000   (+/-  0.00  )         40
>>   Test:         2.438   (+/-  2.59  )         20
>> 
>> +80.22% lusearch/minor_page_fault_count p=0.00000
>>   Control: 2043930.938   (+/-4777.14  )         40
>>   Test:    3683477.625   (+/-5650.29  )         20
>> 
>> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000
>>   Control: 697899.156   (+/-3457.82  )         40
>>   Test:    1050363.812   (+/-175...
>
> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Fix typo in comment
>  - Remove unnecessary include

A few more feedback comments. Will try and finish in next round, but flushing the review comments buffer for now until I get back to this later.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 98:

> 96:     _rightmosts[idx] = 0;
> 97:     _leftmosts_empty[idx] = _max;
> 98:     _rightmosts_empty[idx] = 0;

// Make all regions NotFree

Would this read easier if one used `type` or similar for the loop variable name & used multiple assignment statements:

_leftmosts[type] = _leftmosts_empty[type] = _max;
_rightmosts[type] = _rightmosts_empty[type] = 0;
_used_by_[type] = _capacity_of[type] = 0;

Then

_region_counts[Mutator] = _region_counts[Collector] = 0;
_region_counts[NotFree] = _max;

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 108:

> 106: }
> 107: 
> 108: void ShenandoahSetsOfFree::clear_all() {

I'd opt for something that uses a term other than `clear`, since `clear` has no specific semantics since you are also including `NotFree` as a type of set you track here (this goes back to the question of whether you really need to track that separately at all -- it could be relegated to a default sentinel type for any memory region, and may or may not need tracking the intervals here? (I haven't looked to see if you ever need to use the [left, right] of the NotFree interval for anything other than assertion checking. You could say `clear_mutator_and_collector`, or `unfree()`. Something that more clearly conveys the mental model of the class more clearly in its API names.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 120:

> 118: }
> 119: 
> 120: inline void ShenandoahSetsOfFree::shrink_bounds_if_touched(ShenandoahFreeMemoryType set, size_t idx) {

`if_touched` isn't as clear as `if_extremal` -- as far as I can tell you shrink when the region in question is extremal (left or right), at which point you walk right or left, respectively, to get to the next member in the set where you plan the end of that interval.

Or you could just say `shrink_interval_if_at_edge`.

In other words, there's a notion of `touched` here that isn't clear (or at least not obvious to me).

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 50:

> 48:   size_t _rightmosts[NumFreeSets];
> 49:   size_t _leftmosts_empty[NumFreeSets];
> 50:   size_t _rightmosts_empty[NumFreeSets];

I'd motivate the `.._empty` notion and document the role of each of these fields (for an arbitrary type).

Just a line is enough, e.g. a brief block comment such as:

//  For each type, we track an interval outside of which a region of that type
// is guaranteed not to be found. This makes searches for free space more efficient.
//  For any given type, its _leftmosts represents its least index, and its _rightmosts its
// greatest index. Empty intervals are indicated by the canconical [_max, 0].

In this description, what does `_leftmosts_empty` represent?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 55:

> 53:   size_t _capacity_of[NumFreeSets];
> 54:   size_t _used_by[NumFreeSets];
> 55:   size_t _region_counts[NumFreeSets];

Similarly, document what these values hold at any time; just a line is enough, e.g.

// For any given type, its _capacity_of is its net capacity, and _used_by the amount used.
// _region_counts is the number of regions of that type.

Why the extra `_of` and `_by`? Wouldn't it be more consistent with naming elsewhere
to just say `capacity` and `used`?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 58:

> 56: 
> 57:   inline void shrink_bounds_if_touched(ShenandoahFreeMemoryType set, size_t idx);
> 58:   inline void expand_bounds_maybe(ShenandoahFreeMemoryType set, size_t idx, size_t capacity);

To my comment elsewhere of the meaning of `_if_touched` (for the shrink), couldn't you just say `_maybe` like for the expand for the shrink, just as you did for expand?

I personally (subjectively) prefer `interval`, but `bounds` is fine if there is precedence elsewhere in the code.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1849900986
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470313179
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470318421
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470324976
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470332313
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470335344
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1470337981