RFR: 8324649: Shenandoah: refactor implementation of free set [v14]

Aleksey Shipilev shade at openjdk.org
Fri Mar 1 18:11:59 UTC 2024


On Thu, 29 Feb 2024 19:10:21 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
>> 
>> We have compared performance of existing FreeSet implementation with the proposed PR over a broad set of performance workloads and see that the impact is mostly neutral.
>> 
>> Comparing 105235.0 metrics from control, 220638.0 from experiment.
>> Compare: 0.589s
>>                           Most impacted benchmarks |                              Most impacted metrics
>> -------------------------------------------------------------------------------------------------------
>>                                  Shenandoah/jython |                                          cwr_total
>> 
>> 
>>                                 Only in experiment |                                    Only in control
>> -------------------------------------------------------------------------------------------------------
>>                  crypto.signverify/trigger_failure |                        crypto.rsa/cmr_thread_roots
>>                 extremem-large-31g/adjust_pointers |       scimark.sparse.small/concurrent_thread_roots
>>             extremem-large-31g/calculate_addresses |              xml.transform/concurrent_thread_roots
>>       crypto.signverify/class_unloading_rendezvous |                    mpegaudio/concurrent_weak_roots
>>                                   serial/cmr_total |                        crypto.rsa/ctr_thread_roots
>> 
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +5.64% jython/cwr_total p=0.00037
>>   Control:      1.928ms (+/-272.40us)        170
>>   Test:         2.037ms (+/-322.73us)        344
>
> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 60 commits:
> 
>  - Remove instrumentation and cleanup magic numbers
>    
>    Two places, I had used 63 when I should have used
>    _bits_per_array_element -1.
>  - Address 32-bit compile issues
>  - Fix 32-bit size formatting in log messages
>  - Fix white space
>  - Merge remote-tracking branch 'origin/master' into restructure-free-set
>  - Merge branch 'openjdk:master' into master
>  - Two bug fixes for better performance
>    
>    1. Collector reserve size is based on memory available within regions
>       rather than the region size (oops).
>    
>    2. If an attempt to allocate within a region fails and the region has
>       already provided the percentage presumed by ShenandoahEvacWaste,
>       retire this region.  This is motivated by observations that
>       otherwise, we end up with large numbers of regions that have only
>       a small amount of memory within them (e.g. 4k) and every allocation
>       request has to wade through all of these regions before it eventually
>       finds a region that has a sufficiently large amount of available
>       memory.  In the original Shenandoah free-set implementation, the
>       behavior was to retire the first time an allocation within that
>       regions fails, regardless of whether the region has already reached
>       the ShenandoahEvacWaste threshold.
>  - Fix off-by-one error in is_forward_consecutive_ones()
>  - Fix whitespace
>  - Bug fixes and performance improvements
>    
>    1. Correct off-b-one-error in count of trailingones
>    2. Speed up search for contiguous regions (for humongous allocations) by
>       sliding window instead of initiating new search each time
>    3. Bias regular region allocations to favor regions that are already
>       partially consumed
>    4. Fix bug in move_regions_from_collector_to_mutator which caused some
>       non-empty regions to be ignored.
>  - ... and 50 more: https://git.openjdk.org/jdk/compare/be2b92bd...1aa5a3e6

It is a very interesting piece of work, but it requires much more polishing before integrating.

I think at least `ShenandoahSimpleBitMap` needs unit tests, as its implementation is not obvious. See bitmap gtests as the example. Or maybe we can polish it enough to be obvious.

More cursory review comments:

src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 420:

> 418:                  req.type_string(),
> 419:                  byte_size_in_proper_unit(req.size() * HeapWordSize), proper_unit_for_byte_size(req.size() * HeapWordSize));
> 420: 

No need for this change?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 50:

> 48:   size_t element_bits = _bitmap[array_idx];
> 49:   size_t bit_number = start_idx % _bits_per_array_element;
> 50:   size_t the_bit = ((size_t) 0x01) << bit_number;

Here and later, we have a special macro for this, `nth_bit`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 52:

> 50:   size_t the_bit = ((size_t) 0x01) << bit_number;
> 51:   size_t omit_mask = the_bit - 1;
> 52:   size_t mask = ((size_t) ((ssize_t) -1)) & omit_mask;

Same, we have `right_n_bits` to generate the masks like this.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 59:

> 57:   } else {
> 58:     size_t counted_ones;
> 59:     for (counted_ones = 0; element_bits & the_bit; counted_ones++) {

Here and later: this would likely trigger an implicit conversion warning. Do `(element_bits & the_bit) != 0`?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 67:

> 65: 
> 66: // Count consecutive ones in reverse order, starting from last_idx.  Requires that there is at least one zero
> 67: // between last_idx and index value zero, inclusive.

Here and later: This duplicates the documentation in header. No need to put it here, saves us from desyncs between duplicate comments.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 76:

> 74: 
> 75:   // All ones from bit 0 to the_bit
> 76:   size_t mask = (the_bit * 2) - 1;

It would be much cleaner with `right_n_bits`, I think.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 103:

> 101:     size_t exclude_mask = ((size_t) 0x1 << bit_number) - 1;
> 102:     size_t exact_mask = overreach_mask & ~exclude_mask;
> 103:     return (element_bits & exact_mask) == exact_mask? true: false;

Here and later: this is just `return (element_bits & exact_mask) == exact_mask;`

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 162:

> 160:           ssize_t candidate_result = (array_idx * _bits_per_array_element) + bit_number;
> 161:           if (candidate_result < boundary_idx) return candidate_result;
> 162:           else return boundary_idx;

Here and later: Odd style for branch. It looks like this is just `MIN2(candidate_result, boundary_idx)`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 168:

> 166:         }
> 167:       }
> 168:       assert(false, "should not reach here");

Just `ShouldNotReachHere()` then?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 319:

> 317: #else
> 318:   log_info(gc)("%6s: %10s %10s %10s", "index", "Mutator Bits", "Collector Bits", "NotFree Bits");
> 319: #endif

There is no need for `_LP64` here, just print with `%18s` unconditionally.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 388:

> 386: }
> 387: 
> 388: inline ssize_t ShenandoahRegionPartitions:: leftmost(ShenandoahFreeSetPartitionId which_partition) const {

Excess space after `::`.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 394:

> 392:     return _max;
> 393:   } else {
> 394:     // _membership[which_partition].is_set(idx) may not be true if we are shrinking the interval

...and? This comment is really confusing. What does it have to do with `leftmost`?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 892:

> 890:         //  1. Eventual collection set has fewer regions because we have packed newly allocated objects into fewer regions
> 891:         //  2. We preserve the "empty" regions longer into the GC cycle, reducing likelihood of allocation failures
> 892:         //     late in the GC cycle.

Be more terse about this. Maybe even add it to the description above the code. The large comments should not interrupt the flow of the code.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1152:

> 1150:   _partitions.retire_range_from_partition(Mutator, beg, end);
> 1151: 
> 1152:   size_t total_humongous_size = ShenandoahHeapRegion::region_size_bytes() * num;

Lost the original comment. Also, where did the call to `notify_mutator_alloc_words` went?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1372:

> 1370:     if (move_to_collector) {
> 1371:       // Note: In a previous implementation, regions were only placed into the survivor space (collector_is_free) if
> 1372:       // they were entirely empty.  I'm not sure I understand the rationale for that.  That alternative behavior would

We should not leave "I'm not sure..." comments in the code. I believe the reason is exactly what you stated below: we do not want to put evacuated objects (implicitly live) to the region that might be mostly garbage, thus prolonging the garbage lifetime.

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1541:

> 1539:   shenandoah_assert_heaplocked();
> 1540: 
> 1541:   // Allocation request is known to satisfy all memory budgeting constraints.

What does it mean?

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 68:

> 66:   // Count consecutive ones in forward order, starting from start_idx.  Requires that there is at least one zero
> 67:   // between start_idx and index value (_num_bits - 1), inclusive.
> 68:   size_t count_leading_ones(ssize_t start_idx) const;

Does the argument have to me `ssize_t`, not `size_t`? Is this because you want it to be signed?

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1089:

> 1087:     heap->heap_region_iterate(&post_compact);
> 1088:     heap->set_used(post_compact.get_live());
> 1089: 

Is this change necessary?

-------------

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1911398216
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509144390
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509151821
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509153942
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509305137
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509301710
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509306601
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509318105
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509321462
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509322005
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509324302
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509326323
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509328606
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509336524
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509341993
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509350825
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509353034
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509148199
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1509356010


More information about the shenandoah-dev mailing list