RFR: 8324649: Shenandoah: replace implementation of free set [v53]

Roman Kennke rkennke at openjdk.org
Fri May 3 16:29:13 UTC 2024


On Tue, 30 Apr 2024 17:40:47 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
>> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
>> 
>> We have compared performance of existing FreeSet implementation with the proposed PR over a broad set of performance workloads and see that the impact is mostly neutral.
>> 
>> Comparing 105235.0 metrics from control, 220638.0 from experiment.
>> Compare: 0.589s
>>                           Most impacted benchmarks |                              Most impacted metrics
>> -------------------------------------------------------------------------------------------------------
>>                                  Shenandoah/jython |                                          cwr_total
>> 
>> 
>>                                 Only in experiment |                                    Only in control
>> -------------------------------------------------------------------------------------------------------
>>                  crypto.signverify/trigger_failure |                        crypto.rsa/cmr_thread_roots
>>                 extremem-large-31g/adjust_pointers |       scimark.sparse.small/concurrent_thread_roots
>>             extremem-large-31g/calculate_addresses |              xml.transform/concurrent_thread_roots
>>       crypto.signverify/class_unloading_rendezvous |                    mpegaudio/concurrent_weak_roots
>>                                   serial/cmr_total |                        crypto.rsa/ctr_thread_roots
>> 
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +5.64% jython/cwr_total p=0.00037
>>   Control:      1.928ms (+/-272.40us)        170
>>   Test:         2.037ms (+/-322.73us)        344
>
> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into restructure-free-set
>  - Merge branch 'openjdk:master' into master
>  - Merge branch 'openjdk:master' into master
>  - Remove unnecessary call to update_watermark
>  - Assert progress on find_next and find_prev
>  - Simplify partition_membership_name by code reuse
>  - Simplify by combining implemnetations of shrink_interval functions
>  - Fix NumPartition type
>    
>    Beautify the code by changing type of NumPartitions and adding Int and
>    UInt forms of NumPartitions.
>  - Refinements to support zero-build compiles
>  - Fix whitespace
>  - ... and 104 more: https://git.openjdk.org/jdk/compare/a863ef5d...d6e3546c

This is much better.
I only looked at shenandoahSimpleBitMap.cpp, and only skimmed over the find-consecutive* methods. I'm not sure, but it looks like some of the search loops might result in quadratic behaviour?

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 31:

> 29: ShenandoahSimpleBitMap::ShenandoahSimpleBitMap(size_t num_bits) :
> 30:     _num_bits(num_bits),
> 31:     _num_words((num_bits + (BitsPerWord - 1)) / BitsPerWord),

I think it would be easier to read as `_num_words(align_up(num_bits, BitsPerWord) / BitsPerWord)`

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 49:

> 47:   uintx bit_number = start_idx & right_n_bits(LogBitsPerWord);
> 48:   uintx omit_mask = right_n_bits(bit_number);
> 49:   uintx mask = ((uintx) 0 - 1) & ~omit_mask;

Isn't that simply `mask = ~omit_mask`? If so, you could omit the line and fold it into the above line like `mask = ~right_n_bits(bit_number)`.

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 51:

> 49:   uintx mask = ((uintx) 0 - 1) & ~omit_mask;
> 50:   if ((element_bits & mask) == mask) {
> 51:     size_t counted_ones = BitsPerWord - bit_number;

So this is the case where all relevant bits are set, right? You might want to add a comment that says so, it took me quite a while to figure this out (including the mask-calculation that leads to it).

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 52:

> 50:   if ((element_bits & mask) == mask) {
> 51:     size_t counted_ones = BitsPerWord - bit_number;
> 52:     return counted_ones + count_leading_ones(start_idx - counted_ones);

If you have a long string of 1s, then recursively calling itself might blow up the stack. I guess writing it in an iterative way might be better (and perhaps perform better).

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 56:

> 54:     // Return number of consecutive ones starting with the_bit and including more significant bits.
> 55:     uintx aligned = element_bits >> bit_number;
> 56:     uintx complement = ~aligned;;

There is one too many ; at the end of the line.

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 79:

> 77: }
> 78: 
> 79: bool ShenandoahSimpleBitMap::is_forward_consecutive_ones(idx_t start_idx, idx_t count) const {

Is this a more general version of count_trailing_ones(), i.e. which takes a count (e.g. end-index) instead of implicitely counting until the end of the bitmap? If so, maybe write one to call the other? Also, it looks like here you are using an iterative approach - good!

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 109:

> 107: }
> 108: 
> 109: bool ShenandoahSimpleBitMap::is_backward_consecutive_ones(idx_t last_idx, idx_t count) const {

Same here.

src/hotspot/share/gc/shenandoah/shenandoahSimpleBitMap.cpp line 167:

> 165:       return beg;
> 166:     } else {
> 167:       // There is at least one non-zero bit within the masked element_bits.  Find it.

I think you meant to say 'There is at least one zero bit...' ?

-------------

Changes requested by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-2038435936
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589376266
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589397374
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589404576
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589406131
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589399565
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589415216
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589415470
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1589420905


More information about the shenandoah-dev mailing list