RFR: 8324649: Shenandoah: refactor implementation of free set [v2]

William Kemper wkemper at openjdk.org
Fri Jan 26 18:47:36 UTC 2024


On Thu, 25 Jan 2024 20:45:50 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> Several objectives:
>> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
>> 2. Do not retire a region just because an allocation failed within it if the memory remaining in the region is large enough to hold a LAB
>> 3. Track the range of empty regions in addition to the range of available regions in order to expedite humongous allocations
>> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
>> 5. Improve encapsulation to enable an OldCollector reserve for future integration of generational Shenandoah
>> 
>> On internal performance pipelines, this change shows:
>> 
>>  1. Some increase in page faults and rss_max with certain workloads, presumably because of the segregation of humongous from regular regions.
>>  2. An increase in system CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.large (+50%), lusearch (+43%).  This system CPU time increase appears to correlate with the increased page faults and/or rss.
>>  3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood)
>>  4. 2-30x improvements on multiple latency metrics of the Extremem phased workload (most likely resulting from fewer degenerated or full GCs)
>> 
>> Shenandoah
>> -------------------------------------------------------------------------------------------------------
>> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000
>>   Control: 819938.875   (+/-5724.56  )         40
>>   Test:    2185552.625   (+/-26378.64  )         20
>> 
>> +166.16% scimark.sparse.large/rss_max p=0.00000
>>   Control: 3285226.375   (+/-22812.93  )         40
>>   Test:    8743881.500   (+/-104906.69  )         20
>> 
>> +164.78% sunflow/cpu_system p=0.00000
>>   Control:      1.280s  (+/-  0.10s )         40
>>   Test:         3.390s  (+/-  0.13s )         20
>> 
>> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000
>>   Control:      3.259   (+/-  1.46  )         33
>>   Test:         8.125   (+/-  2.05  )         20
>> 
>> +143.75% pmd/major_page_fault_count p=0.03622
>>   Control:      1.000   (+/-  0.00  )         40
>>   Test:         2.438   (+/-  2.59  )         20
>> 
>> +80.22% lusearch/minor_page_fault_count p=0.00000
>>   Control: 2043930.938   (+/-4777.14  )         40
>>   Test:    3683477.625   (+/-5650.29  )         20
>> 
>> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000
>>   Control: 697899.156   (+/-3457.82  )         40
>>   Test:    1050363.812   (+/-175...
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unnecessary change related to debugging
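
Objectives 1-3 of the quoted description amount to interval tracking over the region array: regular allocations are satisfied from one end, humongous requests search a separately tracked interval of fully empty regions from the other end, and a region is retired only once its remaining free space is too small to hold a LAB. The following minimal, self-contained C++ sketch illustrates that idea; FreeSetSketch and its methods are hypothetical stand-ins for illustration, not the actual ShenandoahFreeSet API.

#include <cstddef>
#include <vector>

struct Region {
  size_t used = 0;          // words allocated in this region
  size_t capacity = 4096;   // words per region (illustrative value)
  bool is_empty() const { return used == 0; }
};

class FreeSetSketch {
  std::vector<Region> _regions;
  size_t _leftmost_free;    // bounds of the interval of regions with free space
  size_t _rightmost_free;
  size_t _leftmost_empty;   // tighter bounds of the interval of fully empty regions
  size_t _rightmost_empty;

public:
  explicit FreeSetSketch(size_t count)
    : _regions(count),
      _leftmost_free(0), _rightmost_free(count - 1),
      _leftmost_empty(0), _rightmost_empty(count - 1) {}

  // Regular allocation: first fit, walking the free interval from the left.
  // A region that cannot satisfy this request is retired only when its
  // remaining space could not hold a minimum-size LAB (objective 2).
  Region* allocate_regular(size_t words, size_t min_lab_words) {
    for (size_t i = _leftmost_free; i <= _rightmost_free; i++) {
      Region& r = _regions[i];
      size_t remaining = r.capacity - r.used;
      if (remaining >= words) {
        r.used += words;
        return &r;
      }
      if (remaining < min_lab_words && i == _leftmost_free) {
        _leftmost_free++;   // retire: too small for any future LAB
      }
    }
    return nullptr;         // caller takes the allocation-failure path
  }

  // Humongous allocation: scan only the empty interval, from the right, so
  // contiguous empty regions stay clustered away from regular traffic
  // (objectives 1 and 3).
  Region* allocate_humongous(size_t num_regions) {
    size_t run = 0;
    for (size_t i = _rightmost_empty + 1; i-- > _leftmost_empty; ) {
      run = _regions[i].is_empty() ? run + 1 : 0;
      if (run == num_regions) {
        for (size_t j = i; j < i + num_regions; j++) {
          _regions[j].used = _regions[j].capacity;   // claim the whole run
        }
        return &_regions[i];  // first region of the contiguous run
      }
    }
    return nullptr;
  }
};

Segregating the two allocation paths this way preserves long runs of empty regions for humongous requests, consistent with the reported drop in degenerated/full GCs, while spreading allocations across more distinct regions, which would be consistent with the rss_max and page-fault increases reported above.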

src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 166:

> 164:   // equal to ShenandoahHeapRegion::humongous_threshold_words().  The caller of allocate_single is responsible
> 165:   // for registering the resulting object and setting the remembered set card values as appropriate.  The
> 166:   // most common case is that we are allocating a PLAB in which case object registering and card dirtying

Nitpick, but isn't this called for all allocations that are not humongous? Why do we say that a PLAB is the most common case? Also, much of this comment applies only to the generational mode.
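
For context, the contract the hpp comment describes is that allocate_single hands back raw memory and the caller performs object registration and remembered-set card dirtying, steps that only exist in the generational mode. Here is a self-contained toy sketch of that division of labor, assuming hypothetical names throughout (FreeSet, CardTable, and allocate_plab below are illustrative stand-ins, not HotSpot APIs):

#include <cstddef>
#include <cstdio>
#include <vector>

struct CardTable {
  static constexpr size_t card_size_words = 64;
  std::vector<bool> dirty;
  explicit CardTable(size_t heap_words) : dirty(heap_words / card_size_words, false) {}
  // Mark every card overlapping [offset_words, offset_words + len_words) dirty.
  void mark_range_as_dirty(size_t offset_words, size_t len_words) {
    for (size_t c = offset_words / card_size_words;
         c <= (offset_words + len_words - 1) / card_size_words; c++) {
      dirty[c] = true;
    }
  }
};

struct FreeSet {
  size_t top = 0;
  size_t capacity;
  explicit FreeSet(size_t words) : capacity(words) {}
  // Carves memory out of the free set; does NO remembered-set bookkeeping.
  bool allocate_single(size_t words, size_t& offset_out) {
    if (top + words > capacity) return false;
    offset_out = top;
    top += words;
    return true;
  }
};

// The caller, not the free set, registers the allocation with the card
// table, and only when running in the generational mode.
bool allocate_plab(FreeSet& fs, CardTable& ct, bool generational,
                   size_t words, size_t& offset_out) {
  if (!fs.allocate_single(words, offset_out)) return false;
  if (generational) {
    ct.mark_range_as_dirty(offset_out, words);
  }
  return true;
}

int main() {
  FreeSet fs(1u << 20);       // 2^20 words of pretend heap
  CardTable ct(1u << 20);
  size_t offset = 0;
  bool ok = allocate_plab(fs, ct, /*generational=*/true, 1024, offset);
  std::printf("allocated=%d offset=%zu\n", ok ? 1 : 0, offset);
  return 0;
}

In the non-generational configuration the bookkeeping branch is simply never taken, which is the concern raised above about the comment's scope.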

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1468017114


More information about the shenandoah-dev mailing list