RFR: 8324649: Shenandoah: refactor implementation of free set

William Kemper wkemper at openjdk.org
Wed Jan 24 18:37:32 UTC 2024


On Wed, 24 Jan 2024 16:10:02 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

> Several objectives:
> 1. Reduce humongous allocation failures by segregating regular regions from humongous regions
> 2. Do not retire regions just because an allocation failed within the region if the memory remaining within the region is large enough to represent a LAB
> 3. Track range of empty regions in addition to range of available regions in order to expedite humongous allocations
> 4. Treat collector reserves as available for Mutator allocations after evacuation completes
> 5. Improve encapsulation so as to enable an OldCollector reserve for future integration of generational Shenandoah
> 
> On internal performance pipelines, this change shows:
> 
>  1. some Increase in page faults and rss_max with certain workloads, presumably because of "segregation" of humongous from regular regions.
>  2. An increase in CPU time on certain benchmarks: sunflow (+165%), scimark.sparse.small (+51%), scimark.sparse.large (+50%), lusearch (+43%).  This may result from potentially smaller tlabs and/or more effort required to allocate tlabs in the presence of failed shared allocation requests (which no longer immediately retire regions).
>  3. An increase in trigger_failure for the hyperalloc_a2048_o4096 experiment (not yet understood)
>  4. 2-30x improvements on multiple metrics of the Extremem phased workload latencies (most likely resulting from fewer degenerated or full GCs)
> 
> Shenandoah
> -------------------------------------------------------------------------------------------------------
> +166.55% scimark.sparse.large/minor_page_fault_count p=0.00000
>   Control: 819938.875   (+/-5724.56  )         40
>   Test:    2185552.625   (+/-26378.64  )         20
> 
> +166.16% scimark.sparse.large/rss_max p=0.00000
>   Control: 3285226.375   (+/-22812.93  )         40
>   Test:    8743881.500   (+/-104906.69  )         20
> 
> +164.78% sunflow/cpu_system p=0.00000
>   Control:      1.280s  (+/-  0.10s )         40
>   Test:         3.390s  (+/-  0.13s )         20
> 
> +149.29% hyperalloc_a2048_o4096/trigger_failure p=0.00000
>   Control:      3.259   (+/-  1.46  )         33
>   Test:         8.125   (+/-  2.05  )         20
> 
> +143.75% pmd/major_page_fault_count p=0.03622
>   Control:      1.000   (+/-  0.00  )         40
>   Test:         2.438   (+/-  2.59  )         20
> 
> +80.22% lusearch/minor_page_fault_count p=0.00000
>   Control: 2043930.938   (+/-4777.14  )         40
>   Test:    3683477.625   (+/-5650.29  )         20
> 
> +50.50% scimark.sparse.small/minor_page_fault_count p=0.00000
>   Control: 697899.156   (+/-3457.82  )      ...

Minor suggestions, questions.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2071:

> 2069:   void do_work(uint worker_id) {
> 2070:     T cl;
> 2071:     if (CONCURRENT && (worker_id == 0)) {

Could move this work to happen on the control thread? Not sure if there is an advantage to having a single worker do this. Does it matter when other workers observe the effect of `move_collector_sets_to_mutator`?

src/hotspot/share/gc/shenandoah/shenandoahMarkingContext.cpp line 90:

> 88: 
> 89: bool ShenandoahMarkingContext::is_complete() {
> 90:   bool result = _is_complete.is_set();

Unnecessary change? Also seems unrelated to freeset changes.

-------------

Changes requested by wkemper (Author).

PR Review: https://git.openjdk.org/jdk/pull/17561#pullrequestreview-1842096900
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1465380785
PR Review Comment: https://git.openjdk.org/jdk/pull/17561#discussion_r1465374470


More information about the hotspot-gc-dev mailing list