Integrated: 8260591: Shenandoah: improve parallelism for concurrent thread root scans

Aleksey Shipilev shade at openjdk.java.net
Mon Feb 1 08:52:41 UTC 2021


On Thu, 28 Jan 2021 14:04:07 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Following JDK-8256298, there are a few minor performance issues with the implementation.
> 
> First, in the spirit of JDK-8246100, we should scan the Java threads last, as they have the most parallelism. Less parallel or lightweight roots should be scanned before them to improve overall parallelism.
> 
> Second, claiming threads one at a time makes the claim itself dominate the per-thread processing cost. We should really be claiming and processing threads in chunks.
> 
> The motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, so thread root scan speed is important.
> 
> Before:
> # Baseline
> [56.176s][info][gc,stats] Concurrent Mark Roots          =    0.308 s (a =     1452 us) (n =   212) (lvls, us =      305,      398,      457,      719,    11216)
> [56.176s][info][gc,stats]   CMR: <total>                 =    1.236 s (a =     5832 us) (n =   212) (lvls, us =     2676,     3535,     4199,     5391,    54522)
> [56.176s][info][gc,stats]   CMR: Thread Roots            =    1.179 s (a =     5563 us) (n =   212) (lvls, us =     2441,     3242,     3945,     5156,    54288)
> [56.176s][info][gc,stats]   CMR: VM Strong Roots         =    0.005 s (a =       23 us) (n =   212) (lvls, us =       12,       19,       21,       23,      204)
> [56.176s][info][gc,stats]   CMR: CLDG Roots              =    0.052 s (a =      247 us) (n =   212) (lvls, us =       73,      203,      252,      293,      562)
> 
> ...
> [56.176s][info][gc,stats] Concurrent Stack Processing    =    0.124 s (a =     5149 us) (n =    24) (lvls, us =      535,      607,      885,     6387,    27177)
> [56.176s][info][gc,stats]   Threads                      =    0.632 s (a =    26345 us) (n =    24) (lvls, us =     6465,     8086,    10742,    39453,   145679)
> [56.176s][info][gc,stats]     CT: <total>                =    0.632 s (a =    26345 us) (n =    24) (lvls, us =     6465,     8086,    10742,    39453,   145679)
> 
> After:
> [56.010s][info][gc,stats] Concurrent Mark Roots          =    0.116 s (a =      587 us) (n =   198) (lvls, us =      312,      371,      400,      502,     4316)
> [56.010s][info][gc,stats]   CMR: <total>                 =    0.931 s (a =     4703 us) (n =   198) (lvls, us =     2402,     3438,     3770,     4453,    62629)
> [56.010s][info][gc,stats]   CMR: Thread Roots            =    0.864 s (a =     4366 us) (n =   198) (lvls, us =     1914,     3125,     3477,     4199,    54075)
> [56.010s][info][gc,stats]   CMR: VM Strong Roots         =    0.015 s (a =       76 us) (n =   198) (lvls, us =       20,       31,       35,       38,     4693)
> [56.010s][info][gc,stats]   CMR: CLDG Roots              =    0.052 s (a =      261 us) (n =   198) (lvls, us =       61,      172,      256,      299,     3861)
> ...
> [56.010s][info][gc,stats] Concurrent Stack Processing    =    0.081 s (a =     3671 us) (n =    22) (lvls, us =      457,      537,      770,     3359,    24003)
> [56.010s][info][gc,stats]   Threads                      =    0.469 s (a =    21309 us) (n =    22) (lvls, us =     6016,     6855,     8711,    18945,   103939)
> [56.010s][info][gc,stats]     CT: <total>                =    0.469 s (a =    21309 us) (n =    22) (lvls, us =     6016,     6855,     8711,    18945,   103939)
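
The chunked processing mentioned in the quoted description can be illustrated with a minimal, standalone sketch. This is not the code from the changeset: `ChunkClaimer`, `worker_scan`, and the chunk sizes are hypothetical names and values chosen for illustration, and the "scan" is a stand-in counter. The idea shown is that each worker pays for one atomic claim per chunk of threads rather than one per thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a HotSpot class: hands out disjoint index
// ranges [begin, end) using one atomic add per chunk.
class ChunkClaimer {
public:
  ChunkClaimer(std::size_t total, std::size_t chunk_size)
      : total_(total), chunk_size_(chunk_size), next_(0) {
    assert(chunk_size > 0);
  }

  // Returns false when no work is left; otherwise fills [begin, end).
  bool claim(std::size_t& begin, std::size_t& end) {
    std::size_t start = next_.fetch_add(chunk_size_, std::memory_order_relaxed);
    if (start >= total_) return false;
    begin = start;
    end = std::min(start + chunk_size_, total_);
    return true;
  }

private:
  const std::size_t total_;
  const std::size_t chunk_size_;
  std::atomic<std::size_t> next_;
};

// One worker's loop: claim chunks until none remain. Several workers could
// run this concurrently on the same claimer, because claimed ranges never
// overlap. Returns the number of claims this worker made.
std::size_t worker_scan(ChunkClaimer& claimer, std::vector<int>& visits) {
  std::size_t claims = 0;
  std::size_t begin = 0, end = 0;
  while (claimer.claim(begin, end)) {
    ++claims;
    for (std::size_t i = begin; i < end; ++i) {
      visits[i] += 1;  // stand-in for scanning one thread's roots
    }
  }
  return claims;
}
```

With 100 items, a chunk size of 1 costs 100 claims while a chunk size of 8 costs 13, and every item is still visited exactly once. The trade-off is that larger chunks give coarser load balancing between workers.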

This pull request has now been integrated.

Changeset: ab727f0a
Author:    Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/ab727f0a
Stats:     39 lines in 3 files changed: 20 ins; 7 del; 12 mod

8260591: Shenandoah: improve parallelism for concurrent thread root scans

Reviewed-by: zgu, rkennke

-------------

PR: https://git.openjdk.java.net/jdk/pull/2290


