RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans

Aleksey Shipilev shade at openjdk.java.net
Thu Jan 28 14:09:52 UTC 2021


Following JDK-8256298, there are a few minor performance issues with the implementation.

First, in the spirit of JDK-8246100, we should scan the Java threads last, as they offer the most parallelism. Less parallel or lightweight roots should be scanned before them to improve overall parallelism.

Second, the cost of claiming each thread individually dominates the per-thread processing cost. We should claim and process threads in chunks instead.
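
To illustrate the chunked idea, here is a minimal standalone sketch, not the code from the patch. ChunkedClaimer, scan_all, and the chunk size are hypothetical names chosen for illustration; the "scan" is a stand-in that only counts visits. The point is that each worker pays for one atomic operation per chunk of items rather than one per item.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: hands out half-open index ranges [begin, end)
// over `total` items, `chunk` items at a time.
struct ChunkedClaimer {
    std::atomic<std::size_t> next{0};
    const std::size_t total;
    const std::size_t chunk;

    ChunkedClaimer(std::size_t total_items, std::size_t chunk_size)
        : total(total_items), chunk(chunk_size == 0 ? 1 : chunk_size) {}

    // One atomic fetch_add claims a whole chunk.
    // Returns false once all items have been handed out.
    bool claim(std::size_t& begin, std::size_t& end) {
        std::size_t start = next.fetch_add(chunk, std::memory_order_relaxed);
        if (start >= total) return false;
        begin = start;
        end = std::min(start + chunk, total);
        return true;
    }
};

// Runs `workers` threads over `total` items and returns how many times
// each item was visited (every entry should be exactly 1).
std::vector<int> scan_all(std::size_t total, std::size_t chunk, unsigned workers) {
    std::vector<int> visits(total, 0);
    ChunkedClaimer claimer(total, chunk);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            std::size_t b = 0, e = 0;
            while (claimer.claim(b, e)) {
                // Stand-in for scanning one thread's roots; each index is
                // owned by exactly one worker, so no further locking is needed.
                for (std::size_t i = b; i < e; ++i) visits[i]++;
            }
        });
    }
    for (auto& t : pool) t.join();
    return visits;
}
```

With a chunk size of 1 this degenerates to per-item claiming, which is the pattern the second point argues against. Larger chunks reduce contention on the shared counter but can leave the last workers unevenly loaded, so the chunk size is a tuning knob.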

The motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, so thread root scan speed matters.

Before:
# Baseline
[56.176s][info][gc,stats] Concurrent Mark Roots          =    0.308 s (a =     1452 us) (n =   212) (lvls, us =      305,      398,      457,      719,    11216)
[56.176s][info][gc,stats]   CMR: <total>                 =    1.236 s (a =     5832 us) (n =   212) (lvls, us =     2676,     3535,     4199,     5391,    54522)
[56.176s][info][gc,stats]   CMR: Thread Roots            =    1.179 s (a =     5563 us) (n =   212) (lvls, us =     2441,     3242,     3945,     5156,    54288)
[56.176s][info][gc,stats]   CMR: VM Strong Roots         =    0.005 s (a =       23 us) (n =   212) (lvls, us =       12,       19,       21,       23,      204)
[56.176s][info][gc,stats]   CMR: CLDG Roots              =    0.052 s (a =      247 us) (n =   212) (lvls, us =       73,      203,      252,      293,      562)

...
[56.176s][info][gc,stats] Concurrent Stack Processing    =    0.124 s (a =     5149 us) (n =    24) (lvls, us =      535,      607,      885,     6387,    27177)
[56.176s][info][gc,stats]   Threads                      =    0.632 s (a =    26345 us) (n =    24) (lvls, us =     6465,     8086,    10742,    39453,   145679)
[56.176s][info][gc,stats]     CT: <total>                =    0.632 s (a =    26345 us) (n =    24) (lvls, us =     6465,     8086,    10742,    39453,   145679)

After:
[56.010s][info][gc,stats] Concurrent Mark Roots          =    0.116 s (a =      587 us) (n =   198) (lvls, us =      312,      371,      400,      502,     4316)
[56.010s][info][gc,stats]   CMR: <total>                 =    0.931 s (a =     4703 us) (n =   198) (lvls, us =     2402,     3438,     3770,     4453,    62629)
[56.010s][info][gc,stats]   CMR: Thread Roots            =    0.864 s (a =     4366 us) (n =   198) (lvls, us =     1914,     3125,     3477,     4199,    54075)
[56.010s][info][gc,stats]   CMR: VM Strong Roots         =    0.015 s (a =       76 us) (n =   198) (lvls, us =       20,       31,       35,       38,     4693)
[56.010s][info][gc,stats]   CMR: CLDG Roots              =    0.052 s (a =      261 us) (n =   198) (lvls, us =       61,      172,      256,      299,     3861)
...
[56.010s][info][gc,stats] Concurrent Stack Processing    =    0.081 s (a =     3671 us) (n =    22) (lvls, us =      457,      537,      770,     3359,    24003)
[56.010s][info][gc,stats]   Threads                      =    0.469 s (a =    21309 us) (n =    22) (lvls, us =     6016,     6855,     8711,    18945,   103939)
[56.010s][info][gc,stats]     CT: <total>                =    0.469 s (a =    21309 us) (n =    22) (lvls, us =     6016,     6855,     8711,    18945,   103939)

-------------

Commit messages:
 - 8260591: Shenandoah: improve parallelism for concurrent thread root scans

Changes: https://git.openjdk.java.net/jdk/pull/2290/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2290&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8260591
  Stats: 39 lines in 3 files changed: 20 ins; 7 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2290.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2290/head:pull/2290

PR: https://git.openjdk.java.net/jdk/pull/2290
