Integrated: 8260591: Shenandoah: improve parallelism for concurrent thread root scans
Aleksey Shipilev
shade at openjdk.java.net
Mon Feb 1 08:52:41 UTC 2021
On Thu, 28 Jan 2021 14:04:07 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> Following JDK-8256298, there are a few minor performance issues with the implementation.
>
> First, in the spirit of JDK-8246100, we should scan Java threads last, as they offer the most parallelism. Less parallel or lightweight roots should be scanned before them to improve overall parallelism.
>
> Second, claiming threads one at a time dominates the per-thread processing cost. We should claim and process threads in chunks instead.
>
> The motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, so thread root scanning speed matters.
>
> Before:
> # Baseline
> [56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216)
> [56.176s][info][gc,stats] CMR: <total> = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522)
> [56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288)
> [56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204)
> [56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562)
>
> ...
> [56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177)
> [56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679)
> [56.176s][info][gc,stats] CT: <total> = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679)
>
> After:
> [56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316)
> [56.010s][info][gc,stats] CMR: <total> = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629)
> [56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075)
> [56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693)
> [56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861)
> ...
> [56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003)
> [56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939)
> [56.010s][info][gc,stats] CT: <total> = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939)
This pull request has now been integrated.
Changeset: ab727f0a
Author: Aleksey Shipilev <shade at openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/ab727f0a
Stats: 39 lines in 3 files changed: 20 ins; 7 del; 12 mod
8260591: Shenandoah: improve parallelism for concurrent thread root scans
Reviewed-by: zgu, rkennke
-------------
PR: https://git.openjdk.java.net/jdk/pull/2290