RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans
Aleksey Shipilev
shade at openjdk.java.net
Thu Jan 28 14:09:52 UTC 2021
Following JDK-8256298, there are a few minor performance issues with the implementation.
First, in the spirit of JDK-8246100, we should scan the Java threads last, as they offer the most parallelism. Less parallel or lightweight roots should be scanned before them to improve overall parallelism.
Second, the cost of claiming each thread individually dominates the per-thread processing cost. We should really be processing threads in chunks.
The motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, so thread root scan speed matters.
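To illustrate the chunked claiming idea, here is a minimal sketch in Java. It is not the HotSpot code from the patch (which is C++); the class name `ChunkedClaimer`, the `drain` method, and the chunk size are hypothetical and chosen only for illustration. The point is that each worker performs one atomic operation per chunk of threads rather than one per thread, so the claim cost is amortized over the whole chunk.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of chunked claiming: workers share one cursor and
 * each atomic update claims a whole chunk of items (e.g. threads to scan).
 */
final class ChunkedClaimer<T> {
    private final List<T> items;
    private final int chunkSize;
    // Index of the first item that has not been claimed yet.
    private final AtomicInteger next = new AtomicInteger();

    ChunkedClaimer(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.items = items;
        this.chunkSize = chunkSize;
    }

    /**
     * Called by every worker. Claims chunks until none are left and applies
     * the action to each claimed item. Returns how many items this worker
     * processed.
     */
    int drain(Consumer<? super T> action) {
        int processed = 0;
        while (true) {
            // One atomic operation claims up to chunkSize items.
            int start = next.getAndAdd(chunkSize);
            if (start >= items.size()) {
                return processed; // nothing left to claim
            }
            int end = Math.min(start + chunkSize, items.size());
            for (int i = start; i < end; i++) {
                action.accept(items.get(i));
                processed++;
            }
        }
    }
}
```

With a chunk size of 1 this degenerates to per-item claiming, which is the case where the atomic claim can dominate when the per-item work is cheap. Larger chunks cut the number of atomic operations but make load balancing coarser, so the right size is a tuning decision.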
Before:
# Baseline
[56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216)
[56.176s][info][gc,stats] CMR: <total> = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522)
[56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288)
[56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204)
[56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562)
...
[56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177)
[56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679)
[56.176s][info][gc,stats] CT: <total> = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679)
After:
[56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316)
[56.010s][info][gc,stats] CMR: <total> = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629)
[56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075)
[56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693)
[56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861)
...
[56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003)
[56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939)
[56.010s][info][gc,stats] CT: <total> = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939)
-------------
Commit messages:
- 8260591: Shenandoah: improve parallelism for concurrent thread root scans
Changes: https://git.openjdk.java.net/jdk/pull/2290/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2290&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8260591
Stats: 39 lines in 3 files changed: 20 ins; 7 del; 12 mod
Patch: https://git.openjdk.java.net/jdk/pull/2290.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2290/head:pull/2290
PR: https://git.openjdk.java.net/jdk/pull/2290