RFR: 8076584: Parallelism used for redirty logged cards needs better control.
Ivan Walulya
iwalulya at openjdk.org
Thu Feb 22 13:07:13 UTC 2024
Please review this change, which improves the scaling of `RedirtyLoggedCardsTask` with a high number of worker threads. In the current implementation, all threads contend for access to the log buffers through a single shared BufferNode, which becomes a bottleneck as the thread count grows. Because the cost of redirtying each card is very low, the overhead of distributing the work dominates the task.
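To illustrate the contention pattern described above, here is a minimal, self-contained sketch. It is not the HotSpot code: `Node`, `SharedCursorClaimer` and `redirty_all` are hypothetical stand-ins for BufferNode and the task's claiming logic. It assumes every worker claims buffers one at a time from a single shared cursor. Each claim is a CAS on the same memory location, and the per-buffer work (here just counting cards) is tiny, so with many workers most of the time goes into retrying that CAS.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a log buffer node: card indices plus a link
// to the next buffer in the queue set's single list.
struct Node {
  std::vector<std::size_t> cards;
  Node* next = nullptr;
};

// Hypothetical single claim point: every worker advances the same cursor.
class SharedCursorClaimer {
  std::atomic<Node*> _cursor;
public:
  explicit SharedCursorClaimer(Node* head) : _cursor(head) {}

  // Returns the next unclaimed buffer, or nullptr when the list is exhausted.
  // Nodes are not freed during the task, so reading cur->next is safe.
  Node* claim() {
    Node* cur = _cursor.load(std::memory_order_acquire);
    while (cur != nullptr &&
           !_cursor.compare_exchange_weak(cur, cur->next,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
      // A failed CAS reloads cur; retry (this is where threads contend).
    }
    return cur;
  }
};

// Runs num_workers threads that "redirty" (here: merely count) the cards in
// every buffer they claim; returns the total number of cards processed.
std::size_t redirty_all(Node* head, unsigned num_workers) {
  SharedCursorClaimer claimer(head);
  std::atomic<std::size_t> total{0};
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < num_workers; ++i) {
    workers.emplace_back([&] {
      std::size_t local = 0;
      while (Node* n = claimer.claim()) {
        local += n->cards.size();  // per-card work is very cheap
      }
      total.fetch_add(local, std::memory_order_relaxed);
    });
  }
  for (auto& t : workers) t.join();
  return total.load();
}
```

Every buffer costs one contended atomic operation, so adding threads mostly adds CAS retries rather than throughput.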
The new approach preserves the BufferNodeList states from G1ParScanThreadState, which effectively act as fingers (shortcuts) into the list of buffers within `G1RedirtyCardsQueueSet`. We use these BufferNodeList states to distribute BufferNodes for "redirtying" to the worker threads. By creating multiple points of access to the buffers, this approach significantly reduces synchronization overhead and eliminates the bottleneck.
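The idea of using preserved per-thread lists as fingers can be sketched as follows. This is a hypothetical illustration, not the code in the patch: `Sublist` loosely mirrors a BufferNodeList (a head/tail pair delimiting the buffers one scanning thread appended to the shared list), and the claiming scheme (one `fetch_add` on a shared index per sublist) is an assumption chosen for simplicity. Workers take a whole sublist per claim and then walk it with no further synchronization, so the number of contended atomic operations drops from one per buffer to one per sublist.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a log buffer node in the shared list.
struct Node {
  std::vector<std::size_t> cards;
  Node* next = nullptr;
};

// Hypothetical analogue of a preserved BufferNodeList: the inclusive range
// [head, tail] of the shared list contributed by one scanning thread.
struct Sublist {
  Node* head = nullptr;
  Node* tail = nullptr;
};

// Workers claim whole sublists through a shared index, then redirty (here:
// count) the cards in each buffer of the claimed range without synchronizing.
std::size_t redirty_all(const std::vector<Sublist>& fingers, unsigned num_workers) {
  std::atomic<std::size_t> next_index{0};
  std::atomic<std::size_t> total{0};
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      std::size_t local = 0;
      for (std::size_t i = next_index.fetch_add(1, std::memory_order_relaxed);
           i < fingers.size();
           i = next_index.fetch_add(1, std::memory_order_relaxed)) {
        const Sublist& s = fingers[i];
        // Stop at the tail: its next pointer leads into another sublist.
        for (Node* n = s.head; n != nullptr; n = (n == s.tail) ? nullptr : n->next) {
          local += n->cards.size();
        }
      }
      total.fetch_add(local, std::memory_order_relaxed);
    });
  }
  for (auto& t : workers) t.join();
  return total.load();
}
```

The sketch trades some load-balancing granularity (a worker owns an entire sublist) for far fewer contended operations; whether the patch balances work differently is not stated here.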
I have attached results from the Big Ram Tester microbenchmark, run on a server that spins up 163 worker threads for `RedirtyLoggedCardsTask`.




Testing: Tier 1-3
-------------
Commit messages:
- init
Changes: https://git.openjdk.org/jdk/pull/17963/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17963&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8076584
Stats: 41 lines in 5 files changed: 22 ins; 1 del; 18 mod
Patch: https://git.openjdk.org/jdk/pull/17963.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17963/head:pull/17963
PR: https://git.openjdk.org/jdk/pull/17963