RFR: 8327042: G1: Parallelism used for redirty logged cards needs better control. [v2]

Albert Mingkun Yang ayang at openjdk.org
Fri Mar 1 13:03:55 UTC 2024


On Fri, 1 Mar 2024 09:14:15 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

>> Please review this change that provides better scaling of the `RedirtyLoggedCardsTask` with a high number of worker threads. In the current implementation, the threads contend for access to the log buffers through a single BufferNode, resulting in a bottleneck as the number of threads increases. The cost per card is low, so the work distribution overhead dominates the task.
>> 
>> The new approach preserves the BufferNodeList states from `G1ParScanThreadState`, which effectively act as fingers (shortcuts) into the list of buffers within `G1RedirtyCardsQueueSet`. We use these BufferNodeList states to distribute BufferNodes for "redirtying" to the worker threads. By creating multiple points of access to the buffers, this method significantly reduces synchronization overhead and eliminates the bottleneck.
>> 
>> I have attached results from the Big Ram Tester microbenchmark on a server that spins up 163 worker threads for the `RedirtyLoggedCardsTask`:
>> 
>> ![prepare_mixed](https://github.com/openjdk/jdk/assets/69453999/7120aa59-9b84-4e32-ac58-129ff31bb668)
>> ![normal](https://github.com/openjdk/jdk/assets/69453999/7aafb72e-8f18-410d-9f1b-0c0adcd78b03)
>> ![mixed](https://github.com/openjdk/jdk/assets/69453999/8a115d9c-56d3-4654-931e-1fa84987915d)
>> ![concurrent_start](https://github.com/openjdk/jdk/assets/69453999/783d8804-b7b8-42a7-976c-4c217636157b)
>> 
>> 
>> Testing: Tier 1-3
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 602:

> 600:     G1AbstractSubTask(G1GCPhaseTimes::RedirtyCards),
> 601:     _rdcqs(rdcqs),
> 602:     _rdc_buffers(rdc_buffers),

As I understand it, `_rdcqs` and `_rdc_buffers` contain more or less the same thing. Can the former be replaced by the list of buffers? In other words, could the per-worker buffers be merged in Task2 instead of Task1?
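
To make the question concrete, here is a minimal standalone sketch of workers claiming whole per-worker buffer lists directly, with no merged queue set in between. This is not HotSpot code: `Buffer`, `BufferList`, and `redirty_all` are hypothetical names, the "card table" is just an array of atomic bytes, and `std::thread` stands in for the GC worker threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-ins, not HotSpot types: a "buffer" is a vector of card
// indices, and each evacuation worker leaves behind one list of buffers.
using Buffer = std::vector<std::size_t>;
using BufferList = std::vector<Buffer>;

// Redirty every logged card, letting workers claim whole per-worker lists.
// The only shared state touched while distributing work is one atomic
// counter, bumped once per list rather than once per buffer.
inline std::size_t redirty_all(const std::vector<BufferList>& per_worker_lists,
                               std::vector<std::atomic<unsigned char>>& card_table,
                               unsigned num_workers) {
  assert(num_workers > 0);
  std::atomic<std::size_t> next_list{0};  // index of the next unclaimed list
  std::atomic<std::size_t> total{0};      // number of cards redirtied overall

  auto work = [&]() {
    std::size_t local = 0;
    for (;;) {
      // Claim one whole list; each list acts as a separate access point.
      std::size_t i = next_list.fetch_add(1, std::memory_order_relaxed);
      if (i >= per_worker_lists.size()) break;
      for (const Buffer& buf : per_worker_lists[i]) {
        for (std::size_t card : buf) {
          card_table[card].store(1, std::memory_order_relaxed);  // mark dirty
          ++local;
        }
      }
    }
    total.fetch_add(local, std::memory_order_relaxed);
  };

  std::vector<std::thread> threads;
  for (unsigned w = 0; w < num_workers; ++w) threads.emplace_back(work);
  for (auto& t : threads) t.join();
  return total.load();
}
```

In this sketch nothing needs to be merged before the redirty step, so any merging of the per-worker lists into a single structure could happen afterwards. Whether that carries over to the real `G1RedirtyCardsQueueSet` depends on who else consumes it, which the sketch does not model.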

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17963#discussion_r1508955873

