RFR: 8327452: G1: Improve scalability of Merge Log Buffers [v2]

Fri Mar 8 09:49:55 UTC 2024

On Thu, 7 Mar 2024 17:55:04 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

>> Hi all,
>> 
>> Please review this change to reduce contention on the DCQS. The parallel work done by the worker threads is very small, therefore contention on the DCQS dominates the Merge Log Buffers phase. In this change, we add a sequential phase to distribute the Log Buffers to the worker threads. This removes the DCQS bottleneck in the highly contended case, at a small cost to the cases with low contention.
>> 
>> Testing Tier 1-3
>> 
>> The graphs below are using Bigramtester at 20gb, "Distribute and Avg Log Buffers" combines the duration of the sequential distribute phase and the avg duration of the parallel `Merge Log Buffers` phase.
>> | Time (ms)  | # Cards |
>> | ------------- | ------------- |
>> | ![128_concurrent_start](https://github.com/openjdk/jdk/assets/69453999/c6663ed9-6b81-4e32-bb1e-17627591406f) | ![128_concurrent_start_cards](https://github.com/openjdk/jdk/assets/69453999/1909564b-2482-4767-a334-9f5612d5cb8c)|
>> | ![128_mixed](https://github.com/openjdk/jdk/assets/69453999/d8dd8933-3aae-4795-b426-b4022bcf1ff2)| ![128_mixed_cards](https://github.com/openjdk/jdk/assets/69453999/ed5b8279-8d70-4dc6-b53d-59bf04b6e12e)|
>> | ![64_concurrent_start](https://github.com/openjdk/jdk/assets/69453999/5fb03cc5-2b86-4b9d-83ff-caa0eb482859) | ![64_concurrent_start_cards](https://github.com/openjdk/jdk/assets/69453999/c62fb05a-dca1-4b3f-895c-62837f7af2f8)|
>> | ![64_mixed](https://github.com/openjdk/jdk/assets/69453999/896c4138-a26b-457d-a5b6-d4b4285c4b8f) |  ![64_mixed_cards](https://github.com/openjdk/jdk/assets/69453999/2a6a234e-f180-4f02-9ad6-3867c5fd1881) |
>> | ![32_mixed](https://github.com/openjdk/jdk/assets/69453999/25a45fa6-f1dd-48e9-85af-702be67f7b9b) | ![32_mixed_cards](https://github.com/openjdk/jdk/assets/69453999/f5d80538-efde-47e4-860f-9b15409085a7) |
>> |  ![8_mixed](https://github.com/openjdk/jdk/assets/69453999/112b23c5-f048-4678-8d04-2c682696b357) | ![8_mixed_cards](https://github.com/openjdk/jdk/assets/69453999/0fdb9b6b-242c-424c-83ef-3d05703f5c42) |
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
> 
>   peel off multiple buffers

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/g1Policy.cpp line 792:

> 790:   return (all_cards_processing_time * logged_dirty_cards / scan_heap_roots_cards) +
> 791:          average_time_ms(G1GCPhaseTimes::MergeLB) +
> 792:          phase_times()->cur_distribute_log_buffers_time_ms();

Maybe factor out this term because the important part which is also explained in detail in the comment above seems to be the first term. Not sure.

src/hotspot/share/gc/g1/g1RemSet.cpp line 1331:

> 1329:   ~G1MergeHeapRootsTask() {
> 1330:     if (_dirty_card_buffers != nullptr) {
> 1331:       FREE_C_HEAP_ARRAY(BufferNode::Stack, _dirty_card_buffers);

This does not call the destructors of the contents, doesn't it? Please change the code accordingly if not. The current destructor of `LockFreeStack` isn't completely empty...

-------------

PR Review: https://git.openjdk.org/jdk/pull/18134#pullrequestreview-1924484748
PR Review Comment: https://git.openjdk.org/jdk/pull/18134#discussion_r1517485207
PR Review Comment: https://git.openjdk.org/jdk/pull/18134#discussion_r1517489366