RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v3]

Wed Feb 26 02:17:36 UTC 2025

On Wed, 26 Feb 2025 00:01:12 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> The change is to improve the observability of  Shenandoah GC, basically there are three changes for  Shenandoah GC timings in this PR:
>> 
>> 1. Net GC pause timings include the time to propagate GC state to Java threads
>> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs
>> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> 
>> With the change, the new GC timing log will be like:
>> 
>> [11.056s][info][gc,stats    ] Concurrent Reset                     89 us
>> [11.056s][info][gc,stats    ] Pause Init Mark (G)                 257 us
>> [11.056s][info][gc,stats    ] Pause Init Mark (N)                  17 us
>> [11.056s][info][gc,stats    ]   Update Region States                3 us
>> [11.056s][info][gc,stats    ]   Propagate GC state                  1 us
>> [11.056s][info][gc,stats    ] Concurrent Mark Roots               232 us, parallelism: 1.96x
>> [11.056s][info][gc,stats    ]   CMR: <total>                      456 us
>> [11.056s][info][gc,stats    ]   CMR: Thread Roots                 429 us, workers (us): 139, 148, 142, ---, ---, ---, 
>> [11.057s][info][gc,stats    ]   CMR: VM Strong Roots               11 us, workers (us):   8,   3,   0, ---, ---, ---, 
>> [11.057s][info][gc,stats    ]   CMR: CLDG Roots                    16 us, workers (us):  16, ---, ---, ---, ---, ---, 
>> [11.057s][info][gc,stats    ] Concurrent Marking                 1304 us, parallelism: 2.33x
>> [11.057s][info][gc,stats    ]   CM: <total>                      3043 us
>> [11.057s][info][gc,stats    ]   CM: Parallel Mark                3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, 
>> [11.057s][info][gc,stats    ]   Flush SATB                        204 us
>> [11.057s][info][gc,stats    ] Pause Final Mark (G)                865 us
>> [11.057s][info][gc,stats    ] Pause Final Mark (N)                234 us
>> [11.057s][info][gc,stats    ]   Finish Mark                       129 us, parallelism: 0.01x
>> [11.057s][info][gc,stats    ]   Propagate GC state                  2 us
>> [11.057s][info][gc,stats    ]   Update Region States               12 us
>> [11.057s][info][gc,stats    ]   Choose Collection Set              25 us
>> [11.057s][info][gc,stats    ]   Rebuild Free Set                   29 us
>> [11.057s][info][gc,stats    ] Concurrent Weak References           67 us, parallelism: 0.25x
>> [11.057s][info][gc,stats    ]   CWRF: <total>                      17 us
>> [11.057s][info][gc,...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove propagate_gc_state_to_all_threads call from op_init_update_refs

Changes are fine.

This jumped out in yr sample output:

...
[11.057s][info][gc,stats    ]   Finish Mark                       129 us, parallelism: 0.01x
...
[11.057s][info][gc,stats    ] Concurrent Weak References           67 us, parallelism: 0.25x

which seemed kinda interesting. I assume this is just a consequence of the very little work (and extremely brief time in this phase) here, and can be ignored in this sample output from likely a toy GC, or one where you may have artificially boosted the number of worker threads. Still I thought I'd ask in case you've seen this with bigger timings or more work in any of these phases with low fractional speed-ups.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2683721058