RFR: 8256265: G1: Improve parallelism in regions that failed evacuation
Hamlin Li
mli at openjdk.java.net
Thu Jan 27 09:26:34 UTC 2022
On Wed, 26 Jan 2022 12:23:05 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
> Regarding the log messages: We might want to fix up them a bit, I did not look at our recent email discussion on what we came up with, and their level.
>
> Some other initial thoughts worth considering:
>
> *) What I already noticed yesterday on some tests, and can also be seen in your log snippet, is that the "Remove self-forwards in chunks" takes a lot of time, unexpectedly much to me actually. I want to look further into this to understand the reason(s).
>
In fact, in the baseline version most of the "Post Evacuate Cleanup 1" time is normally spent on "Restore Retained Regions". In the parallel version, the share of "Restore Retained Regions" within "Post Evacuate Cleanup 1" is reduced. For example, the following compares the "Post Evacuate Cleanup 1"/"Restore Retained Regions" times between baseline and parallel:
baseline:
[3.169s][info ][gc,phases] GC(0) Post Evacuate Collection Set: 10.0ms
[3.169s][debug][gc,phases] GC(0) Post Evacuate Cleanup 1: 9.5ms
parallel:
[3.105s][info ][gc,phases] GC(0) Post Evacuate Collection Set: 2.5ms
[3.106s][debug][gc,phases] GC(0) Post Evacuate Cleanup 1: 2.0ms
The difference between "Post Evacuate Cleanup 1" and "Restore Retained Regions" is the same in the baseline and parallel versions; that remaining time is spent on the other subphases of "Post Evacuate Cleanup 1".
> *) The other concern I have is whether we really need (or can avoid) the need for the "Wait for Ready In Retained Regions" phase. It looks a bit unfortunate to actually have a busy-loop in there; this should definitely use proper synchronization or something to wait on if it is really needed. What of the retained region preparation do we really need? On a first look, maybe just the BOT reset, which we might be able to put somewhere else (I may be totally wrong). Also, if so, the Prepare Retained regions should probably be split out to be started before all other tasks in this "Post Evacuate Cleanup 1" phase.
>
> I can see that from a timing perspective "Wait For Ready" is not a problem in all of my tests so far.
Yes, currently "Wait For Ready" does not seem to cost much time, because "Prepared Retained Regions" is quick, so I am not sure proper synchronization would help much further.
But I will investigate whether we can omit the "Prepared Retained Regions" and "Wait For Ready" subphases entirely to simplify the logic. [TODO]
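
For reference, if the wait turns out to be needed, a blocking wait in place of the busy-loop could look roughly like the sketch below. This is standalone standard C++, not HotSpot code (HotSpot would use its own locking primitives rather than `std::` ones), and `PreparedGate`, `region_prepared`, `wait_until_ready` and `run_gate_demo` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate that opens once every retained region has been prepared.
class PreparedGate {
 public:
  explicit PreparedGate(std::size_t regions_to_prepare)
      : _remaining(regions_to_prepare) {}

  // Called by a worker after it has finished preparing one region.
  void region_prepared() {
    std::lock_guard<std::mutex> lock(_mutex);
    assert(_remaining > 0);
    if (--_remaining == 0) {
      _cv.notify_all();
    }
  }

  // Blocks without spinning until all regions have been prepared.
  void wait_until_ready() {
    std::unique_lock<std::mutex> lock(_mutex);
    _cv.wait(lock, [this] { return _remaining == 0; });
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  std::size_t _remaining;
};

// Demo: one worker per region. Each worker prepares its region, waits at the
// gate, then checks that it can see every region as prepared.
// Returns the number of workers for which that check succeeded.
std::size_t run_gate_demo(std::size_t num_regions) {
  PreparedGate gate(num_regions);
  std::vector<int> prepared(num_regions, 0);
  std::vector<int> saw_all(num_regions, 0);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_regions; ++i) {
    workers.emplace_back([&, i] {
      prepared[i] = 1;  // stand-in for the real preparation work
      gate.region_prepared();
      gate.wait_until_ready();
      int ok = 1;
      for (int p : prepared) {
        ok &= p;
      }
      saw_all[i] = ok;
    });
  }
  for (std::thread& t : workers) {
    t.join();
  }
  std::size_t count = 0;
  for (int s : saw_all) {
    count += static_cast<std::size_t>(s);
  }
  return count;
}
```

Calling `run_gate_demo(4)` starts four workers, none of which passes the gate before all four regions are marked prepared. Whether this beats the current loop in practice is open, given how short the preparation is.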
>
> *) The "Prepared Retained Regions" phase stores the amount of live data into the `HeapRegion`; for this reason the change adds these `G1RegionMarkStats` data gathering via the `G1RegionMarkStatsCache`; I think the same information could be provided while iterating over the chunks (just do an `Atomic::add` here) instead. A single `Atomic::add` per thread per retained region at most seems to be okay. That would also remove the `Evac Fail Merge Live` phase afaict.
I will do this refactor soon.
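
As a rough illustration of the suggestion above (sum live bytes locally while iterating over chunks, then publish at most one atomic add per thread per retained region), here is a minimal standalone sketch. It uses `std::atomic` in place of HotSpot's `Atomic::add`, and `RetainedRegion`, `Chunk`, `process_chunks` and `count_live_bytes` are hypothetical names, not the types in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a retained region; not the HotSpot HeapRegion.
struct RetainedRegion {
  std::atomic<std::size_t> live_bytes{0};
};

// Hypothetical chunk: a slice of one region plus the sizes of its live objects.
struct Chunk {
  std::size_t region_index;
  std::vector<std::size_t> live_object_sizes;
};

// A worker claims chunks through a shared counter, sums live bytes locally
// per region, and publishes each non-zero sum with a single atomic add.
void process_chunks(const std::vector<Chunk>& chunks,
                    std::vector<RetainedRegion>& regions,
                    std::atomic<std::size_t>& next_chunk) {
  std::vector<std::size_t> local(regions.size(), 0);
  for (;;) {
    std::size_t i = next_chunk.fetch_add(1, std::memory_order_relaxed);
    if (i >= chunks.size()) {
      break;
    }
    assert(chunks[i].region_index < regions.size());
    for (std::size_t size : chunks[i].live_object_sizes) {
      local[chunks[i].region_index] += size;
    }
  }
  for (std::size_t r = 0; r < regions.size(); ++r) {
    if (local[r] != 0) {
      regions[r].live_bytes.fetch_add(local[r], std::memory_order_relaxed);
    }
  }
}

// Runs the sketch with num_workers threads and returns the per-region totals.
std::vector<std::size_t> count_live_bytes(const std::vector<Chunk>& chunks,
                                          std::size_t num_regions,
                                          unsigned num_workers) {
  std::vector<RetainedRegion> regions(num_regions);
  std::atomic<std::size_t> next_chunk{0};
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] { process_chunks(chunks, regions, next_chunk); });
  }
  for (std::thread& t : workers) {
    t.join();
  }
  std::vector<std::size_t> totals;
  for (const RetainedRegion& region : regions) {
    totals.push_back(region.live_bytes.load(std::memory_order_relaxed));
  }
  return totals;
}
```

With this shape, the per-region total is complete once all workers have finished, so no separate merge step is needed in the sketch.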
>
> *) Not too happy that the `G1HeapRegionChunk` constructor does surprisingly much work, which surprisingly takes very little time.
>
> *) I was wondering whether it would be somewhat more efficient for the `Prepare Chunks` phase to collect some of the information needed there somehow else. Something is bubbling up in my mind, but nothing specific yet, and as mentioned, it might not be worth doing given its (lack of) cost.
I will put it on the backlog to see if it can be simplified. [TODO]
-------------
PR: https://git.openjdk.java.net/jdk/pull/7047