RFR: 8272083: G1: Record iterated range for BOT performance during card scan [v5]
Yude Lin
duke at openjdk.java.net
Wed Oct 27 08:47:15 UTC 2021
On Tue, 26 Oct 2021 14:04:58 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
> I've looked through the patch but won't focus on reviewing this right now. I've instead spent time on running some testing on it comparing it to my approach of doing the work inside the pause.
>
> Some observations:
>
> * Both our approaches touch the hot path and add code to `do_copy_to_survivor_space(...)`. I don't see any clear regression in object copy times for either approach, which is very good.
> * I also see a clear reduction in scan times for both approaches, but not as big when doing the work concurrently. This could be because not everything gets updated between the pauses.
> * The "Total refinement" time (`-Xlog:gc+refine+stats`) also goes down with both approaches, quite significantly, but again the decrease is bigger when doing the work in the pause. This is not so surprising since no additional work is added to the refinement threads for my approach.
>
> I have also done a very hacky PoC that updates all new old regions concurrently using the G1 service thread. This approach doesn't need to touch the object copy path but instead just records that any new old region needs to be "fixed". This approach looks very good from a pause time perspective, but just using one thread doesn't scale very well.
>
> One problem I see with using the refinement threads (to allow scaling better) is that it will probably make the heuristic for scaling the number of threads a bit more complicated, because there are two types of work that should be handled. Have you thought anything about that?
>
I admit this is a bit hacky too. I noticed it breaks the heuristics, but I wanted to put this out for more discussion before making the heuristics more complicated. Chances are that this solution has other unresolved problems (like mutator stalls), in which case we won't go down this path and I won't have to work on the heuristics. Sorry for being lazy here ;)
> One way forward would be to first go with a solution doing the work inside the pause and then continue investigating how to move it to concurrent threads in an efficient and maintainable way.
From these results, it looks like maintaining a precise BOT (that is, every entry points to an actual object start, not a PLAB start) during evacuation isn't very costly in the first place. Maybe the pause-time solution is the best move for now.
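To illustrate the difference between a precise BOT and one whose entries point at a PLAB start, here is a minimal toy model. It is not HotSpot code: `ToyBOT`, `record`, `steps_to_find`, and the 64-word card size are all assumptions made for illustration, and addresses are plain word indices within a single region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed card size in words (e.g. 512-byte cards with 8-byte words).
constexpr std::size_t kCardWords = 64;

// Toy block offset table: entry[c] is the start (word index) of the block
// that covers the first word of card c. Hypothetical, not the G1 class.
struct ToyBOT {
    std::vector<std::size_t> entry;

    explicit ToyBOT(std::size_t region_words)
        : entry(region_words / kCardWords, 0) {}

    // Record a block [start, end): every card whose first word lies inside
    // the block gets an entry pointing back at the block's start.
    void record(std::size_t start, std::size_t end) {
        std::size_t c = (start + kCardWords - 1) / kCardWords;  // round up
        for (; c < entry.size() && c * kCardWords < end; ++c) {
            entry[c] = start;
        }
    }
};

// Count how many objects must be walked, starting from the BOT entry,
// before reaching the object that contains addr.
// size_at[s] holds the size in words of the object starting at s.
std::size_t steps_to_find(const ToyBOT& bot,
                          const std::vector<std::size_t>& size_at,
                          std::size_t addr) {
    std::size_t cur = bot.entry[addr / kCardWords];
    std::size_t steps = 0;
    while (cur + size_at[cur] <= addr) {
        assert(size_at[cur] > 0);  // cur must be a real object start
        cur += size_at[cur];
        ++steps;
    }
    return steps;
}
```

In this model, recording a whole PLAB as a single block makes a lookup in a later card walk every object from the PLAB start, while recording each object as it is copied keeps the walk within the target card. That walk is the card-scan cost a precise BOT is meant to avoid.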
So if the direction is set, how do we go about making it happen? Do we change this PR for that purpose, or do you want to submit another PR? Just so you know, I'm happy to contribute.
-------------
PR: https://git.openjdk.java.net/jdk/pull/5039