RFR: 8272083: G1: Record iterated range for BOT performance during card scan [v4]

Wed Oct 13 08:38:48 UTC 2021

On Wed, 13 Oct 2021 05:46:12 GMT, Yude Lin <duke at openjdk.java.net> wrote:

> Hi Stefan,
> 
> I looked at the PoC code. My understanding is that you're updating the BOT as objects that cross card boundaries are allocated in a PLAB. I haven't tried this particular approach, but my first reaction when I found this issue was also to process the PLABs in the pause. (I chose a lazier approach: during a GC pause, update the BOT for PLABs allocated in the previous GC pause. Lazy or not, I think there is little difference. The lazy approach needs an additional phase and some code to coordinate the parallel BOT update, which has overhead; whereas updating the BOT as objects are allocated into a PLAB might waste some work, because in mixed GC there might be some old regions we never ever need to scan?)
> 

Yes, my approach would generate a complete BOT for all old regions and it is true that parts of it might never be scanned (if there are no outgoing pointers). On the other hand no additional book-keeping is needed and we have the objects crossing thresholds at hand.
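
To illustrate the idea of updating the BOT at allocation time, here is a toy C++ sketch. It is not the PoC code and not HotSpot's actual BOT implementation: the names ToyBot and ToyPlab, the 512-byte card size, and storing the object start directly (instead of an encoded backward offset) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; every name here is hypothetical, not a HotSpot identifier.
constexpr std::size_t kCardSize = 512;  // bytes per card (assumed for the sketch)

struct ToyBot {
  // entry[i] holds the start offset of the object that covers the first
  // byte of card i. Untouched entries stay at 0.
  std::vector<std::size_t> entry;

  explicit ToyBot(std::size_t region_bytes)
      : entry(region_bytes / kCardSize, 0) {}

  // Record an object occupying [start, end). Only cards whose first byte
  // lies inside the object are updated, so an object that stays within
  // one card (and does not begin on a card boundary) costs nothing.
  void record(std::size_t start, std::size_t end) {
    std::size_t first = (start + kCardSize - 1) / kCardSize;  // round up
    for (std::size_t c = first; c < entry.size() && c * kCardSize < end; ++c) {
      entry[c] = start;
    }
  }

  std::size_t object_start_for_card(std::size_t card) const {
    return entry[card];
  }
};

struct ToyPlab {
  ToyBot& bot;
  std::size_t top;  // next free offset in the buffer
  std::size_t end;  // one past the last usable offset

  // Bump-pointer allocation that updates the BOT as a side effect,
  // while the object's start and end are still at hand.
  std::size_t allocate(std::size_t size) {
    assert(top + size <= end);
    std::size_t obj = top;
    top += size;
    bot.record(obj, top);
    return obj;
  }
};
```

Allocating 300, 400, 100 and 1500 bytes in turn from a ToyPlab covering offsets 0 to 4096 leaves the entry for card 1 at 300 and the entries for cards 2 to 4 at 800, while the 100-byte object at offset 700 touches no entry because it does not cross a card boundary.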

> Anyway, I like the idea of removing all cost from the pause time, which is what the current approach tries to achieve. I don't think there will be a lot more additional concurrent work than there currently is, because if we don't update the BOT concurrently, the refinement threads still have to update a large part of the BOT. So in effect it transfers the work from concurrent refinement to concurrent BOT update. As you can see in an earlier graph, the concurrent refinement rates actually increased. But that compares concurrent BOT update vs no BOT update at all. If we were to compare concurrent BOT update vs paused BOT update, then yes, there will be additional concurrent work. But I think concurrent work should be favored over pause-time work, generally speaking. By the way, I'm working on an update to the patch. It will reuse the concurrent refinement threads and dirty card queue infrastructure, as suggested in an earlier discussion. The patch looks less scary without the additional threads and card set data structures. I hope that will lessen your worry about this solution. Thanks!
> 

I also like removing time from the GC pause, but we need to keep a balance. If the additional work outside the pause is significantly larger, it is not as clear a win. I'm not saying we don't want to do this, but we should look carefully at the options.

I also believe that this will be more efficient than what we currently have; I just want to make sure it is worth the additional complexity. So hearing that you plan to simplify this quite a bit is really good. One question: how do we make sure that as much of the BOT as possible is updated concurrently before the next GC?

Thanks,
Stefan

> Regards, Yude

-------------

PR: https://git.openjdk.java.net/jdk/pull/5039