RFR: 8272083: G1: Record iterated range for BOT performance during card scan [v4]

Wed Oct 13 09:02:49 UTC 2021

On Sat, 9 Oct 2021 10:05:07 GMT, Yude Lin <duke at openjdk.java.net> wrote:

>> A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is that this pointer (I call it _iterated_to) records the end of the object, whereas _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address up to which the BOT has been fixed. So we avoid having to fix below this address when calling block_start(). This implementation reduces the number of calls to set_offset_array() during scan_heap_roots() by roughly 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000).
>> 
>> What this approach does not solve is random access to the BOT. So far I haven't found anything with this access pattern.
>
> Yude Lin has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - Fixed a bug in array claim
>  - Fixed a bug that causes premature deactivate
>  - Renaming variables
>  - Renaming
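
For readers following along, the _iterated_to idea from the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the PR: ToyObject, ToyScanner, scan_cost and the "walked" counter are made-up names, addresses are plain word indices, and the unhinted baseline pessimistically restarts every walk at the region start, which is cruder than what the real BOT does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only; none of these types exist in HotSpot.
// Addresses are word indices and each object covers [start, end).
struct ToyObject {
  std::size_t start;
  std::size_t end;
};

struct ToyScanner {
  std::vector<ToyObject> objects;  // contiguous and sorted by address
  std::size_t last = 0;            // index of the last object this worker visited
  std::size_t iterated_to = 0;     // end address of that object
  std::size_t walked = 0;          // objects stepped over; stands in for BOT fix-up work

  explicit ToyScanner(std::vector<ToyObject> objs) : objects(std::move(objs)) {
    assert(!objects.empty());
    iterated_to = objects.front().start;
  }

  // Return the start of the object covering addr.
  // With use_hint the walk resumes at the last visited object, so nothing
  // below iterated_to is walked again. Assumes increasing lookup addresses.
  std::size_t block_start(std::size_t addr, bool use_hint) {
    std::size_t i = use_hint ? last : 0;
    assert(objects[i].start <= addr);
    while (objects[i].end <= addr) {
      ++i;
      ++walked;
      assert(i < objects.size());  // addr must lie inside the region
    }
    last = i;
    iterated_to = objects[i].end;
    return objects[i].start;
  }
};

// Total number of objects stepped over for a sequence of increasing addresses.
std::size_t scan_cost(const std::vector<ToyObject>& objs,
                      const std::vector<std::size_t>& addrs,
                      bool use_hint) {
  ToyScanner scanner(objs);
  for (std::size_t addr : addrs) {
    scanner.block_start(addr, use_hint);
  }
  return scanner.walked;
}
```

In this model, with ten 4-word objects and lookups at addresses 8, 16 and 32, the unhinted walk steps over 14 objects and the hinted one over 8. The numbers say nothing about real G1 behaviour; they only show why remembering the end of the last object saves repeated work.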

> > Hi Stefan,
> > Anyway, I like the idea of removing all cost from the pause time, which is what the current approach tries to achieve. I don't think there will be a lot more additional concurrent work than there currently is, because if we don't update the BOT concurrently, the refinement threads still have to update a large part of the BOT. So in effect it transfers the work from concurrent refinement to concurrent BOT update. As you can see in an earlier graph, the concurrent refinement rates actually increased. But that compares concurrent BOT update against no BOT update at all. If we were to compare concurrent BOT update against paused BOT update, then yes, there will be additional concurrent work. But I think concurrent work should be favored over pause-time work, generally speaking. By the 

👍 

There should be some balance though; I particularly like the idea of only doing refinement work for areas for which we actually collected remembered set entries.

> 
> I also like removing time from the GC pause, but we need to keep a balance. If the additional work outside the pause is significantly larger, it is not as clear a win. Not saying we don't want to do this, but we should carefully look at the options.
> 
> I also believe that this will be more efficient than what we currently have, just want to make sure it is worth the additional complexity. So hearing that you plan to simplify this quite a bit is really good. 

👍 

> One question, how do we make sure that as much as possible of the BOT is updated concurrently before the next GC?
> 

The idea is to make sure that new cards in areas whose BOT has not yet been updated are processed in an expedited way (i.e. "immediately"). Since card processing automatically updates the BOT, we only need to make sure these cards are processed without waiting for the regular mechanism to kick in.

One could also enqueue the "first" card in newly allocated PLABs in special queues that are processed with priority. There are a few options as usual, and we would certainly like to find a good one with respect to several aspects.
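
To make the "special queues processed with priority" option a bit more concrete, here is a rough sketch. Everything in it is hypothetical (ToyCardQueues, the per-area bot_updated flags, the card numbering) and it is not how G1's dirty card queues are built. It also simplifies by marking a whole area as updated once a single card in it has been processed.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sketch; G1's real dirty card queues are structured differently.
struct ToyCardQueues {
  std::deque<std::size_t> expedited;  // cards in areas whose BOT is not yet updated
  std::deque<std::size_t> regular;    // all other cards
  std::vector<bool> bot_updated;      // one flag per area (e.g. per PLAB), assumed granularity
  std::size_t cards_per_area;

  ToyCardQueues(std::size_t num_areas, std::size_t cards_in_area)
      : bot_updated(num_areas, false), cards_per_area(cards_in_area) {}

  // Route a new card to the expedited queue if its area still needs a BOT update.
  void enqueue(std::size_t card) {
    std::size_t area = card / cards_per_area;
    if (bot_updated.at(area)) {
      regular.push_back(card);
    } else {
      expedited.push_back(card);
    }
  }

  // Fetch the next card to process, always preferring the expedited queue.
  // Returns false when both queues are empty.
  bool next(std::size_t& card) {
    std::deque<std::size_t>& q = expedited.empty() ? regular : expedited;
    if (q.empty()) {
      return false;
    }
    card = q.front();
    q.pop_front();
    // Simplification: processing one card marks the whole area as updated.
    bot_updated.at(card / cards_per_area) = true;
    return true;
  }
};
```

A refinement thread in this model would just call next() in a loop; cards for not-yet-updated areas get served first regardless of how many regular cards are pending.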

Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5039