RFR: 8272083: G1: Record iterated range for BOT performance during card scan

Tue Aug 10 03:18:29 UTC 2021

On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin <github.com+16811675+linade at openjdk.org> wrote:

> A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is that this pointer (I call it _iterated_to) records the end of the last object iterated, while _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address up to which the BOT has been fixed, so we avoid having to fix anything below this address when calling block_start(). This implementation reduces the number of calls to set_offset_array() during scan_heap_roots() by roughly a factor of 2-10 (in my casual test with -XX:G1ConcRefinementGreenZone=1000000).
> 
> What this approach does not solve is random access to the BOT. So far I haven't found anything with this access pattern.

I had an implementation that does a backward scan, but it ended up less clean than the '_iterated_to' solution. Yes, the '_iterated_to' solution still leaves some excess work. I think that only happens when (correct me if I'm wrong):
1. Worker1 is working on chunk1 and is about to finish, while worker2 starts working on chunk2, which immediately follows chunk1; and
2. There is a gc-allocated block that spans chunk1 and chunk2.
If worker2 starts with a card that falls into this gc-allocated block, it will have to fix some BOT entries that might already have been fixed by worker1. This only happens once, since after this fix worker2 has its own '_iterated_to' hint.
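
To make the effect of the per-worker hint concrete, here is a minimal toy sketch. It is not the HotSpot code: ToyRegion, block_start(addr, hint), scan_cards and updates_for_full_scan are hypothetical names, the 64-word card size and 24-word objects are arbitrary assumptions, and bot_updates merely stands in for the number of set_offset_array() calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kCardWords = 64;  // assumed card size, in words

// Toy model, not HotSpot code: addresses are word offsets in one region that
// holds a single gc-allocated block packed with equal-sized objects.
struct ToyRegion {
    std::vector<std::size_t> size_at;  // size_at[a]: size of the object starting at a, else 0
    std::vector<std::size_t> bot;      // bot[c]: an object start at or before card c's first word
    std::size_t top = 0;               // end of the last object
    std::size_t bot_updates = 0;       // stands in for set_offset_array() calls

    // Requires 0 < object_words <= num_cards * kCardWords.
    ToyRegion(std::size_t num_cards, std::size_t object_words)
        : size_at(num_cards * kCardWords, 0), bot(num_cards, 0) {
        while (top + object_words <= size_at.size()) {
            size_at[top] = object_words;
            top += object_words;
        }
    }

    // Returns the start of the object covering addr (requires addr < top).
    // The walk begins at the BOT entry, or at hint when hint is a later
    // object boundary that is not above addr. Each object walked records
    // itself for every card whose first word it covers.
    std::size_t block_start(std::size_t addr, std::size_t hint) {
        std::size_t q = bot[addr / kCardWords];
        if (hint > q && hint <= addr) q = hint;
        while (true) {
            const std::size_t n = q + size_at[q];
            for (std::size_t c = (q + kCardWords - 1) / kCardWords;
                 c * kCardWords < n; ++c) {
                bot[c] = q;
                ++bot_updates;
            }
            if (addr < n) return q;
            q = n;
        }
    }
};

// Scans cards in increasing order, as a single worker would, and returns the
// number of BOT updates performed on the region.
std::size_t scan_cards(ToyRegion& r, const std::vector<std::size_t>& cards,
                       bool use_hint) {
    std::size_t iterated_to = 0;  // end of the last object this worker walked
    for (std::size_t c : cards) {
        const std::size_t card_start = c * kCardWords;
        const std::size_t card_end = card_start + kCardWords;
        if (card_start >= r.top) break;
        std::size_t obj;
        if (use_hint && iterated_to > card_start) {
            obj = iterated_to;  // the object covering card_start was already walked
        } else {
            obj = r.block_start(card_start, use_hint ? iterated_to : 0);
        }
        while (obj < card_end && obj < r.top) obj += r.size_at[obj];
        iterated_to = obj;
    }
    return r.bot_updates;
}

// Scans all 32 cards of a fresh region filled with 24-word objects.
std::size_t updates_for_full_scan(bool use_hint) {
    ToyRegion r(32, 24);
    std::vector<std::size_t> cards;
    for (std::size_t c = 0; c < 32; ++c) cards.push_back(c);
    return scan_cards(r, cards, use_hint);
}
```

Comparing updates_for_full_scan(true) with updates_for_full_scan(false) shows the hinted scan doing fewer updates in this toy, because the unhinted walk restarts from the block start for every card. The toy is not meant to reproduce the measurement quoted above.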

I considered how to do concurrent fixing. Suppose we use a per-region pointer to record where the BOT has been fixed up to, and all workers update it. Using the same scenario as above, if worker1 is in the middle of processing chunk1 and worker2 fixes something while processing chunk2, the pointer will be updated to something > chunk1.end(). Worker1 then no longer benefits from the pointer, since it does not point into chunk1.
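
The shared-pointer scenario can be sketched as follows. This is only an illustration under assumptions: Chunk, record_fixed_to, hint_helps and worker1_loses_hint are hypothetical names, the addresses are made up, and the monotonic compare-and-swap update is just one plausible way for all workers to update a per-region pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct Chunk {
    std::size_t start;
    std::size_t end;  // exclusive
};

// Hypothetical shared per-region pointer, raised monotonically by whichever
// worker has fixed the BOT up to addr.
inline void record_fixed_to(std::atomic<std::size_t>& fixed_to, std::size_t addr) {
    std::size_t cur = fixed_to.load(std::memory_order_relaxed);
    while (cur < addr &&
           !fixed_to.compare_exchange_weak(cur, addr,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        // A failed CAS reloads cur; retry while it is still below addr.
    }
}

// A worker can start its walk at the hint only if the hint lies inside the
// worker's own chunk and is not above the address being looked up.
inline bool hint_helps(const Chunk& chunk, std::size_t hint, std::size_t addr) {
    return hint >= chunk.start && hint < chunk.end && hint <= addr;
}

// Replays the scenario sequentially: worker1 is mid-chunk1 when worker2
// records a fix in chunk2.
inline bool worker1_loses_hint() {
    const Chunk chunk1{0, 512};
    std::atomic<std::size_t> fixed_to{0};
    record_fixed_to(fixed_to, 100);  // worker1 has fixed up to 100 in chunk1
    const bool before = hint_helps(chunk1, fixed_to.load(), 200);
    record_fixed_to(fixed_to, 600);  // worker2 fixes something in chunk2
    const bool after = hint_helps(chunk1, fixed_to.load(), 200);
    return before && !after;
}
```

Once the pointer has moved past chunk1.end(), worker1's lookup at address 200 falls back to the plain BOT walk, which is the loss described above.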

Furthermore, it's hard to guarantee that updates become visible to worker2 in time. Worker1 only has useful information for worker2 when worker1 is about to finish chunk1, and worker2 only benefits from this information if it starts on chunk2 before worker1 claims it:

                                  worker2 claims chunk2
worker1 about to finish chunk1 [                         ] worker1 tries to claim chunk2

The window between '[ ]' is small, so worker2 is unlikely to benefit.

In summary, I think the per-worker '_iterated_to' solution leaves only a small amount of excess work, and I don't see concurrent fixing offering much more. The backward scan solution solves all the problems, but I like it a little less because it's less straightforward and essentially requires the user (the scanner) to adjust for a BOT problem.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5039