RFR: 8272083: G1: Record iterated range for BOT performance during card scan [v3]

Fri Oct 8 11:22:10 UTC 2021

On Fri, 8 Oct 2021 10:17:14 GMT, Yude Lin <github.com+16811675+linade at openjdk.org> wrote:

> > Not sure about whether the complexity for using the bitmap level as storage is worth the effort: in my testing I have never even come close to 512 PLABs per region. In that case (or even earlier), probably just bail out, drop the whole task and do nothing as with that many PLABs the amount of overlap during gc is likely to be small. I need to do some more testing and thinking about this though.
> 
> Usually I observe small plabs (~2k in size) at the beginning of my specjbb tests. Maybe it's because the heuristics are yet to adjust the plab size. I guess my only concern is, if this is the case, we will find ourselves allocating a lot of large c-heap arrays, which might be bad for pause times. But I think bailing out is also a good choice, considering it simplifies the card set design a lot.

That complexity reduction is the goal. Otherwise, just do a per-region bitmap: we do not expect a large part of the heap to be copied over at once, and even for the whole heap this would be (if I calculated correctly) around 0.02% of the heap (assuming it is worth doing). Particularly when recycling these bitmaps, allocation costs shouldn't be that bad either.
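
As a rough check of the 0.02% figure, here is a minimal sketch that assumes the bitmap holds one bit per 512-byte card; the message does not state the granularity, so that is an assumption, and `kCardSizeBytes`, `bitmap_bytes` and `overhead_percent` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 512-byte cards, matching the "*8 = 512" figure quoted in this thread.
constexpr std::size_t kCardSizeBytes = 512;

// Bytes needed for a bitmap with one bit per card covering `heap_bytes` of heap.
inline std::size_t bitmap_bytes(std::size_t heap_bytes) {
  assert(heap_bytes % kCardSizeBytes == 0);  // whole cards only
  std::size_t cards = heap_bytes / kCardSizeBytes;
  return (cards + 7) / 8;                    // round up to whole bytes
}

// Bitmap size as a percentage of the heap it covers.
inline double overhead_percent(std::size_t heap_bytes) {
  return 100.0 * static_cast<double>(bitmap_bytes(heap_bytes)) /
         static_cast<double>(heap_bytes);
}
```

Under these assumptions a 32 MiB region has 65536 cards and needs an 8 KiB bitmap, i.e. 1/4096 of the covered memory (roughly 0.024%), independent of the region size.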

> > Actually I have seen only a mid-single-digit number of PLABs per region in whatever I have been running; so I even kind of think it might be useful to decrease the maximum PLAB size to have more of those, so that more threads can work on these and the individual BOT fixup is faster (to abort faster). I have no particular guidance at this time on how large is too large; but something like half or a third of a region for 32m regions is quite a bit to chew on :) This of course affects the storage needs, but this limit should always be so that we would never want to use the bitmap.
> 
> I imagine imposing a max plab size will affect evacuation efficiency right? I'm not sure about how this weighs in..

Idk :) However in steady state the PLABs are expected to be quite large.

> > There is another option for storing whether this part of the BOT is unrefined yet: take a bit from the BOT values themselves to encode that.
> > I did not look whether this is actually possible with the current encoding, but is an option if that does not take away too much of the range of the backskip - at the moment we use the values 0-63 (*8 = 512) to encode the offsets within the card, all higher values are backskip values after all.
> > I.e. it might even be that extremely high backskip values which would not be used in all but huge arrays (if at all) are available for such a thing.
> > Just some weird idea that came to my mind...
> 
> I think the backskip value is log_16(number_of_cards_to_skip). So I don't think there is that big an array to require a large backskip value. A back_skip=4 with card_size=512 means a 32m skip, already bigger than the biggest non-humongous object. So maybe the highest bit is always available.
> 
> Nonetheless, this means we change a lot of BOT code. It becomes more complicated and slightly slower, e.g., when doing atomic updates to an entry, originally we do `Atomic::store()` and now we do a looped `Atomic::cmpxchg()` to preserve the special bit. Can we afford that?

Idk since I did not measure, but it sounds plausible. Probably just an extra per-region bitmap with one bit per card is better.
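
To make the cost difference in the quoted question concrete, here is a minimal sketch using standard `std::atomic` rather than HotSpot's `Atomic` class. It assumes a hypothetical layout in which the top bit of a one-byte BOT entry is the "unrefined" flag; `kUnrefinedBit`, `kValueMask` and both functions are illustrative names and not part of the existing BOT code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: the top bit of an entry marks "not yet refined".
constexpr std::uint8_t kUnrefinedBit = 0x80;
constexpr std::uint8_t kValueMask = 0x7F;

// Plain update: a single atomic store that overwrites the whole entry,
// including any flag bit that was set.
inline void set_entry_plain(std::atomic<std::uint8_t>& entry, std::uint8_t value) {
  entry.store(value, std::memory_order_relaxed);
}

// Update that keeps the flag bit: read, merge, then retry with
// compare-and-swap until no other thread changed the entry in between.
inline void set_entry_preserving_flag(std::atomic<std::uint8_t>& entry,
                                      std::uint8_t value) {
  assert((value & kUnrefinedBit) == 0);  // value must fit in the low 7 bits
  std::uint8_t old_val = entry.load(std::memory_order_relaxed);
  std::uint8_t new_val;
  do {
    new_val = static_cast<std::uint8_t>((old_val & kUnrefinedBit) |
                                        (value & kValueMask));
    // On failure, compare_exchange_weak reloads old_val with the current entry.
  } while (!entry.compare_exchange_weak(old_val, new_val,
                                        std::memory_order_relaxed));
}
```

The second function replaces one store with a load plus at least one compare-and-swap per update, which is the extra work the question is about; whether that matters in practice would have to be measured.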

Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5039