RFR: 8306920: G1: Calculate garbage bytes for evacuation failed regions from marked live bytes

Thu May 25 13:55:57 UTC 2023

On Thu, 25 May 2023 11:57:23 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> > but currently is calculated in post evacuation phase 2
> 
> Which task specifically in phase 2? I thought it's `RestoreRetainedRegionsTask` that calculates garbage-bytes/live-bytes, which lives in phase 1.

Apologies, my bad. This change series has been on my plate for so long now, and since then I've already been working on so many different things, partially far ahead, that I forget things (the following should also be written down in the comment for [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326) referenced in that PR).

It is indeed `RestoreRetainedRegionsTask` in phase 1 that does that work. The problem is completely different, with a dependency clearing the card table (in post phase 1) and redirtying cards (in post phase 2), that is problematic in [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326) where during gc g1 is going to collect cards into the evacuation failed regions unconditionally. These cards (at least the ones that are kept) need to be redirtied like others.

So depending on liveness g1 is going to decide whether to keep these in post phase 1 (during gc there is too little information) and push them into the global dcqs (in post phase 1, when merging the PSS). Redirtying in post phase 2 needs the DCQs complete. That means, that liveness information is required to be complete before phase 1.

There is the option which separates merging and redirtying of the selected local DCQs'es into phase 2 (i.e. a phase that implements both selecting interesting evacuation failure regions and redirtying at the same time), butI did not want to do redirtying in two separate phases depending on source.

Maybe I should implement that, and basically drop this change?

(I still like this change in some way because it avoids the fairly complicated incremental liveness calculation before in `RestoreRetainedRgionsTask`, but ymmv)

Implemented is determining liveness in that (serial) phase 0. The reason is that determining liveness is very short and fairly constant time (flushing the merge stats cache basically, i.e. the mentioned O(# gc threads)). Merging the PSS can be a bit slow now (merging the local DCQs into the whole, particularly dropping unneeded DCQs'es - but it is nicely hidden in post phase 1; I think Kim has been talking about improving DCQ management a few times in the past), and moving out redirtying into a new post phase 3 can be even slower.

This seemed the best tradeoff between code complexity and pause time impact.

This PR already mentions that this change is fairly closely tied to the other. I already mentioned (internally) that I think it may be too small now (it was part of a more general improvement to bitmap/tams/live bytes management, i.e. "always" clearing them together), but I posted it anyway separately. I can easily merge them; [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326) is done and tested already).

> 
> > there is no (problematic) overlap with its use during marking
> 
> Even if it's correct, I find it a bit hacky to reuse marking-related data structure like this, `add_to_liveness` uses `_mark_stats_cache`.
> 
> (`G1CMBitMap` is also used outside conc-mark, but I'd say that it should be brought up to the heap-level; bitmap, mirroring the whole heap, is not really tied to conc-mark.)

I think both the sharing of `G1CMBitMap` and the live-bytes machinery (including `mark_stats_cache`) are on the same level of ugliness/hackiness, both of them are mainly marking data structures that are (mis-)used for evacuation failure rarely (even with region pinning this will only be a small fraction).

We've discussed this already when using the bitmap for this recovery mechanism and agreed to hold our noses then (actually I remember that I've been fairly if not most vocal about this being really ugly).

Wrt to the mark bitmap not really tied to conc-mark:
Since `G1CMBitMap` is a single large reservation (ease of implementation), it has historically been managed by `G1CollectedHeap`. Large data structures always end up there :) Imho conceptually the bitmap is a marking data structure and should be "owned" by marking (at least the parts it uses). It is also called mark bitmap for a reason.

I even think that tying bitmaps 1:1 to the region memory (again, done because of ease of implementation) is not advantageous apart from a simplicity POV. A large part of the heap (and bitmap) is unlikely to be marked through (e.g. young gen, large parts of humongous regions), so I think the current way of managing it is of a waste of (committed) memory on longer running applications.
What helps wrt to memory consumption currently is the fact that committing does not actually back the reservation with physical memory, so this saves a little, but once an area has ever been touched, it's not going away at the moment unless the corresponding heap region is uncommitted.

One option that looks good to me would be some global bitmap manager where both the evacuation failure and marking are users of, making both of them temporary owners of whatever parts they use. That component could also take care of (concurrent) clearing, keeping "enough" bitmap pages committed etc. Also dropping that 1:1 mapping.

I could either
* duplicate the live bytes/mark stats cache for evacuation failure if you prefer (for not much gain).
* or try above suggested modification of [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326) to have split card redirtying for the gc global DCQS and the per-thread local ones (that thinking about it, seems to be the best option, but may have issues with work distribution and/or unforeseen other issues).

What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14118#issuecomment-1562950742