RFR: 8276098: Do precise BOT updates in G1 evacuation phase

Wed Nov 3 16:12:25 UTC 2021

On Fri, 29 Oct 2021 08:23:30 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

> Please review this change to do precise BOT updates in the G1 evacuation phase.
> 
> **Summary**
> In G1 young collections the BOT is updated for objects copied to old generation regions. Prior to this fix the BOT updates are very crude and only done for each new PLAB and for direct allocations (large allocation outside the PLABs).
> 
> The BOT is then updated to be more precise during concurrent refinement and when scanning the heap in later GCs. This leads to both more time spent doing concurrent refinement as well as prolonged "scan heap" phases in the following GCs.
> 
> With this change we instead update the BOT to be complete and precise while doing the copy. This way we can reduce the time in the following phases quite significantly. This comes with a slight regression in object copy times, but from my measurements the overall gain is worth the complexity and extra time spent in object copy. 
> 
> Doing this more precise BOT updating requires us to not rely on a global threshold for updating the BOT but instead calculate where the updates are done, this allows us to remove a lock in the old generation allocation path which is only present to guard this threshold. So with this change we can remove the different allocation paths used for young and old regions.
> 
> **Testing**
> All testing look good:
> - [x] Mach5 tier1-5
> - [x] Local stress testing
> - [x] Performance testing and pause time comparisons

This is a first cut of ideas to improve the change a bit - I really like it from a technical point of view. However it feels kind of patched in instead of some part of a whole.

Particularly one issue I would like to discuss in detail is that we now do bot updates at different levels (`G1PLABAllocator`, `G1AllocRegion`). The suggestion to move the BOT update for the waste into `G1PLAB::retire_internal` could fix that though. Could you look into this refactoring suggestion in more detail please?

Other comments may be outdated if/after this change.

src/hotspot/share/gc/g1/g1Allocator.hpp line 155:

> 153:   bool is_allocated();
> 154:   HeapWord* get_filler();
> 155:   size_t get_filler_size();

Maybe instead of two methods that return the remaining space, one that returns a `MemRegion` would be nicer? Also call it something like `remainder` or so.

What about making `PLAB::retire_internal` virtual and override here, so that the explicit call in `G1PLABAllocator::allocate_direct_or_new_plab` goes away? (and these two helpers, and probably also `is_allocated()`).

Also the explict call in `G1PLABAllocator::flush_and_retire_stats` could maybe be hidden too. You could store the `G1PLABAllocator` in `G1PLAB` so that it can call the `update_bot....` method in `retire_internal`.

src/hotspot/share/gc/g1/g1Allocator.hpp line 170:

> 168: 
> 169:   // Region where the current old generation PLAB is allocated. Used to do BOT updates.
> 170:   HeapRegion* _bot_plab_region;

I think it is a breakage of abstraction if we only store this information for old gen - we after all allocate the `G1PLAB` for all `G1HeapRegionAttr::num` destination areas.
What about putting all that stuff into `G1PLAB` and initializing it appropriately?

src/hotspot/share/gc/g1/g1Allocator.hpp line 173:

> 171:   // Current BOT threshold, a PLAB allocation crossing this threshold will cause a BOT
> 172:   // update.
> 173:   HeapWord* _bot_plab_threshold;

This is also only interesting for the old generation, isn't it? Same issue as above.

src/hotspot/share/gc/g1/g1Allocator.inline.hpp line 137:

> 135: 
> 136: inline void G1PLABAllocator::update_bot_for_plab_waste(G1HeapRegionAttr attr, G1PLAB* plab) {
> 137:   if (!attr.is_old()) {

I would prefer to make an extra predicate like `needs_bot_update()` for this instead of explictly replicating the code in a few places (and always adding the comment).

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 503:

> 501:     } else {
> 502:       assert(dest_attr.is_old(), "Only update bot for allocations in old");
> 503:       _plab_allocator->update_bot_for_object(obj_ptr, word_sz);

Maybe it would be good if the `PLABAllocator` returned a struct instead of just a pointer that contains
- the HeapWord
- word_sz
- whether it was a direct allocation
?

Then this struct could be passed in here again instead of the code for `update_bot_for_object` trying to reconstruct whether it has been an out-of-plab allocation or not.

Or just skip the `word_sz` in that struct. Alternatively add a return value whether this allocation has been out-of-plab. But this method already has lots of locals (is too long?), so starting to group them might be a good idea.

I think explicitly carrying this information around would be much much cleaner to understand than trying to reconstruct that information later in `update_bot_for_object´ via

  if (!alloc_buffer(G1HeapRegionAttr::Old, 0)->contains(obj_start)) {
    // Out of PLAB allocation, BOT already updated.
    return;
  }

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6166