RFR: 8276098: Do precise BOT updates in G1 evacuation phase [v3]

Mon Nov 8 15:34:37 UTC 2021

On Mon, 8 Nov 2021 14:47:08 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Please review this change to do precise BOT updates in the G1 evacuation phase.
>> 
>> **Summary**
>> In G1 young collections the BOT is updated for objects copied to old generation regions. Prior to this fix the BOT updates are very crude and only done for each new PLAB and for direct allocations (large allocation outside the PLABs).
>> 
>> The BOT is then updated to be more precise during concurrent refinement and when scanning the heap in later GCs. This leads to both more time spent doing concurrent refinement as well as prolonged "scan heap" phases in the following GCs.
>> 
>> With this change we instead update the BOT to be complete and precise while doing the copy. This way we can reduce the time in the following phases quite significantly. This comes with a slight regression in object copy times, but from my measurements the overall gain is worth the complexity and extra time spent in object copy. 
>> 
>> Doing this more precise BOT updating requires us to not rely on a global threshold for updating the BOT but instead calculate where the updates are done, this allows us to remove a lock in the old generation allocation path which is only present to guard this threshold. So with this change we can remove the different allocation paths used for young and old regions.
>> 
>> **Testing**
>> All testing look good:
>> - [x] Mach5 tier1-5
>> - [x] Local stress testing
>> - [x] Performance testing and pause time comparisons
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Thomas review

Just some minor comments/suggestions.

src/hotspot/share/gc/g1/g1Allocator.cpp line 308:

> 306:       } else {
> 307:         _alloc_buffers[state][node_index] = new PLAB(_g1h->desired_plab_sz(state));
> 308:       }

I think ternary operator can be used here:

word_sz = _g1h->desired_plab_sz(state);
// ...
_alloc_buffers[state][node_index] = (state == G1HeapRegionAttr::Old)
                                  ? new G1BotUpdatingPLAB(word_sz)
                                  : new PLAB(word_sz);

src/hotspot/share/gc/g1/g1Allocator.hpp line 159:

> 157:   G1BotUpdatingPLAB(size_t word_sz) : PLAB(word_sz) { }
> 158:   // Sets the new PLAB buffer as well as updates the threshold and region.
> 159:   virtual void set_buf(HeapWord* buf, size_t word_sz);

Is `override` better here?

src/hotspot/share/gc/g1/heapRegion.inline.hpp line 248:

> 246:          "Should only calculate BOT threshold for old regions. addr: " PTR_FORMAT " region:" HR_FORMAT,
> 247:          p2i(addr), HR_FORMAT_PARAMS(this));
> 248:   return _bot_part.threshold_for_addr(addr);

Instead of calling `_bot_part.threshold_for_addr` multiple times, storing the result in a local var could be cleaner.

src/hotspot/share/gc/shared/plab.hpp line 122:

> 120: 
> 121:   // Sets the space of the buffer to be [buf, space+word_sz()).
> 122:   virtual void set_buf(HeapWord* buf, size_t new_word_sz) {

I wonder if `override` is better.

-------------

Changes requested by ayang (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6166