RFR: 8258431: Provide a JFR event with live set size estimate [v9]

Wed Mar 3 12:19:50 UTC 2021

On Tue, 2 Mar 2021 14:33:14 GMT, Jaroslav Bachorik <jbachorik at openjdk.org> wrote:

>> The purpose of this change is to expose a 'cheap' estimate of the current live set size in the form of a periodically emitted JFR event. The meaning of 'current' depends on the particular GC implementation, but in the worst case it is 'at the last full GC'.
>> 
>> ## Introducing new JFR event
>> 
>> While there is already a 'GC Heap Summary' JFR event, it does not fit the requirements because it is closely tied to the GC cycle. For ZGC or Shenandoah, e.g., a cycle may not happen for quite a long time, increasing the risk that no heap summary events are present in the JFR recording at all.
>> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain information about the heap capacity and the used and live bytes. This information is available from all GC implementations and can be provided at literally any time.
>> 
>> ## Implementation
>> 
>> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track liveness. The common part is the `size_t live() const` method added to the `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If the liveness hasn't been calculated yet, the implementation defaults to returning the 'used' value.
>> 
>> The implementations are based on my (rather shallow) knowledge of the inner workings of the respective GC engines, and I am open to suggestions to make them better/correct.
>> 
>> ### Epsilon GC
>> 
>> Trivial implementation - just return `used()` instead.
>> 
>> ### Serial GC
>> 
>> Here we utilize two facts. The mark-copy phase is naturally compacting, so the number of bytes after the copy is 'live'. The mark-sweep implementation keeps internal info about objects that are 'dead' but excluded from the compaction effort, and we can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects).
>> 
>> ### Parallel GC
>> 
>> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK).
>> 
>> ### G1 GC
>> 
>> In the `G1ConcurrentMark::remark()` method the live set size is computed as the sum of `_live_words` from the associated `G1RegionMarkStats` objects. I am not 100% sure this approach covers all eventualities, and it would be great to have someone skilled in the G1 implementation chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application.
>> 
>> ### Shenandoah
>> 
>> In Shenandoah, the regions keep the liveness info. However, the VM op that is used for iterating regions requires a safepoint, so it would be best to run it in a context that is already at a safepoint.
>> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()`, where at the end of the marking process the liveness info is summarized and stored in the volatile field `ShenandoahHeap::_live`, which is later read by the event emitting code.
>> 
>> ### ZGC
>> 
>> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method.
>
> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add tests for the heap usage summary event

FWIW, the change still does not capture `live_estimate()` for G1 full GC.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1070:

> 1068: 
> 1069:     uint num_selected_for_rebuild() const { return _num_regions_selected_for_rebuild; }
> 1070:     size_t live_estimate() const { return _live; }

Please sync the member name with the getter name. I.e. `_live` -> `_live_estimate`

src/hotspot/share/gc/parallel/psAdaptiveSizePolicy.hpp line 60:

> 58: class PSAdaptiveSizePolicy : public AdaptiveSizePolicy {
> 59:  friend class PSGCAdaptivePolicyCounters;
> 60:  friend class ParallelScavengeHeap;

Please delete this apparently unneeded friend declaration (the code compiled successfully without it here).

src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 87:

> 85: 
> 86:   // in order to provide accurate estimate this method must be called only when the heap has just been collected and compacted
> 87:   inline void capture_live();

Sentences in comments should start with an upper-case letter. Also, I'd prefer to name the method `update_live_estimate()` instead.

src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 182:

> 180:   G1BlockOffsetTable* _bot;
> 181: 
> 182:   volatile size_t _live;

I'm not happy with naming this `_live`; better use `_live_estimate`. The contents are not continuously updated and are basically out of date after the first subsequent allocation.
This applies to the naming in all other instances too.

src/hotspot/share/gc/serial/serialHeap.hpp line 44:

> 42:   MemoryPool* _old_pool;
> 43: 
> 44:   size_t _live_size;

Please rename to `_live_estimate` like the others. Avoid having different names in different collectors for the same thing.

src/hotspot/share/gc/shared/space.inline.hpp line 128:

> 126:           p2i(dead_start), p2i(dead_end), dead_length * HeapWordSize);
> 127: 
> 128:       _dead_space += dead_length;

I do not think that adding this to the counter here, rather than in the other method for every object, makes a difference performance-wise.

As mentioned before, `_allowed_deadspace_words` counts *down* from `(space->capacity() * ratio / 100) / HeapWordSize;` to whatever end value.

So at the end of collection, `(space->capacity() * ratio / 100) / HeapWordSize - _allowed_deadspace_words` should be equal to what `_dead_space` is now.

Please add a getter to `DeadSpacer` that calculates this (factoring out the calculation of the maximum allowed deadspace).
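
To make the request concrete, here is a minimal standalone sketch of such a getter. It is a simplified model, not the HotSpot `DeadSpacer`: the class name `DeadSpacerSketch`, the getter name `dead_space_words()`, the helper `max_allowed_deadspace_words()` and the constant `kHeapWordSize` are all hypothetical, and lengths are passed as plain word counts instead of `HeapWord*` ranges.

```cpp
#include <cstddef>

// Assumption: 8-byte heap words, standing in for HotSpot's HeapWordSize.
constexpr std::size_t kHeapWordSize = 8;

// Simplified, standalone model of a dead-space budget (hypothetical names).
class DeadSpacerSketch {
 public:
  DeadSpacerSketch(std::size_t capacity_bytes, std::size_t ratio_percent)
      : _max_allowed_deadspace_words(
            max_allowed_deadspace_words(capacity_bytes, ratio_percent)),
        _allowed_deadspace_words(_max_allowed_deadspace_words),
        _active(ratio_percent > 0) {}

  // Factored-out calculation of the initial dead-space budget, in words.
  static std::size_t max_allowed_deadspace_words(std::size_t capacity_bytes,
                                                 std::size_t ratio_percent) {
    return (capacity_bytes * ratio_percent / 100) / kHeapWordSize;
  }

  // Counts the remaining budget *down*; deactivates once a request does not fit.
  bool insert_deadspace(std::size_t dead_length_words) {
    if (!_active) {
      return false;
    }
    if (_allowed_deadspace_words >= dead_length_words) {
      _allowed_deadspace_words -= dead_length_words;
      return true;
    }
    _active = false;
    return false;
  }

  // Dead space accepted so far: initial budget minus what is left of it.
  std::size_t dead_space_words() const {
    return _max_allowed_deadspace_words - _allowed_deadspace_words;
  }

 private:
  const std::size_t _max_allowed_deadspace_words;
  std::size_t _allowed_deadspace_words;
  bool _active;
};
```

With this shape no separate `_dead_space` counter is needed; the value is derived on demand from the two budget fields.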

src/hotspot/share/gc/shared/space.hpp line 553:

> 551:   size_t capacity() const        { return byte_size(bottom(), end()); }
> 552:   size_t used() const            { return byte_size(bottom(), top()); }
> 553:   size_t live() const            {

The code for Serial GC, unlike the others, tries to give some semblance of tracking actual liveness, i.e. it calculates this anew on every call to `SerialHeap::live()`.
However, if calling an `update_live_estimate()` at certain places is fine for Parallel and G1 (and the other collectors), it should be just as good for Serial GC.
Doing so would reduce the footprint of this change quite a bit (for Serial GC).
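
As a rough illustration of the suggested pattern, the following standalone sketch caches the estimate once at the end of a collection and falls back to `used()` until the first update. `HeapSketch`, `allocate()`, `collect()` and the sentinel `kNoEstimate` are hypothetical names for this model only; the real change would hook `update_live_estimate()` into the collector's existing end-of-GC code.

```cpp
#include <atomic>
#include <cstddef>
#include <limits>

// Standalone model of a heap with a cached live estimate (hypothetical names).
class HeapSketch {
 public:
  void allocate(std::size_t bytes) { _used += bytes; }

  // Models the end of a full collection: 'live_bytes' survive and
  // 'dead_bytes_left' of dead objects are left in place uncompacted.
  void collect(std::size_t live_bytes, std::size_t dead_bytes_left) {
    _used = live_bytes + dead_bytes_left;
    update_live_estimate(live_bytes);
  }

  std::size_t used() const { return _used; }

  // Returns the cached value; falls back to used() before the first GC.
  // The value is stale as soon as the next allocation happens.
  std::size_t live_estimate() const {
    std::size_t cached = _live_estimate.load(std::memory_order_relaxed);
    return cached == kNoEstimate ? used() : cached;
  }

 private:
  static constexpr std::size_t kNoEstimate =
      std::numeric_limits<std::size_t>::max();

  // Called only when the heap has just been collected.
  void update_live_estimate(std::size_t live_bytes) {
    _live_estimate.store(live_bytes, std::memory_order_relaxed);
  }

  std::size_t _used = 0;
  // Atomic because, in this model, another thread may read it when emitting the event.
  std::atomic<std::size_t> _live_estimate{kNoEstimate};
};
```

The getter does no work beyond a load, which is the point of the suggestion: the per-call recalculation goes away and Serial GC behaves like the other collectors.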

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579