Can GC implementations provide a cheap estimation of live set size?

Mon Feb 15 10:47:01 UTC 2021

On Mon, Feb 15, 2021 at 11:24 AM Per Liden <per.liden at oracle.com> wrote:
>
> Hi,
>
> On 2/15/21 10:44 AM, Jaroslav Bachorík wrote:
> > Hi again,
> >
> > I continued experimenting with Shenandoah and ZGC which already are
> > tracking liveness. I am emitting a (partially filled) GCHeapSummary
> > JFR event to capture used/live sizes.
> > For Shenandoah the event is emitted at the very end of the
> > `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the
> > `ZMark::end()` method. The exact changes can be checked via branch
> > comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1)
> > but bear in mind that this is just experimental code with no
> > intention of being checked in in its current form.
> >
> > Unfortunately, when I run an application on such a modified JVM and
> > collect a JFR recording, the live set size numbers seem a bit 'low' -
> > e.g. on both ZGC and Shenandoah (using the already available liveness
> > info) the reported liveness is ~50% of the reported usage. Is there a
> > good explanation for this?
>
> When you create the GCHeapSummary, the "live" value reflects what was
> live after marking, while the "used" value reflects the usage when the
> GC cycle ended. So, after marking ended, some amount of garbage was
> likely reclaimed, but then new objects were also allocated. For ZGC
> (don't know if Shenandoah shows this), you can see details of how much
> was reclaimed and how much was allocated in the GC log.

Definitely - it's just that a difference of >100MB (e.g. for ZGC 350MB
used vs. 170MB live) struck me as a bit suspicious. But maybe it is
expected.
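
For reference, the ZGC table quoted further down in this thread suggests
how such a gap can arise: in each column 'Used' is roughly 'Live' +
'Allocated' + 'Garbage', so memory allocated while the cycle runs counts
toward usage but not toward the live set measured at mark end. A minimal
sketch of that accounting in Java, using the 'Relocate End' numbers from
the quoted GC(0) log; the class and method names are hypothetical and
this is not HotSpot code:

```java
// Hypothetical helper, not HotSpot code: checks the accounting suggested by
// the quoted ZGC -Xlog:gc+heap table, used ~= live + allocated + garbage.
final class HeapAccounting {
    // All values are in megabytes, as printed in the log.
    static long usedEstimate(long liveMb, long allocatedMb, long garbageMb) {
        return liveMb + allocatedMb + garbageMb;
    }

    // Live set expressed as a percentage of the used heap.
    static double livePercent(long liveMb, long usedMb) {
        return 100.0 * liveMb / usedMb;
    }

    public static void main(String[] args) {
        // "Relocate End" column of the quoted GC(0) table.
        long live = 195, allocated = 380, garbage = 24, used = 600;
        System.out.println("estimated used = " + usedEstimate(live, allocated, garbage)
                + "M, logged used = " + used + "M");
        System.out.printf("live / used = %.1f%%%n", livePercent(live, used));
    }
}
```

With those inputs the sum is 599M against a logged 600M (the log rounds
to whole megabytes), and live is about a third of used. Under the same
assumption, a live/used ratio around 50% would not be surprising for an
application that keeps allocating during the cycle.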

-JB-

>
> /Per
>
> >
> > Thanks!
> >
> > -JB-
> >
> > On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachorík
> > <jaroslav.bachorik at datadoghq.com> wrote:
> >>
> >> On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke <rkennke at redhat.com> wrote:
> >>>
> >>> Notice that liveness information is only somewhat reliable right after
> >>> marking. In Shenandoah, this is in the final-mark pause, and then the
> >>
> >> Yes, I understand this. What I am looking for is something like a
> >> 'last known liveness' value - captured at a well-defined point and
> >> providing an estimate within the bounds of the GC implementation.
> >>
> >>> program is at a safepoint already. This is where you'd want to emit a
> >>> JMX event or something similar. You can't simply query a counter and
> >>> assume it represents current liveness in the middle or outside of GC
> >>> cycle. This should be true for all GCs.
> >>>
> >>> For Serial and Parallel I am not sure at all that you can do this.
> >>> AFAIK, they don't count liveness at all.
> >>>
> >>> Roman
> >>>
> >>>> Hi Roman,
> >>>>
> >>>> Thanks for your response. I checked the ZGC implementation and, indeed, it
> >>>> is very easy to get the liveness information just by extending
> >>>> `ZStatHeap` class to report the last valid value of
> >>>> `_at_mark_end.live`.
> >>>>
> >>>> I am also able to get this info from Shenandoah, although my first
> >>>> attempt still involves a safepointing VM operation since I need to
> >>>> iterate over regions to get the liveness info for each of them and sum
> >>>> it up. I think it is still an acceptable trade-off, though.
> >>>>
> >>>> The next one in the queue is the Serial GC. My assumptions, based on
> >>>> reading the code, are that for young gen 'live = used' at the end of
> >>>> DefNewGeneration::collect() method and for old gen 'live = used -
> >>>> slack' (slack is the cumulative size of objects considered to be alive
> >>>> for the purpose of compaction although they are really dead - see
> >>>> CompactibleSpace::scan_and_forward()). Does this sound reasonable?
> >>>>
> >>>> I will post my findings for Parallel GC and G1 GC later.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -JB-
> >>>>
> >>>> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke <rkennke at redhat.com> wrote:
> >>>>>
> >>>>> Hello Jaroslav,
> >>>>>
> >>>>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I
> >>>>>> am trying to figure out whether providing a cheap estimation of live
> >>>>>> set size is something actually achievable across various GC
> >>>>>> implementations.
> >>>>>>
> >>>>>> What I am looking at is piggy-backing on a concurrent mark task to get
> >>>>>> the summary size of live objects - using the 'straight-forward'
> >>>>>> heap-inspection like approach is prohibitively expensive.
> >>>>>
> >>>>> In Shenandoah, this information is already collected during concurrent
> >>>>> marking. We currently don't print it directly, but we could certainly do
> >>>>> that. I'll look into implementing it. I'll also look into exposing
> >>>>> liveness info via JMX.
> >>>>>
> >>>>> I'm not quite sure about G1: that information would only be collected
> >>>>> during mixed or full collections. I am not sure if G1 prints it, though.
> >>>>>
> >>>>> ZGC prints this under -Xlog:gc+heap:
> >>>>>
> >>>>> [6,502s][info][gc,heap     ] GC(0)                  Mark Start         Mark End   Relocate Start     Relocate End             High              Low
> >>>>> [6,502s][info][gc,heap     ] GC(0)  Capacity:       834M (10%)      1076M (13%)      1092M (14%)      1092M (14%)      1092M (14%)       834M (10%)
> >>>>> [6,502s][info][gc,heap     ] GC(0)      Free:      7154M (90%)      6912M (87%)      6916M (87%)      7388M (92%)      7388M (92%)      6896M (86%)
> >>>>> [6,502s][info][gc,heap     ] GC(0)      Used:       834M (10%)      1076M (13%)      1072M (13%)        600M (8%)      1092M (14%)        600M (8%)
> >>>>> [6,502s][info][gc,heap     ] GC(0)      Live:                -        195M (2%)        195M (2%)        195M (2%)                -                -
> >>>>> [6,502s][info][gc,heap     ] GC(0) Allocated:                -        242M (3%)        270M (3%)        380M (5%)                -                -
> >>>>> [6,502s][info][gc,heap     ] GC(0)   Garbage:                -        638M (8%)        606M (8%)         24M (0%)                -                -
> >>>>> [6,502s][info][gc,heap     ] GC(0) Reclaimed:                -                -         32M (0%)        614M (8%)                -                -
> >>>>>
> >>>>> I hope that is useful?
> >>>>>
> >>>>> Thanks,
> >>>>> Roman
> >>>>>
> >>>>
> >>>