Can GC implementations provide a cheap estimation of live set size?

Per Liden per.liden at oracle.com
Mon Feb 15 10:58:52 UTC 2021


On 2/15/21 11:47 AM, Jaroslav Bachorík wrote:
> On Mon, Feb 15, 2021 at 11:24 AM Per Liden <per.liden at oracle.com> wrote:
>>
>> Hi,
>>
>> On 2/15/21 10:44 AM, Jaroslav Bachorík wrote:
>>> Hi again,
>>>
>>> I continued experimenting with Shenandoah and ZGC which already are
>>> tracking liveness. I am emitting a (partially filled) GCHeapSummary
>>> JFR event to capture used/live sizes.
>>> For Shenandoah the event is emitted at the very end of the
>>> `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the
>>> `ZMark::end()` method. The exact changes can be checked via branch
>>> comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1)
>>> but bear in mind that this is just experimental code with no
>>> intention of being checked in in its current form.
>>>
>>> Unfortunately, when I run an application on such a modified JVM and
>>> collect a JFR recording, the live set size numbers seem a bit 'low' -
>>> e.g. on both ZGC and Shenandoah (using the already available liveness
>>> info) the reported liveness is ~50% of the reported usage. Is there a
>>> good explanation for this?
>>
>> When you create the GCHeapSummary, the "live" value reflects what was
>> live after marking, while the "used" value reflects the usage when the
>> GC cycle ended. So, after marking ended, some amount of garbage was
>> likely reclaimed, but then new objects were also allocated. For ZGC
>> (don't know if Shenandoah shows this), you can see details of how much
>> was reclaimed and how much was allocated in the GC log.
> 
> Definitely - it's just that a diff of >100MB (e.g. for ZGC 350MB used
> vs. 170MB live) struck me as a bit suspicious. But maybe it is
> expected.

It's impossible to say whether it's expected without knowing what the
application is doing, its allocation rate, etc. The application could
be allocating several gigabytes per second, in which case the diff could
be large. However, if the application is just idling and isn't
allocating anything, then live is expected to be equal (or close to
equal) to used.
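
As a rough illustration, the columns of the ZGC log quoted further down
in this thread appear to add up as used ~= live + garbage + allocated
(within rounding). The sketch below just replays that arithmetic for the
"Relocate End" column; the class and method names are hypothetical and
are not part of any JDK API.

```java
public final class LiveSetEstimate {

    /**
     * Heap usage (in MB) implied by the live set found at mark end,
     * the garbage not yet reclaimed, and the memory allocated since
     * the GC cycle started. Hypothetical helper, not a JDK API.
     */
    public static long usedMb(long liveMb, long garbageMb, long allocatedMb) {
        return liveMb + garbageMb + allocatedMb;
    }

    public static void main(String[] args) {
        // Values taken from the "Relocate End" column of the quoted ZGC log.
        long live = 195;
        long garbage = 24;
        long allocated = 380;

        long used = usedMb(live, garbage, allocated);
        System.out.println("used ~= " + used + "M (the log reports 600M)");
        System.out.println("live is about " + (100 * live / used) + "% of used");
    }
}
```

With these numbers most of the gap between live and used is memory
allocated while the cycle was running, which is why the allocation rate
matters when judging whether a given gap is plausible.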

/Per

> 
> -JB-
> 
>>
>> /Per
>>
>>>
>>> Thanks!
>>>
>>> -JB-
>>>
>>> On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachorík
>>> <jaroslav.bachorik at datadoghq.com> wrote:
>>>>
>>>> On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke <rkennke at redhat.com> wrote:
>>>>>
>>>>> Notice that liveness information is only somewhat reliable right after
>>>>> marking. In Shenandoah, this is in the final-mark pause, and then the
>>>>
>>>> Yes, I understand this. What I am looking for is something like a
>>>> 'last known liveness' value - captured at a well-defined point and
>>>> providing an estimate within the bounds of the GC implementation.
>>>>
>>>>> program is at a safepoint already. This is where you'd want to emit a
>>>>> JMX event or something similar. You can't simply query a counter and
>>>>> assume it represents current liveness in the middle or outside of GC
>>>>> cycle. This should be true for all GCs.
>>>>>
>>>>> For Serial and Parallel I am not sure at all that you can do this.
>>>>> AFAIK, they don't count liveness at all.
>>>>>
>>>>> Roman
>>>>>
>>>>>> Hi Roman,
>>>>>>
>>>>>> Thanks for your response. I checked ZGC implementation and, indeed, it
>>>>>> is very easy to get the liveness information just by extending
>>>>>> `ZStatHeap` class to report the last valid value of
>>>>>> `_at_mark_end.live`.
>>>>>>
>>>>>> I am also able to get this info from Shenandoah, although my first
>>>>>> attempt still involves a safepointing VM operation since I need to
>>>>>> iterate over regions to get the liveness info for each of them and sum
>>>>>> it up. I think it is still an acceptable trade-off, though.
>>>>>>
>>>>>> The next one in the queue is the Serial GC. My assumptions, based on
>>>>>> reading the code, are that for young gen 'live = used' at the end of
>>>>>> DefNewGeneration::collect() method and for old gen 'live = used -
>>>>>> slack' (slack is the cumulative size of objects considered to be alive
>>>>>> for the purpose of compaction although they are really dead - see
>>>>>> CompactibleSpace::scan_and_forward()). Does this sound reasonable?
>>>>>>
>>>>>> I will post my findings for Parallel GC and G1 GC later.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke <rkennke at redhat.com> wrote:
>>>>>>>
>>>>>>> Hello Jaroslav,
>>>>>>>
>>>>>>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I
>>>>>>>> am trying to figure out whether providing a cheap estimation of live
>>>>>>>> set size is something actually achievable across various GC
>>>>>>>> implementations.
>>>>>>>>
>>>>>>>> What I am looking at is piggy-backing on a concurrent mark task to get
>>>>>>>> the summary size of live objects - using the 'straight-forward'
>>>>>>>> heap-inspection like approach is prohibitively expensive.
>>>>>>>
>>>>>>> In Shenandoah, this information is already collected during concurrent
>>>>>>> marking. We currently don't print it directly, but we could certainly do
>>>>>>> that. I'll look into implementing it. I'll also look into exposing
>>>>>>> liveness info via JMX.
>>>>>>>
>>>>>>> I'm not quite sure about G1: that information would only be collected
>>>>>>> during mixed or full collections. I am not sure if G1 prints it, though.
>>>>>>>
>>>>>>> ZGC prints this under -Xlog:gc+heap:
>>>>>>>
>>>>>>> [6,502s][info][gc,heap     ] GC(0)                 Mark Start        Mark End  Relocate Start    Relocate End            High             Low
>>>>>>> [6,502s][info][gc,heap     ] GC(0)  Capacity:      834M (10%)     1076M (13%)     1092M (14%)     1092M (14%)     1092M (14%)      834M (10%)
>>>>>>> [6,502s][info][gc,heap     ] GC(0)      Free:     7154M (90%)     6912M (87%)     6916M (87%)     7388M (92%)     7388M (92%)     6896M (86%)
>>>>>>> [6,502s][info][gc,heap     ] GC(0)      Used:      834M (10%)     1076M (13%)     1072M (13%)       600M (8%)     1092M (14%)       600M (8%)
>>>>>>> [6,502s][info][gc,heap     ] GC(0)      Live:               -       195M (2%)       195M (2%)       195M (2%)               -               -
>>>>>>> [6,502s][info][gc,heap     ] GC(0) Allocated:               -       242M (3%)       270M (3%)       380M (5%)               -               -
>>>>>>> [6,502s][info][gc,heap     ] GC(0)   Garbage:               -       638M (8%)       606M (8%)        24M (0%)               -               -
>>>>>>> [6,502s][info][gc,heap     ] GC(0) Reclaimed:               -               -        32M (0%)       614M (8%)               -               -
>>>>>>>
>>>>>>> I hope that is useful?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Roman
>>>>>>>
>>>>>>
>>>>>


More information about the hotspot-jfr-dev mailing list