Can GC implementations provide a cheap estimation of live set size?
Roman Kennke
rkennke at redhat.com
Thu Feb 11 17:55:24 UTC 2021
Note that liveness information is only somewhat reliable right after
marking. In Shenandoah, that is the final-mark pause, at which point the
program is already at a safepoint. This is where you'd want to emit a
JMX event or something similar. You can't simply query a counter and
assume it represents current liveness in the middle of, or outside, a GC
cycle. This should be true for all GCs.
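
A minimal sketch of that idea, in plain Java and for illustration only:
the value is published once per cycle at the end of marking, and readers
get a snapshot tagged with the cycle that produced it instead of a
"current" counter. All names here (LiveSetSnapshot, publishAtMarkEnd,
lastSample) are hypothetical and are not part of HotSpot or JMX.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the last liveness value seen at mark end.
// Not an existing HotSpot or JMX class.
final class LiveSetSnapshot {

    // Immutable sample, tagged with the GC cycle that produced it.
    static final class Sample {
        final long gcId;
        final long liveBytes;
        final long timestampNanos;

        Sample(long gcId, long liveBytes, long timestampNanos) {
            this.gcId = gcId;
            this.liveBytes = liveBytes;
            this.timestampNanos = timestampNanos;
        }
    }

    private final AtomicReference<Sample> last = new AtomicReference<>();

    // Assumed to be called once per cycle, right after marking completes
    // (for Shenandoah that would be in the final-mark pause).
    void publishAtMarkEnd(long gcId, long liveBytes) {
        last.set(new Sample(gcId, liveBytes, System.nanoTime()));
    }

    // Returns the last completed sample, or null if no marking has
    // finished yet. The value describes a past cycle, not "now".
    Sample lastSample() {
        return last.get();
    }

    public static void main(String[] args) {
        LiveSetSnapshot snapshot = new LiveSetSnapshot();
        snapshot.publishAtMarkEnd(0, 195L * 1024 * 1024);
        Sample s = snapshot.lastSample();
        System.out.println("GC(" + s.gcId + ") live bytes at mark end: " + s.liveBytes);
    }
}
```

The point of carrying the GC id and timestamp is that a consumer can
tell how stale the number is before acting on it.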
For Serial and Parallel I am not sure at all that you can do this.
AFAIK, they don't count liveness at all.
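
For reference, the arithmetic behind the estimates in the quoted message
below could be sketched like this. It is plain Java for illustration, not
HotSpot code; youngUsed, oldUsed, slack and the per-region array are
hypothetical inputs, and whether Serial can actually supply them cheaply
is exactly the open question.

```java
// Hypothetical helper that only restates the proposed arithmetic.
final class LiveSetEstimate {

    // Serial, as proposed in the quoted message: young 'live = used' at the
    // end of a young collection, old 'live = used - slack', where slack is
    // dead space treated as live for the purpose of compaction.
    static long serialEstimate(long youngUsed, long oldUsed, long slack) {
        if (youngUsed < 0 || oldUsed < 0 || slack < 0 || slack > oldUsed) {
            throw new IllegalArgumentException("inconsistent inputs");
        }
        return youngUsed + (oldUsed - slack);
    }

    // Region-based collectors: sum the per-region live bytes recorded by
    // marking. This is the iteration described for Shenandoah below.
    static long sumRegions(long[] regionLiveBytes) {
        long total = 0;
        for (long live : regionLiveBytes) {
            total += live;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(serialEstimate(10, 100, 30));
        System.out.println(sumRegions(new long[] {1, 2, 3}));
    }
}
```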
Roman
> Hi Roman,
>
> Thanks for your response. I checked ZGC implementation and, indeed, it
> is very easy to get the liveness information just by extending
> `ZStatHeap` class to report the last valid value of
> `_at_mark_end.live`.
>
> I am also able to get this info from Shenandoah, although my first
> attempt still involves a safepointing VM operation since I need to
> iterate over regions to get the liveness info for each of them and sum
> it up. I think it is still an acceptable trade-off, though.
>
> The next one in the queue is the Serial GC. My assumptions, based on
> reading the code, are that for young gen 'live = used' at the end of
> DefNewGeneration::collect() method and for old gen 'live = used -
> slack' (slack is the cumulative size of objects considered to be alive
> for the purpose of compaction although they are really dead - see
> CompactibleSpace::scan_and_forward()). Does this sound reasonable?
>
> I will post my findings for Parallel GC and G1 GC later.
>
> Cheers,
>
> -JB-
>
> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke <rkennke at redhat.com> wrote:
>>
>> Hello Jaroslav,
>>
>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I
>>> am trying to figure out whether providing a cheap estimation of live
>>> set size is something actually achievable across various GC
>>> implementations.
>>>
>>> What I am looking at is piggy-backing on a concurrent mark task to get
>>> the summary size of live objects - using the 'straight-forward'
>>> heap-inspection like approach is prohibitively expensive.
>>
>> In Shenandoah, this information is already collected during concurrent
>> marking. We currently don't print it directly, but we could certainly do
>> that. I'll look into implementing it. I'll also look into exposing
>> liveness info via JMX.
>>
>> I'm not quite sure about G1: that information would only be collected
>> during mixed or full collections. I am not sure if G1 prints it, though.
>>
>> ZGC prints this under -Xlog:gc+heap:
>>
>> [6,502s][info][gc,heap ] GC(0)                 Mark Start        Mark End  Relocate Start    Relocate End            High             Low
>> [6,502s][info][gc,heap ] GC(0) Capacity:       834M (10%)     1076M (13%)     1092M (14%)     1092M (14%)     1092M (14%)      834M (10%)
>> [6,502s][info][gc,heap ] GC(0) Free:          7154M (90%)     6912M (87%)     6916M (87%)     7388M (92%)     7388M (92%)     6896M (86%)
>> [6,502s][info][gc,heap ] GC(0) Used:           834M (10%)     1076M (13%)     1072M (13%)       600M (8%)     1092M (14%)       600M (8%)
>> [6,502s][info][gc,heap ] GC(0) Live:                    -       195M (2%)       195M (2%)       195M (2%)               -               -
>> [6,502s][info][gc,heap ] GC(0) Allocated:               -       242M (3%)       270M (3%)       380M (5%)               -               -
>> [6,502s][info][gc,heap ] GC(0) Garbage:                 -       638M (8%)       606M (8%)        24M (0%)               -               -
>> [6,502s][info][gc,heap ] GC(0) Reclaimed:               -               -        32M (0%)       614M (8%)               -               -
>>
>> I hope that is useful?
>>
>> Thanks,
>> Roman
>>
>