on heap overhead of ZGC

Alen Vrečko alen.vrecko at gmail.com
Sun Apr 23 18:28:07 UTC 2023


Hi Stefan.

Thank you for the explanations. Makes sense.

Alen

On Thu, 20 Apr 2023 at 17:15, Stefan Karlsson <
stefan.karlsson at oracle.com> wrote:

> Hi Alen,
>
> On 2023-04-19 13:38, Alen Vrečko wrote:
> > Hello everyone.
> >
> > I did my best to search the web for any prior discussion on this.
> > Haven't found anything useful.
> >
> > I am trying to understand why there is a noticeable difference between
> > the size of all objects on the heap and the heap used (after a full GC).
> > The heap used size can be 10%+ more than the size of all live objects.
>
> The difference can come from two sources:
>
> 1) There is unusable memory in the regions, caused by address and size
> alignment requirements. (This is probably what you see in your 4MB array
> test)
>
> 2) We only calculate the `live` value when we mark through objects on
> the heap. While ZGC runs a cycle it more or less ignores new objects
> that the Java threads allocate during the GC cycle. This means that the
> `used` value will increase, but the `live` value will not be updated
> until the Mark End of the next GC cycle.
>
> The latter also makes it a bit misleading to look at the used value after
> the GC cycle. With stop-the-world GCs, that used value is an OK
> approximation of what is live on the heap. With a concurrent GC it also
> includes all the memory allocated during the GC cycle. The used number is
> still accurate, but some (maybe many) of those allocated objects were
> short-lived and died; we just won't be able to figure that out until the
> next GC cycle.
>
> I don't know if you have seen this, but we have a table where we try to
> give you a picture of how the values progressed during the GC cycle:
>
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)                 Mark Start          Mark End        Relocate Start      Relocate End           High               Low
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)  Capacity:     3282M (10%)        3536M (11%)        3580M (11%)        3612M (11%)        3612M (11%)        3282M (10%)
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Free:    28780M (90%)       28538M (89%)       28720M (90%)       29066M (91%)       29068M (91%)       28434M (89%)
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Used:     3234M (10%)        3476M (11%)        3294M (10%)        2948M (9%)         3580M (11%)        2946M (9%)
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Live:         -              2496M (8%)         2496M (8%)         2496M (8%)             -                  -
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0) Allocated:         -               242M (1%)          364M (1%)          411M (1%)              -                  -
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0)   Garbage:         -               737M (2%)          433M (1%)           39M (0%)              -                  -
> [6.150s][1681984168056ms][info ][gc,heap     ] GC(0) Reclaimed:         -                  -               304M (1%)          697M (2%)              -                  -
>
> The `garbage` at Mark End is a diff between what was `used` when the GC
> cycle started and what we later found to be `live` in that used memory.
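>
> Worked through with the numbers above: garbage at Mark End = used at Mark
> Start - live = 3234M - 2496M ~= 737M (the 1M discrepancy is rounding).
> Similarly, used at Relocate End (2948M) is roughly live (2496M) plus the
> 411M allocated during the cycle plus the 39M of garbage not yet reclaimed.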
>
> >
> > Let's say I want to fill a 1 GiB heap with 4MiB byte[] objects.
> >
> > Naively I'd imagine I can store 1 GiB / 4MiB = 256 such byte[] on the
> > heap.
> >
> > (I made a simple program that just allocates byte[], stores it in a
> > list, does GC and waits (so I can do jcmd or similar), nothing else)
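> >
> > A minimal sketch of such a program (a hypothetical reconstruction; the
> > class name, flags and structure are mine):
> >
> >     // Run with e.g.: java -Xmx1g -XX:+UseZGC FillHeap 170
> >     import java.util.ArrayList;
> >     import java.util.List;
> >
> >     public class FillHeap {
> >         public static void main(String[] args) throws Exception {
> >             int count = Integer.parseInt(args[0]);  // number of 4 MiB arrays
> >             List<byte[]> pins = new ArrayList<>();  // keeps every array reachable
> >             for (int i = 0; i < count; i++) {
> >                 pins.add(new byte[4 * 1024 * 1024]);
> >             }
> >             System.gc();  // request a full collection
> >             System.out.println("Holding " + pins.size() + " arrays; run jcmd now");
> >             Thread.sleep(Long.MAX_VALUE);  // park so jcmd GC.heap_info can be used
> >         }
> >     }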
> >
> > With EpsilonGC -> 255x 4MiB byte[] allocations; after this the app
> > crashes with an out-of-memory error
> > With SerialGC  -> 246x
> > With ParallelGC -> 233x
> > With G1GC -> 204x
> > With ZGC -> 170x
> >
> > For example, take the ZGC case, where I have 170x 4MiB byte[] on the
> > heap.
> >
> > GC.heap_info:
> >
> >  ZHeap           used 1022M, capacity 1024M, max capacity 1024M
> >  Metaspace       used 407K, committed 576K, reserved 1114112K
> >   class space    used 24K, committed 128K, reserved 1048576K
> >
> > GC.class_histogram:
> >
> > Total         15118      713971800 (~714M)
> >
> > In this case, does it mean ZGC is wasting 1022M - 714M = 308M for doing
> > its "thing"? That's 308M / 714M ~= 43% overhead?
>
> My guess is that the object header (typically 16 bytes) pushes the
> object's size slightly beyond 4MB. ZGC allocates large objects in their
> own regions. Those regions are 2MB aligned, which makes your ~4MB objects
> `use` 6MB each.
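>
> A back-of-the-envelope check (the 16-byte array header is an assumption
> for 64-bit HotSpot with compressed class pointers):
>
>     new byte[4 * 1024 * 1024]  -> 4,194,304 bytes of payload
>     + ~16-byte array header    -> 4,194,320 bytes total
>
> That no longer fits in two 2MB granules, so each array occupies three
> (6MB), and 170 x 6MB = 1020M, which lines up with the 1022M `used`
> reported above.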
>
> You would probably see similar results with G1 when the heap region size
> is increased, which happens automatically when the max heap size is
> larger. You can test that by explicitly running with
> -XX:G1HeapRegionSize=2m, to use a larger heap region size.
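>
> For instance, reusing the hypothetical FillHeap sketch from above:
>
>     java -Xmx1g -XX:+UseG1GC -XX:G1HeapRegionSize=2m FillHeap 204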
>
> >
> > This example might be convoluted and atypical of any production
> > environment.
> >
> > I am seeing the difference between the live set and heap used in
> > production at around 12.5% for the 3 servers I looked at.
> >
> > Is there any other way to estimate the overhead apart from looking at
> > the difference between the live set and heap used? Does ZGC have any
> > internal statistics on the overhead?
>
> I don't think we have a way to differentiate between the overhead caused
> by (1) and (2) above.
>
> >
> > I'd prefer not to assume 12.5% is the number to use and then get
> > surprised that in some cases it might be 25%.
>
> The overhead of yet-to-be-collected garbage can easily be above 25%. It
> all depends on the workload. We strive to keep the fragmentation below
> the -XX:ZFragmentationLimit, which is set to 25% by default, but that
> doesn't include the overhead of newly allocated objects (and it doesn't
> include the large objects).
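>
> (To experiment with this, the limit can be lowered, e.g. with
> -XX:ZFragmentationLimit=10, which selects more regions for relocation at
> the cost of extra GC work.)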
>
> >
> > Do you have any recommendations regarding ZGC overhead when estimating
> > heap space?
>
> Unfortunately, I can't give a general number. It depends on how intricate
> the object graph is (meaning how long it will take to mark through it),
> how many live objects you have, the allocation rate, the number of cores,
> etc. There's a constant race between the GC and the allocating Java
> threads. If the Java threads "win" and use up all memory before the GC
> can mark through the object graph and give memory back to the Java
> application, then the Java threads will stall waiting for more memory.
> You need to test with your workload and see if you've given enough heap
> memory to allow ZGC to complete its cycles without causing allocation
> stalls.
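>
> One practical check: run with GC logging and look for "Allocation Stall"
> entries, which ZGC logs when a Java thread has to wait for memory. For
> example, something like:
>
>     java -Xmx<size> -XX:+UseZGC -Xlog:gc:gc.log YourApp
>     grep "Allocation Stall" gc.log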
>
> Thanks,
> StefanK
>
> >
> > Thank you
> > Alen
>
>