<div dir="ltr">Hi Stefan.<div><br></div><div>Thank you for the explanations. Makes sense.</div><div><br></div><div>Alen</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 20 Apr 2023 at 17:15, Stefan Karlsson <<a href="mailto:stefan.karlsson@oracle.com">stefan.karlsson@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi Alen,<br>

<br>

On 2023-04-19 13:38, Alen Vrečko wrote:<br>

> Hello everyone.<br>

><br>

> I did my best to search the web for any prior discussion on this. <br>

> Haven't found anything useful.<br>

><br>

> I am trying to understand why there is a noticeable difference between <br>

> the size of all objects on the heap and the heap used (after full GC). <br>

> The heap used size can be 10%+ more than the size of all live objects.<br>

<br>

The difference can come from two sources:<br>

<br>

1) There is unusable memory in the regions, caused by address and size <br>

alignment requirements. (This is probably what you see in your 4MB array <br>

test)<br>

<br>

2) We only calculate the `live` value when we mark through objects on <br>

the heap. While ZGC runs a cycle it more or less ignores new objects <br>

that the Java threads allocate during the GC cycle. This means that the <br>

`used` value will increase, but the `live` value will not be updated <br>

until the Mark End of the next GC cycle.<br>

<br>

The latter also makes it a bit misleading to look at the used value after <br>

the GC cycle. With stop-the-world GCs, that used value is an OK <br>

approximation of what is live on the heap. With a concurrent GC it <br>

includes all the memory allocated during the GC cycle. The used number is <br>

still accurate, but some (many) of those allocated objects were short-lived <br>

and died, but we won't be able to figure that out until the next GC cycle.<br>

<br>

I don't know if you have seen this, but we have a table where we try to <br>

give you a picture of how the values progressed during the GC cycle:<br>

<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)                Mark Start          Mark End        Relocate Start      Relocate End           High               Low<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)  Capacity:     3282M (10%)        3536M (11%)        3580M (11%)        3612M (11%)        3612M (11%)        3282M (10%)<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Free:    28780M (90%)       28538M (89%)       28720M (90%)       29066M (91%)       29068M (91%)       28434M (89%)<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Used:     3234M (10%)        3476M (11%)        3294M (10%)        2948M (9%)         3580M (11%)        2946M (9%)<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)      Live:         -              2496M (8%)         2496M (8%)         2496M (8%)             -                  -<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0) Allocated:         -               242M (1%)          364M (1%)          411M (1%)             -                  -<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0)   Garbage:         -               737M (2%)          433M (1%)           39M (0%)             -                  -<br>

[6.150s][1681984168056ms][info ][gc,heap     ] GC(0) Reclaimed:         -                  -               304M (1%)          697M (2%)             -                  -<br>

<br>

The `garbage` at Mark End is the difference between what was `used` when the GC <br>

cycle started and what we later found to be `live` in that used memory.<br>
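
<br>

To make that relationship concrete, here is a small sketch that redoes the arithmetic on the GC(0) numbers from the table. The class and method names are made up for illustration. The `usedAt` relation (used at Mark Start, plus allocated, minus reclaimed) is only an observation that happens to fit these particular numbers, not a statement about how ZGC computes the table.<br>

```java
// Hypothetical helper, for illustration only. All values are in MB and are
// copied from the GC(0) table in this mail.
class ZgcHeapTable {

    /** Garbage at Mark End: memory used at Mark Start that marking did not find live. */
    static long garbageAtMarkEnd(long usedAtMarkStart, long liveAtMarkEnd) {
        return usedAtMarkStart - liveAtMarkEnd;
    }

    /** Assumed relation: used at a phase = used at Mark Start + allocated - reclaimed. */
    static long usedAt(long usedAtMarkStart, long allocated, long reclaimed) {
        return usedAtMarkStart + allocated - reclaimed;
    }

    public static void main(String[] args) {
        long usedAtMarkStart = 3234;
        long live = 2496;

        // 738; the log prints 737M, the 1M gap is rounding in the log output.
        System.out.println(garbageAtMarkEnd(usedAtMarkStart, live));

        System.out.println(usedAt(usedAtMarkStart, 242, 0));   // Mark End
        System.out.println(usedAt(usedAtMarkStart, 364, 304)); // Relocate Start
        System.out.println(usedAt(usedAtMarkStart, 411, 697)); // Relocate End
    }
}
```

The three `usedAt` results come out as 3476, 3294 and 2948, which line up with the `Used` row at Mark End, Relocate Start and Relocate End.<br>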

<br>

><br>

> Let's say I want to fill 1 GiB heap with 4MiB byte[] objects.<br>

><br>

> Naively I'd imagine I can store 1 GiB / 4MiB = 256 such byte[] on the <br>

> heap.<br>

><br>

> (I made a simple program that just allocates byte[], stores it in a <br>

> list, does GC and waits (so I can do jcmd or similar), nothing else)<br>

><br>

> With EpsilonGC -> 255x 4MiB byte[] allocations, after this the app <br>

> crashes with out of memory<br>

> With SerialGC  -> 246x<br>

> With ParallelGC -> 233x<br>

> With G1GC -> 204x<br>

> With ZGC -> 170x<br>

><br>

> For example in the ZGC case, where I have 170x of 4MiB byte[] on the heap.<br>

><br>

> GC.heap_info:<br>

><br>

>  ZHeap           used 1022M, capacity 1024M, max capacity 1024M<br>

>  Metaspace       used 407K, committed 576K, reserved 1114112K<br>

>   class space    used 24K, committed 128K, reserved 1048576K<br>

><br>

> GC.class_histogram:<br>

><br>

> Total         15118      713971800 (~714M)<br>

><br>

> In this case does it mean ZGC is wasting 1022M - 714M = 308M for doing <br>

> its "thing"? This is like 1022/714= 43% overhead?<br>

<br>

My guess is that the object header (typically 16 bytes) pushed each <br>

object's size slightly beyond 4MB. ZGC allocates large objects in their <br>

own region. Those regions are 2MB aligned, which makes your ~4MB objects <br>

`use` 6MB.<br>
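
<br>

As a rough sketch of that rounding, assuming a 16 byte header and a 2MB granule as described above (the class and method names are hypothetical, and this is just arithmetic, not ZGC code):<br>

```java
// Back-of-the-envelope estimate of how much heap one large object occupies
// when its region is rounded up to a 2 MiB multiple. Illustration only.
class LargeObjectFootprint {
    static final long MIB = 1024L * 1024L;

    /** Rounds size up to the next multiple of alignment (alignment must be > 0). */
    static long alignUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /** How many objects of the given footprint fit into the heap. */
    static long maxObjects(long heapBytes, long footprintBytes) {
        return heapBytes / footprintBytes;
    }

    public static void main(String[] args) {
        long header = 16;                // assumed typical header size
        long payload = 4 * MIB;          // new byte[4 * 1024 * 1024]
        long footprint = alignUp(header + payload, 2 * MIB);

        System.out.println(footprint / MIB);                     // prints 6
        System.out.println(maxObjects(1024 * MIB, footprint));   // prints 170
    }
}
```

With a 1GiB heap this gives 170 objects, which is the count you observed with ZGC.<br>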

<br>

You would probably see similar results with G1 when the heap region size <br>

is increased, which happens when the heap max size is larger. You can <br>

test that by explicitly running with -XX:G1HeapRegionSize=2m, to use a <br>

larger heap region size.<br>

<br>

><br>

> This example might be convoluted and atypical of any production <br>

> environment.<br>

><br>

> I am seeing the difference between live set and heap used in <br>

> production at around 12.5% for 3 servers looked at.<br>

><br>

> Is there any other way to estimate the overhead apart from looking at <br>

> the difference between the live set and heap used? Does ZGC have any <br>

> internal statistic of the overhead?<br>

<br>

I don't think we have a way to differentiate between the overhead caused <br>

by (1) and (2) above.<br>

<br>

><br>

> I'd prefer not to assume 12.5% is the number to use and then get <br>

> surprised that in some case it might be 25%?<br>

<br>

The overhead of yet-to-be collected garbage can easily be above 25%. It <br>

all depends on the workload. We strive to keep the fragmentation below <br>

the -XX:ZFragmentationLimit, which is set to 25% by default, but that <br>

doesn't include the overhead of newly allocated objects (and it doesn't <br>

include the large objects).<br>

<br>

><br>

> Do you have any recommendations regarding ZGC overhead when estimating <br>

> heap space?<br>

<br>

Unfortunately, I can't. It depends on how intricate the object graph is <br>

(meaning how long it will take to mark through it), how many live <br>

objects you have, the allocation rate, number of cores, etc. There's a <br>

constant race between the GC and the allocating Java threads. If the <br>

Java threads "win", and use up all memory before the GC can mark through <br>

the object graph and then give back memory to the Java application, then <br>

the Java threads will stall waiting for more memory. You need to test <br>

with your workload and see if you've given enough heap memory to allow <br>

ZGC to complete its cycles without causing allocation stalls.<br>
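
<br>

One way to check for stalls after a test run is to write the GC log to a file (for example with -Xlog:gc*:file=gc.log) and count the stall lines. The sketch below assumes stalls appear as log lines containing the text "Allocation Stall". Please verify that against the log output of your JDK version. The class name is made up for illustration.<br>

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: counts GC log lines that mention an allocation stall.
// The "Allocation Stall" marker text is an assumption about the log format.
class AllocationStallCounter {

    static long countStalls(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Allocation Stall"))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java AllocationStallCounter.java gc.log
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("Allocation stalls: " + countStalls(lines));
    }
}
```

If the count stays at zero under a representative load, the heap was large enough for that run. A non-zero count suggests trying a larger heap.<br>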

<br>

Thanks,<br>

StefanK<br>

<br>

><br>

> Thank you<br>

> Alen<br>

<br>

</blockquote></div>