Concurrent Mark time increases after SoftRef enqueued count increases
Per Liden
per.liden at oracle.com
Tue Apr 19 08:27:45 UTC 2022
Hi,
On 4/12/22 17:59, Sundara Mohan M wrote:
> Hi,
> Thanks for clarifying on SoftRefs.
> Already using -XX:-UseDynamicNumberOfGCThreads and a fixed number of
> concurrent threads (-XX:ConcGCThreads=9; the host has 48 threads).
> I did notice that the live set has increased whenever a stall happens. Is
> there any easy way to get object stats when a stall happens?
> Right now I am trying to collect a heap dump when a stall happens.
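> (For example, a dump can be triggered on demand with jcmd; the output
> path here is illustrative: jcmd <pid> GC.heap_dump /tmp/stall.hprof)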
Another way to get a quick overview of what's on the heap is:
jcmd <pid> GC.class_histogram
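
If you want to capture that histogram from inside the process, e.g. from a
hook that fires when you detect a stall, a minimal sketch via HotSpot's
DiagnosticCommand MBean might look like this (the MBean name and the
gcClassHistogram operation are HotSpot conventions; the class name here is
made up):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class HistogramOnStall {
    public static void main(String[] args) throws Exception {
        // Invokes the same diagnostic command that
        // "jcmd <pid> GC.class_histogram" runs, but in-process.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");
        String histogram = (String) server.invoke(
                dcmd,
                "gcClassHistogram",                        // GC.class_histogram
                new Object[] { new String[0] },            // no command arguments
                new String[] { String[].class.getName() });
        System.out.println(histogram);
    }
}

Diffing a histogram taken during a stall against one taken while the
collector is healthy should show which classes account for the live-set
growth.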
/Per
>
> Thanks
> Sundar
>
>
>
> On Tue, Apr 12, 2022 at 2:14 AM Per Liden <per.liden at oracle.com
> <mailto:per.liden at oracle.com>> wrote:
>
> Hi,
>
> When an allocation stall occurs, ZGC will start to clear SoftRefs. In
> other words, the clearing of SoftRefs you see is likely just a side
> effect of running into an allocation stall.
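>
> As an illustration (a minimal, collector-agnostic sketch; any GC may
> clear softly reachable objects under memory pressure, ZGC just does so
> eagerly once it is stalling on allocation):
>
> import java.lang.ref.SoftReference;
>
> public class SoftRefDemo {
>     public static void main(String[] args) {
>         // Keep a large buffer reachable only through a SoftReference.
>         SoftReference<byte[]> ref = new SoftReference<>(new byte[64 * 1024 * 1024]);
>
>         // Under memory pressure the GC may clear the referent; get() then
>         // returns null and the reference becomes eligible for enqueueing
>         // (the "enqueued" count in the gc,ref log).
>         System.out.println(ref.get() == null ? "cleared" : "still softly reachable");
>     }
> }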
>
> A longer concurrent mark phase could be caused by a number of things.
> For example, the application might have a larger live-set, the machine
> might have been overloaded, etc. However, it could also be that ZGC's
> heuristic for selecting the number of threads to use got things wrong
> and used too few, which would also prolong the concurrent mark phase.
> You could try the option -XX:-UseDynamicNumberOfGCThreads to disable
> this heuristic and instead always use a fixed number of threads.
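>
> For example, a fixed-thread configuration could look like the following
> (the thread count is illustrative and should be tuned to the host):
>
>     java -XX:+UseZGC -Xms96g -Xmx96g \
>          -XX:-UseDynamicNumberOfGCThreads -XX:ConcGCThreads=9 ...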
>
> /Per
>
> On 2022-04-11 22:50, Sundara Mohan M wrote:
> > Hi,
> > I am running ZGC in production with the following settings:
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Initializing The Z Garbage Collector
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Version: 17.0.2+8 (release)
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Support: Enabled
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Nodes: 2
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] CPUs: 72 total, 72 available
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Memory: 191871M
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Large Page Support: Disabled
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] GC Workers: 9/44 (static)
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Size: 1572864M x 3 = 4718592M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing File: /memfd:java_heap
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing Filesystem: tmpfs (0x1021994)
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Min Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Initial Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Max Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Medium Page Size: 32M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Pre-touch: Disabled
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Available space on backing filesystem: N/A
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Uncommit: Implicitly Disabled (-Xms equals -Xmx)
> > [2022-04-05T22:30:48.788+0000][11.672s][info][gc,init] Runtime Workers: 44
> >
> > In 24 hours I have seen 4 allocation stalls, each episode lasting over
> > 5 minutes with >1s stalls on multiple threads.
> > Whenever an allocation stall happens, the following pattern is observed:
> > 1. There is an increase in SoftRef enqueued values
> > 2. Concurrent Mark time increases from 3s to ~12s
> >
> >
> > [2022-04-10T19:46:22.380+0000][422145.264s][info][gc,ref ] GC(17874) Soft: 22330 encountered, 19702 discovered, 11571 enqueued
> > [2022-04-10T19:46:34.534+0000][422157.418s][info][gc,ref ] GC(17875) Soft: 10781 encountered, 7560 discovered, 19 enqueued
> > [2022-04-10T19:46:47.755+0000][422170.639s][info][gc,ref ] GC(17876) Soft: 10863 encountered, 3289 discovered, 44 enqueued
> > ...
> > [2022-04-11T05:31:55.763+0000][457278.647s][info][gc,ref ] GC(19267) Soft: 9664 encountered, 868 discovered, 12 enqueued
> > [2022-04-11T05:32:09.553+0000][457292.437s][info][gc,ref ] GC(19268) Soft: 9846 encountered, 6202 discovered, 98 enqueued
> > [2022-04-11T05:32:23.798+0000][457306.682s][info][gc,ref ] GC(19269) Soft: 9859 encountered, 6715 discovered, 132 enqueued
> > [2022-04-11T05:32:37.903+0000][457320.787s][info][gc,ref ] GC(19270) Soft: 9873 encountered, 7585 discovered, 42 enqueued
> > [2022-04-11T05:32:52.609+0000][457335.493s][info][gc,ref ] GC(19271) Soft: 10073 encountered, 6466 discovered, 57 enqueued
> > [2022-04-11T11:13:14.758+0000][477757.642s][info][gc,ref ] GC(20578) Soft: 25760 encountered, 20850 discovered, 15635 enqueued
> >
> > Can someone share info about what else might cause Concurrent Mark time
> > to go higher?
> > Would clearing SoftReferences more quickly (via the SoftRefLRUPolicyMSPerMB
> > flag) or not using SoftRefs at all help in this case?
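> > (For illustration, assuming HotSpot's default soft-ref LRU policy: a
> > softly reachable object survives roughly SoftRefLRUPolicyMSPerMB ms per
> > MB of free heap since its last access, with a default of 1000, so
> > -XX:SoftRefLRUPolicyMSPerMB=0 makes discovered SoftRefs eligible for
> > clearing immediately.)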
> >
> > Can send complete logs if it helps.
> >
> >
> > Thanks
> > Sundar
>