Concurrent Mark time increases after SoftRef enqueued count increases
Per Liden
per.liden at oracle.com
Tue Apr 19 08:27:45 UTC 2022
Hi,
On 4/12/22 17:59, Sundara Mohan M wrote:
> Hi,
> Thanks for clarifying on SoftRefs.
> Already using -XX:-UseDynamicNumberOfGCThreads and a fixed number of
> concurrent threads (-XX:ConcGCThreads=9; the host has 48 threads).
> I did notice that the live set has increased whenever a stall happens. Is
> there any easy way to get object stats when a stall happens?
> Right now I am trying to collect a heap dump when a stall happens.
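> (For example, a dump can be triggered on demand with jcmd; the output
> path here is illustrative: jcmd <pid> GC.heap_dump /tmp/stall.hprof)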
Another way to get a quick overview of what's on the heap is:
jcmd <pid> GC.class_histogram
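
If you want to capture that histogram from inside the process, e.g. from a
hook that fires when you detect a stall, a minimal sketch via HotSpot's
DiagnosticCommand MBean might look like this (the MBean name and the
gcClassHistogram operation are HotSpot conventions; the class name here is
made up):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class HistogramOnStall {
    public static void main(String[] args) throws Exception {
        // Invokes the same diagnostic command that
        // "jcmd <pid> GC.class_histogram" runs, but in-process.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");
        String histogram = (String) server.invoke(
                dcmd,
                "gcClassHistogram",                        // GC.class_histogram
                new Object[] { new String[0] },            // no command arguments
                new String[] { String[].class.getName() });
        System.out.println(histogram);
    }
}

Diffing a histogram taken during a stall against one taken while the
collector is healthy should show which classes account for the live-set
growth.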
/Per
>
> Thanks
> Sundar
>
>
>
> On Tue, Apr 12, 2022 at 2:14 AM Per Liden <per.liden at oracle.com
> <mailto:per.liden at oracle.com>> wrote:
>
> Hi,
>
> When an allocation stall occurs, ZGC will start to clear SoftRefs. In
> other words, the clearing of SoftRefs you see is likely just a side
> effect of running into an allocation stall.
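>
> As an illustration (a minimal, collector-agnostic sketch; any GC may
> clear softly reachable objects under memory pressure, ZGC just does so
> eagerly once it is stalling on allocation):
>
> import java.lang.ref.SoftReference;
>
> public class SoftRefDemo {
>     public static void main(String[] args) {
>         // Keep a large buffer reachable only through a SoftReference.
>         SoftReference<byte[]> ref = new SoftReference<>(new byte[64 * 1024 * 1024]);
>
>         // Under memory pressure the GC may clear the referent; get() then
>         // returns null and the reference becomes eligible for enqueueing
>         // (the "enqueued" count in the gc,ref log).
>         System.out.println(ref.get() == null ? "cleared" : "still softly reachable");
>     }
> }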
>
> A longer concurrent mark phase could be caused by a number of things.
> For example, the application might have a larger live-set, the machine
> might have been overloaded, etc. However, it could also be that ZGC's
> heuristic for selecting the number of threads to use got things wrong
> and used too few, which would also prolong the concurrent mark phase.
> You could try the option -XX:-UseDynamicNumberOfGCThreads to disable
> this heuristic and instead always use a fixed number of threads.
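>
> For example, a fixed-thread configuration could look like the following
> (the thread count is illustrative and should be tuned to the host):
>
>     java -XX:+UseZGC -Xms96g -Xmx96g \
>          -XX:-UseDynamicNumberOfGCThreads -XX:ConcGCThreads=9 ...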
>
> /Per
>
> On 2022-04-11 22:50, Sundara Mohan M wrote:
> > Hi,
> > I am running ZGC in production with the following settings:
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Initializing The Z Garbage Collector
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Version: 17.0.2+8 (release)
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Support: Enabled
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Nodes: 2
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] CPUs: 72 total, 72 available
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Memory: 191871M
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Large Page Support: Disabled
> > [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] GC Workers: 9/44 (static)
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Type: Contiguous/Unrestricted/Complete
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Size: 1572864M x 3 = 4718592M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing File: /memfd:java_heap
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing Filesystem: tmpfs (0x1021994)
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Min Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Initial Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Max Capacity: 98304M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Medium Page Size: 32M
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Pre-touch: Disabled
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Available space on backing filesystem: N/A
> > [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Uncommit: Implicitly Disabled (-Xms equals -Xmx)
> > [2022-04-05T22:30:48.788+0000][11.672s][info][gc,init] Runtime Workers: 44
> >
> > In 24 hours I have seen 4 allocation stalls, each episode lasting over
> > 5 minutes with >1s stalls on multiple threads.
> > Whenever an allocation stall happens, the following pattern is observed:
> > 1. There is an increase in SoftRef enqueued values
> > 2. Concurrent Mark time increases from 3s to ~12s
> >
> >
> > [2022-04-10T19:46:22.380+0000][422145.264s][info][gc,ref ] GC(17874) Soft: 22330 encountered, 19702 discovered, 11571 enqueued
> > [2022-04-10T19:46:34.534+0000][422157.418s][info][gc,ref ] GC(17875) Soft: 10781 encountered, 7560 discovered, 19 enqueued
> > [2022-04-10T19:46:47.755+0000][422170.639s][info][gc,ref ] GC(17876) Soft: 10863 encountered, 3289 discovered, 44 enqueued
> > ...
> > [2022-04-11T05:31:55.763+0000][457278.647s][info][gc,ref ] GC(19267) Soft: 9664 encountered, 868 discovered, 12 enqueued
> > [2022-04-11T05:32:09.553+0000][457292.437s][info][gc,ref ] GC(19268) Soft: 9846 encountered, 6202 discovered, 98 enqueued
> > [2022-04-11T05:32:23.798+0000][457306.682s][info][gc,ref ] GC(19269) Soft: 9859 encountered, 6715 discovered, 132 enqueued
> > [2022-04-11T05:32:37.903+0000][457320.787s][info][gc,ref ] GC(19270) Soft: 9873 encountered, 7585 discovered, 42 enqueued
> > [2022-04-11T05:32:52.609+0000][457335.493s][info][gc,ref ] GC(19271) Soft: 10073 encountered, 6466 discovered, 57 enqueued
> > [2022-04-11T11:13:14.758+0000][477757.642s][info][gc,ref ] GC(20578) Soft: 25760 encountered, 20850 discovered, 15635 enqueued
> >
> > Can someone share info about what else might cause Concurrent Mark time
> > to go higher?
> > Would clearing SoftReferences more quickly (via the SoftRefLRUPolicyMSPerMB
> > flag) or not using SoftRefs at all help in this case?
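> > (For illustration, assuming HotSpot's default soft-ref LRU policy: a
> > softly reachable object survives roughly SoftRefLRUPolicyMSPerMB ms per
> > MB of free heap since its last access, with a default of 1000, so
> > -XX:SoftRefLRUPolicyMSPerMB=0 makes discovered SoftRefs eligible for
> > clearing immediately.)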
> >
> > Can send complete logs if it helps.
> >
> >
> > Thanks
> > Sundar
>