Concurrent Mark sweep increases after increase in SoftRef enqueued number increases

Per Liden per.liden at oracle.com
Tue Apr 12 09:14:29 UTC 2022


Hi,

When an allocation stall occurs, ZGC will start to clear SoftRefs. In 
other words, clearing of SoftRefs you see is likely just a side effect 
of you running into an allocation stall.

A longer concurrent mark phase could be caused by a number of things. 
For example, the application might have a larger live-set, the machine 
might have been overloaded, etc. However, it could also be that ZGC's 
heuristics selecting number of threads to use got things wrong, and used 
too few. That would also prolong the concurrent mark phase. You could 
try the option -XX:-UseDynamicNumberOfGCThreads to disable this 
heuristic, and instead always use a fixed number of threads.

/Per

On 2022-04-11 22:50, Sundara Mohan M wrote:
> Hi,
>      I am running ZGC in production with following settings
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Initializing The Z
> Garbage Collector
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Version: 17.0.2+8
> (release)
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Support: Enabled
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] NUMA Nodes: 2
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] CPUs: 72 total, 72
> available
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Memory: 191871M
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] Large Page Support:
> Disabled
> [2022-04-05T22:30:37.154+0000][0.038s][info][gc,init] GC Workers: 9/44
> (static)
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Type:
> Contiguous/Unrestricted/Complete
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Address Space Size:
> 1572864M x 3 = 4718592M
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing File:
> /memfd:java_heap
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Heap Backing
> Filesystem: tmpfs (0x1021994)
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Min Capacity: 98304M
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Initial Capacity:
> 98304M
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Max Capacity: 98304M
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Medium Page Size: 32M
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Pre-touch: Disabled
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Available space on
> backing filesystem: N/A
> [2022-04-05T22:30:37.157+0000][0.040s][info][gc,init] Uncommit: Implicitly
> Disabled (-Xms equals -Xmx)
> [2022-04-05T22:30:48.788+0000][11.672s][info][gc,init] Runtime Workers: 44
> 
> In 24 hours I have seen 4 allocation stalls, each is happening over 5 mins
> with >1s stall on multiple threads.
> Whenever allocation stalls, the following pattern is observed
> 1. There is an increase in SoftRef enqueued values
> 2. Concurrent Mark time increase from 3s to ~12s
> 
> 
> [2022-04-10T19:46:22.380+0000][422145.264s][info][gc,ref      ] GC(17874)
> Soft: 22330 encountered, 19702 discovered, 11571 enqueued
> [2022-04-10T19:46:34.534+0000][422157.418s][info][gc,ref      ] GC(17875)
> Soft: 10781 encountered, 7560 discovered, 19 enqueued
> [2022-04-10T19:46:47.755+0000][422170.639s][info][gc,ref      ] GC(17876)
> Soft: 10863 encountered, 3289 discovered, 44 enqueued
> ...
> [2022-04-11T05:31:55.763+0000][457278.647s][info][gc,ref      ] GC(19267)
> Soft: 9664 encountered, 868 discovered, 12 enqueued
> [2022-04-11T05:32:09.553+0000][457292.437s][info][gc,ref      ] GC(19268)
> Soft: 9846 encountered, 6202 discovered, 98 enqueued
> [2022-04-11T05:32:23.798+0000][457306.682s][info][gc,ref      ] GC(19269)
> Soft: 9859 encountered, 6715 discovered, 132 enqueued
> [2022-04-11T05:32:37.903+0000][457320.787s][info][gc,ref      ] GC(19270)
> Soft: 9873 encountered, 7585 discovered, 42 enqueued
> [2022-04-11T05:32:52.609+0000][457335.493s][info][gc,ref      ] GC(19271)
> Soft: 10073 encountered, 6466 discovered, 57 enqueued
> [2022-04-11T11:13:14.758+0000][477757.642s][info][gc,ref      ] GC(20578)
> Soft: 25760 encountered, 20850 discovered, 15635 enqueued
> 
> Can someone share info about what else might cause Concurrent Mark time to
> go higher?
> Will cleaning SoftReference quickly using SoftRefLRUPolicyMSPerMB flag or
> not using SoftRef might help in this case?
> 
> Can send complete logs if it helps.
> 
> 
> Thanks
> Sundar


More information about the zgc-dev mailing list