JVM stalls around uncommitting

Tue Mar 31 19:59:51 UTC 2020

Hi ZGC Team,

I run benchmarks against our application using ZGC on heaps in few
hundreds GB scale. In the beginning everything goes smooth, but 
eventually I experience very long JVM stalls, sometimes longer than one
minute. According to the JVM log, reaching safepoints occasionally takes
very long time, matching to the duration of the stalls I experience.

After a few iterations, I started looking at uncommitting and learned
that the way ZGC performs uncommitting - flushing the pages, punching
holes, removing blocks from the backing file - can be expensive [1] when
uncommitting tens or more than a hundred GB of memory. The trace level
heap logs confirmed that uncommitting blocks in this size takes many
seconds. After disabled uncommitting my benchmark runs without the huge
stalls and the overall experience with ZGC is quite good.

Since uncommitting is done asynchronously to the mutators, I expected it
not to interfere with them. My understanding is that flushing, 
bookeeping and uncommitting is done under a mutex [2], and contention on 
that can be the source of the stalls I see, such as when there is a 
demand to commit memory while uncommitting is taking place. Can you 
confirm if this above is an explanation that makes sense to you? If so, 
is there a cure to this that I couldn't find? Like a time bound or a cap 
on the amount of the memory that can be uncommitted in one go.

This is an example log captured during a stall:

[1778,704s][info ][safepoint] Safepoint "ZMarkStart", Time since last:
34394880194 ns, Reaching safepoint: 247308 ns, At safepoint: 339634 ns,
Total: 586942 ns
[1833,707s][trace][gc,heap  ] Uncommitting memory: 459560M-459562M (2M)
[...]
[... zillions of continuous uncommitting log lines ...]
[...]
[1846,076s][trace][gc,heap  ] Uncommitting memory: 84M-86M (2M)
[1846,076s][info ][gc,heap  ] Capacity: 528596M(86%)->386072M(63%),
Uncommitted: 142524M
[1846,076s][trace][gc,heap  ] Uncommit Timeout: 1s
[1846,078s][info ][safepoint] Safepoint "Cleanup", Time since last:
18001682918 ns, Reaching safepoint: 49371131055 ns, At safepoint: 252559
ns, Total: 49371383614 ns

In the above case TTSP is 49s, while the uncommitting lines cover only
13s. The TTSP would indicate that the safepoint request was signaled at 
1797s, but the log is empty between 1778s and 1883s. If my understanding 
above is correct, could it be that waiting for the mutex, flushing etc 
takes that much time and just not visible in the log?

If needed, I can dig out more details since I can reliably reproduce the 
stalls.

My environment is OpenJDK 14 running on Linux 5.2.9 with these
arguments: "-Xmx600G -XX:+HeapDumpOnOutOfMemoryError
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseNUMA
-XX:+AlwaysPreTouch -Xlog:gc,safepoint,gc+heap=trace:jvm.log".

Best regards,
Zoltan

[1]
https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp#L566-L573
[2]
https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/share/gc/z/zPageAllocator.cpp#L685-L711