JVM stalls around uncommitting

Wed Apr 1 07:14:54 UTC 2020

Hi,

On 3/31/20 9:59 PM, Zoltan Baranyi wrote:
> Hi ZGC Team,
> 
> I run benchmarks against our application using ZGC on heaps in few
> hundreds GB scale. In the beginning everything goes smooth, but 
> eventually I experience very long JVM stalls, sometimes longer than one
> minute. According to the JVM log, reaching safepoints occasionally takes
> very long time, matching to the duration of the stalls I experience.
> 
> After a few iterations, I started looking at uncommitting and learned
> that the way ZGC performs uncommitting - flushing the pages, punching
> holes, removing blocks from the backing file - can be expensive [1] when
> uncommitting tens or more than a hundred GB of memory. The trace level
> heap logs confirmed that uncommitting blocks in this size takes many
> seconds. After disabled uncommitting my benchmark runs without the huge
> stalls and the overall experience with ZGC is quite good.
> 
> Since uncommitting is done asynchronously to the mutators, I expected it
> not to interfere with them. My understanding is that flushing, 
> bookeeping and uncommitting is done under a mutex [2], and contention on 
> that can be the source of the stalls I see, such as when there is a 
> demand to commit memory while uncommitting is taking place. Can you 
> confirm if this above is an explanation that makes sense to you? If so, 
> is there a cure to this that I couldn't find? Like a time bound or a cap 
> on the amount of the memory that can be uncommitted in one go.

Yes, uncommitting is relatively expensive. And it's also true that there 
is a potential for lock contention affecting mutators. That can be 
improved in various ways. Like you say, uncommitting in smaller chunks, 
or possibly by releasing the lock while doing the actual syscall.

If you still want uncommit to happen, one thing to try is using large 
pages (-XX:+UseLargePages), since committing/uncommitting large pages is 
typically less expensive.

This issue is on our radar, so we intend to improve this going forward.

cheers,
Per

> 
> This is an example log captured during a stall:
> 
> [1778,704s][info ][safepoint] Safepoint "ZMarkStart", Time since last:
> 34394880194 ns, Reaching safepoint: 247308 ns, At safepoint: 339634 ns,
> Total: 586942 ns
> [1833,707s][trace][gc,heap  ] Uncommitting memory: 459560M-459562M (2M)
> [...]
> [... zillions of continuous uncommitting log lines ...]
> [...]
> [1846,076s][trace][gc,heap  ] Uncommitting memory: 84M-86M (2M)
> [1846,076s][info ][gc,heap  ] Capacity: 528596M(86%)->386072M(63%),
> Uncommitted: 142524M
> [1846,076s][trace][gc,heap  ] Uncommit Timeout: 1s
> [1846,078s][info ][safepoint] Safepoint "Cleanup", Time since last:
> 18001682918 ns, Reaching safepoint: 49371131055 ns, At safepoint: 252559
> ns, Total: 49371383614 ns
> 
> In the above case TTSP is 49s, while the uncommitting lines cover only
> 13s. The TTSP would indicate that the safepoint request was signaled at 
> 1797s, but the log is empty between 1778s and 1883s. If my understanding 
> above is correct, could it be that waiting for the mutex, flushing etc 
> takes that much time and just not visible in the log?
> 
> If needed, I can dig out more details since I can reliably reproduce the 
> stalls.
> 
> My environment is OpenJDK 14 running on Linux 5.2.9 with these
> arguments: "-Xmx600G -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseNUMA
> -XX:+AlwaysPreTouch -Xlog:gc,safepoint,gc+heap=trace:jvm.log".
> 
> Best regards,
> Zoltan
> 
> [1]
> https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp#L566-L573 
> 
> [2]
> https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/share/gc/z/zPageAllocator.cpp#L685-L711 
>