JVM stalls around uncommitting

Mon May 18 21:45:59 UTC 2020

Hi,

On 4/1/20 9:14 AM, Per Liden wrote:
> Hi,
> 
> On 3/31/20 9:59 PM, Zoltan Baranyi wrote:
>> Hi ZGC Team,
>>
>> I run benchmarks against our application using ZGC on heaps in few
>> hundreds GB scale. In the beginning everything goes smooth, but 
>> eventually I experience very long JVM stalls, sometimes longer than one
>> minute. According to the JVM log, reaching safepoints occasionally takes
>> very long time, matching to the duration of the stalls I experience.
>>
>> After a few iterations, I started looking at uncommitting and learned
>> that the way ZGC performs uncommitting - flushing the pages, punching
>> holes, removing blocks from the backing file - can be expensive [1] when
>> uncommitting tens or more than a hundred GB of memory. The trace level
>> heap logs confirmed that uncommitting blocks in this size takes many
>> seconds. After disabled uncommitting my benchmark runs without the huge
>> stalls and the overall experience with ZGC is quite good.
>>
>> Since uncommitting is done asynchronously to the mutators, I expected it
>> not to interfere with them. My understanding is that flushing, 
>> bookeeping and uncommitting is done under a mutex [2], and contention 
>> on that can be the source of the stalls I see, such as when there is a 
>> demand to commit memory while uncommitting is taking place. Can you 
>> confirm if this above is an explanation that makes sense to you? If 
>> so, is there a cure to this that I couldn't find? Like a time bound or 
>> a cap on the amount of the memory that can be uncommitted in one go.
> 
> Yes, uncommitting is relatively expensive. And it's also true that there 
> is a potential for lock contention affecting mutators. That can be 
> improved in various ways. Like you say, uncommitting in smaller chunks, 
> or possibly by releasing the lock while doing the actual syscall.
> 
> If you still want uncommit to happen, one thing to try is using large 
> pages (-XX:+UseLargePages), since committing/uncommitting large pages is 
> typically less expensive.
> 
> This issue is on our radar, so we intend to improve this going forward.

Just a follow up to let you know that a fix for the uncommit issue is 
now out for review here: 
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-May/029754.html

With these patches the interference you experienced from uncommitting 
memory should be gone. As a bonus the normal allocation/commit path is 
now also a lot more streamlined and does all expensive work (e.g. 
committing memory) outside of the allocation lock.

cheers,
Per

> 
> cheers,
> Per
> 
>>
>> This is an example log captured during a stall:
>>
>> [1778,704s][info ][safepoint] Safepoint "ZMarkStart", Time since last:
>> 34394880194 ns, Reaching safepoint: 247308 ns, At safepoint: 339634 ns,
>> Total: 586942 ns
>> [1833,707s][trace][gc,heap  ] Uncommitting memory: 459560M-459562M (2M)
>> [...]
>> [... zillions of continuous uncommitting log lines ...]
>> [...]
>> [1846,076s][trace][gc,heap  ] Uncommitting memory: 84M-86M (2M)
>> [1846,076s][info ][gc,heap  ] Capacity: 528596M(86%)->386072M(63%),
>> Uncommitted: 142524M
>> [1846,076s][trace][gc,heap  ] Uncommit Timeout: 1s
>> [1846,078s][info ][safepoint] Safepoint "Cleanup", Time since last:
>> 18001682918 ns, Reaching safepoint: 49371131055 ns, At safepoint: 252559
>> ns, Total: 49371383614 ns
>>
>> In the above case TTSP is 49s, while the uncommitting lines cover only
>> 13s. The TTSP would indicate that the safepoint request was signaled 
>> at 1797s, but the log is empty between 1778s and 1883s. If my 
>> understanding above is correct, could it be that waiting for the 
>> mutex, flushing etc takes that much time and just not visible in the log?
>>
>> If needed, I can dig out more details since I can reliably reproduce 
>> the stalls.
>>
>> My environment is OpenJDK 14 running on Linux 5.2.9 with these
>> arguments: "-Xmx600G -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseNUMA
>> -XX:+AlwaysPreTouch -Xlog:gc,safepoint,gc+heap=trace:jvm.log".
>>
>> Best regards,
>> Zoltan
>>
>> [1]
>> https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp#L566-L573 
>>
>> [2]
>> https://github.com/openjdk/zgc/blob/d90d2b1097a9de06d8b6e3e6f2f6bd4075471fa0/src/hotspot/share/gc/z/zPageAllocator.cpp#L685-L711 
>>