JVM stalls around uncommitting

Sat Apr 4 18:42:34 UTC 2020

Hi Thomas,

On 4/4/20 11:21 AM, Thomas Stüfe wrote:
> Sorry, let me formulate this question in a more precise manner:
> 
> Assuming you use the "traditional" huge pages, HugeTLBFS, we take the 
> pages from the huge page pool. The content of the huge page pool is not 
> usable for other applications. So uncommitting would only benefit other 
> applications using huge pages. But that's okay and would be useful too.

It depends a bit on how you've set up the huge page pool. Normally, you 
set nr_hugepages to configure the huge page pool to have a fixed number 
of pages, with a guarantee that those pages will actually be there when 
needed. Applications explicitly using huge pages will allocate from the 
pool. Applications that uncommit such pages will return them to the pool 
for other applications (that are also explicitly using huge pages) to use.

However, you can instead (or also) choose to configure 
nr_overcommit_hugepages. When the huge page pool is depleted (e.g. 
because nr_hugepages was set to 0 from the start) the kernel will try to 
allocate at most this number of huge pages from the normal page pool. 
These pages will show up as HugePages_Surp in /proc/meminfo. When 
uncommitting such pages, they will be returned to the normal page pool, 
for any other process to use (not just those explicitly using huge 
pages). Of course, you don't have the same guarantee that there are 
large pages available.
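
To watch these counters from the outside, something like the following 
could be used. This is only a minimal sketch, assuming a Linux system; 
the class name HugePageInfo is made up for illustration and is not part 
of HotSpot. It prints the HugePages_* counters from /proc/meminfo.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JDK: reads the huge page pool counters.
public class HugePageInfo {

    // Extracts the HugePages_* counters (plain page counts, no unit suffix)
    // from lines in /proc/meminfo format. Other lines are ignored.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.startsWith("HugePages_")) {
                continue;
            }
            String[] parts = line.split(":", 2);
            if (parts.length == 2) {
                counters.put(parts[0].trim(), Long.parseLong(parts[1].trim()));
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Path.of("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.err.println("/proc/meminfo is not readable (not a Linux system?)");
            return;
        }
        // Typically prints HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp.
        parse(Files.readAllLines(meminfo))
            .forEach((name, pages) -> System.out.println(name + " = " + pages));
    }
}
```

Running it before and after an uncommit should show HugePages_Free 
rising if the pages came from the fixed pool, or HugePages_Surp dropping 
if they came from the overcommit pool.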

> 
> The question to me would be if reserving but not committing memory 
> backed by huge pages is any different from committing them right away. 
> Or, whether uncommitted pages are returned to the pool.

It depends on what you mean by reserving. If you're going through 
ReservedSpace (i.e. os::reserve_memory_special() and friends), then yes, 
it's the same thing. But ZGC is not using those APIs, it has its own 
reserve/commit/uncommit infrastructure where reserve only reserves 
address space, and commit/uncommit actually allocates/deallocates pages.

> 
> I made a simple test with UseLargePages and a VM with a 100M heap, and 
> see that both heap and code heap are now backed by huge pages 
> as expected. I ran once with AlwaysPreTouch, once without. I do not see 
> any difference from the outside in the number of used huge pages. 
> In /proc/pid/smaps the memory segments look identical in each case. I 
> may be doing this test wrong though...

Maybe you weren't using ZGC? The code heap and all GCs, except ZGC, use 
ReservedSpace where large pages will be committed and "pinned" upfront, 
and no uncommit will happen.

cheers,
Per

> 
> Thanks a lot, and sorry again for hijacking this thread,
> 
> Thomas
> 
> p.s. without doubt using huge pages is hugely beneficial even without 
> uncommitting.
> 
> 
> 
> 
> On Sat, Apr 4, 2020 at 10:00 AM Thomas Stüfe <thomas.stuefe at gmail.com 
> <mailto:thomas.stuefe at gmail.com>> wrote:
> 
>     Hi Per, Zoltan,
> 
>     sorry for getting in a question sideways, but I was curious.
> 
>     I always thought large pages are memory-pinned, so cannot be
>     uncommitted? Or are you talking using THPs?
> 
>     Cheers, Thomas
> 
> 
>     On Fri, Apr 3, 2020 at 9:38 AM Per Liden <per.liden at oracle.com
>     <mailto:per.liden at oracle.com>> wrote:
> 
>         Hi Zoltan,
> 
>         On 4/3/20 1:27 AM, Zoltán Baranyi wrote:
>          > Hi Per,
>          >
>          > Thank you for confirming the issue and for recommending large pages. I
>          > re-ran my benchmarks with large pages and it gave me a 25-30% performance
>          > boost, which is a bit more than what I expected. My benchmarks run on a
>          > 600G heap with 1.5-2GB/s allocation rate on a 40 core machine, so ZGC is
>          > busy. Since a significant part of the workload is ZGC itself, I assume -
>          > besides the higher TLB hit rate - this gain is from managing the ZPages
>          > more effectively on large pages.
> 
>         A 25-30% improvement is indeed more than I would have expected. ZGC's
>         internal handling of ZPages is the same regardless of the underlying
>         page size, but as you say, you'll get better TLB hit-rate and the
>         mmap/fallocate syscalls become a lot less expensive.
> 
>         Another reason for the boost might be that ZGC's NUMA-awareness, until
>         recently, worked much better when using large pages. But this has now
>         been fixed, see https://bugs.openjdk.java.net/browse/JDK-8237649.
> 
>         Btw, which JDK version are you using?
> 
>          >
>          > I have a good experience overall, nice to see ZGC getting more and more
>          > mature.
> 
>         Good to hear. Thanks for the feedback!
> 
>         /Per
> 
>          >
>          > Cheers,
>          > Zoltan
>          >
>          > On Wed, Apr 1, 2020 at 9:15 AM Per Liden <per.liden at oracle.com
>          > <mailto:per.liden at oracle.com>> wrote:
>          >
>          >> Hi,
>          >>
>          >> On 3/31/20 9:59 PM, Zoltan Baranyi wrote:
>          >>> Hi ZGC Team,
>          >>>
>          >>> I run benchmarks against our application using ZGC on heaps in the
>          >>> few-hundred-GB range. In the beginning everything goes smoothly, but
>          >>> eventually I experience very long JVM stalls, sometimes longer than one
>          >>> minute. According to the JVM log, reaching safepoints occasionally takes
>          >>> a very long time, matching the duration of the stalls I experience.
>          >>>
>          >>> After a few iterations, I started looking at uncommitting and learned
>          >>> that the way ZGC performs uncommitting - flushing the pages, punching
>          >>> holes, removing blocks from the backing file - can be expensive [1] when
>          >>> uncommitting tens or more than a hundred GB of memory. The trace level
>          >>> heap logs confirmed that uncommitting blocks of this size takes many
>          >>> seconds. After disabling uncommitting, my benchmark runs without the huge
>          >>> stalls and the overall experience with ZGC is quite good.
>          >>>
>          >>> Since uncommitting is done asynchronously to the mutators, I expected it
>          >>> not to interfere with them. My understanding is that flushing,
>          >>> bookkeeping and uncommitting are done under a mutex [2], and contention on
>          >>> that can be the source of the stalls I see, such as when there is a
>          >>> demand to commit memory while uncommitting is taking place. Can you
>          >>> confirm whether the above explanation makes sense to you? If so,
>          >>> is there a cure for this that I couldn't find? Like a time bound or a cap
>          >>> on the amount of memory that can be uncommitted in one go.
>          >>
>          >> Yes, uncommitting is relatively expensive. And it's also true that there
>          >> is a potential for lock contention affecting mutators. That can be
>          >> improved in various ways. Like you say, uncommitting in smaller chunks,
>          >> or possibly by releasing the lock while doing the actual syscall.
>          >>
>          >> If you still want uncommit to happen, one thing to try is using large
>          >> pages (-XX:+UseLargePages), since committing/uncommitting large pages is
>          >> typically less expensive.
>          >>
>          >> This issue is on our radar, so we intend to improve this going forward.
>          >>
>          >> cheers,
>          >> Per
>          >>
>          >>
>