JVM stalls around uncommitting
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Apr 6 06:33:35 UTC 2020
Hi Per,
On Sat, Apr 4, 2020 at 8:42 PM Per Liden <per.liden at oracle.com> wrote:
> Hi Thomas,
>
> On 4/4/20 11:21 AM, Thomas Stüfe wrote:
> > Sorry, let me formulate this question in a more precise manner:
> >
> > Assuming you use the "traditional" huge pages, HugeTLBFS, we take the
> > pages from the huge page pool. The content of the huge page pool is not
> > usable for other applications. So uncommitting would only benefit other
> > applications using huge pages. But that's okay and would be useful too.
>
> It depends a bit on how you've set up the huge page pool. Normally, you
> set nr_hugepages to configure the huge page pool to have a fixed number
> of pages, with a guarantee that those pages will actually be there when
> needed. Applications explicitly using huge pages will allocate from the
> pool. Applications that uncommit such pages will return them to the pool
> for other applications (that are also explicitly using huge pages) to use.
>
>
Good to know.
> However, you can instead (or also) choose to configure
> nr_overcommit_hugepages. When the huge page pool is depleted (e.g.
> because nr_hugepages was set to 0 from the start) the kernel will try to
> allocate at most this number of huge pages from the normal page pool.
> These pages will show up as HugePages_Surp in /proc/meminfo. When
> uncommitting such pages, they will be returned to the normal page pool,
> for any other process to use (not just those explicitly using huge
> pages). Of course, you don't have the same guarantee that there are
> large pages available.
>
>
Oh this is nice. I did not know you could do this. It takes the sting out
of preallocating a huge page pool, especially on development machines.
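
For my own notes, here is a minimal sketch of what I understand an explicit
HugeTLBFS allocation to look like (just an illustration, not JVM code; error
handling kept minimal). If the static pool (nr_hugepages) is empty but
nr_overcommit_hugepages allows it, the mapping is satisfied from the normal
page pool and shows up as HugePages_Surp:

#include <sys/mman.h>
#include <cstdio>

int main() {
    const size_t sz = 2 * 1024 * 1024;  // one 2M huge page
    // Explicitly request a huge-page-backed mapping (HugeTLBFS path).
    void* p = mmap(nullptr, sz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");  // fails if no huge page can be reserved
        return 1;
    }
    // Unmapping returns the page to whichever pool it came from: the
    // static pool, or the normal page pool if it was a surplus page.
    munmap(p, sz);
    return 0;
}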
> >
> > The question to me would be whether reserving but not committing memory
> > backed by huge pages is any different from committing them right away.
> > Or, whether uncommitted pages are returned to the pool.
>
> It depends on what you mean by reserving. If you're going through
> ReservedSpace (i.e. os::reserve_memory_special() and friends), then yes,
> it's the same thing. But ZGC is not using those APIs, it has its own
> reserve/commit/uncommit infrastructure where reserve only reserves
> address space, and commit/uncommit actually allocates/deallocates pages.
>
>
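Ah, understood. So if I picture the reserve/commit/uncommit split at the
syscall level, it would be roughly something like the following - my own
simplified sketch, not ZGC's actual code, with alignment and error handling
omitted:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const size_t sz = 2 * 1024 * 1024;

    // "Reserve": claim address space only, no physical pages yet.
    void* addr = mmap(nullptr, sz, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    // "Commit": map a hugetlbfs-backed file over the reserved range; this
    // is when huge pages are actually taken from the pool. (A real
    // implementation would align addr to the huge page size.)
    int fd = memfd_create("heap", MFD_HUGETLB);
    ftruncate(fd, sz);
    mmap(addr, sz, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

    // "Uncommit": punch a hole in the backing file so the pages go back
    // to the pool, while the reserved address space stays in place.
    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, sz);

    close(fd);
    return 0;
}

The punch-hole step would also match what Zoltan describes further down
about removing blocks from the backing file.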
> > I made a simple test with UseLargePages and a VM with a 100M heap, and
> > see that both the heap and the code heap are now backed by huge pages
> > as expected. I ran once with AlwaysPreTouch, once without. I do not see
> > any difference from the outside with regard to the number of used huge
> > pages. In /proc/pid/smaps the memory segments look identical in each
> > case. I may be doing this test wrong though...
>
> Maybe you weren't using ZGC? The code heap and all GCs, except ZGC, use
> ReservedSpace where large pages will be committed and "pinned" upfront,
> and no uncommit will happen.
>
>
That, and I also got confused with AIX, where huge pages are pinned by the
OS :)
> cheers,
> Per
>
>
Thank you for that extensive answer!
Cheers, Thomas
> >
> > Thanks a lot, and sorry again for hijacking this thread,
> >
> > Thomas
> >
> > p.s. without doubt using huge pages is hugely beneficial even without
> > uncommitting.
> >
> >
> >
> >
> > On Sat, Apr 4, 2020 at 10:00 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> >
> > Hi Per, Zoltan,
> >
> > sorry for getting in a question sideways, but I was curious.
> >
> > I always thought large pages are memory-pinned, so they cannot be
> > uncommitted? Or are you talking about using THPs?
> >
> > Cheers, Thomas
> >
> >
> > On Fri, Apr 3, 2020 at 9:38 AM Per Liden <per.liden at oracle.com> wrote:
> >
> > Hi Zoltan,
> >
> > On 4/3/20 1:27 AM, Zoltán Baranyi wrote:
> > > Hi Per,
> > >
> > > Thank you for confirming the issue and for recommending large pages. I
> > > re-ran my benchmarks with large pages and it gave me a 25-30% performance
> > > boost, which is a bit more than what I expected. My benchmarks run on a
> > > 600G heap with a 1.5-2GB/s allocation rate on a 40-core machine, so ZGC
> > > is busy. Since a significant part of the workload is ZGC itself, I assume
> > > - besides the higher TLB hit rate - this gain comes from managing the
> > > ZPages more effectively on large pages.
> >
> > A 25-30% improvement is indeed more than I would have expected. ZGC's
> > internal handling of ZPages is the same regardless of the underlying
> > page size, but as you say, you'll get a better TLB hit rate and the
> > mmap/fallocate syscalls become a lot less expensive.
> >
> > Another reason for the boost might be that ZGC's NUMA-awareness, until
> > recently, worked much better when using large pages. But this has now
> > been fixed, see https://bugs.openjdk.java.net/browse/JDK-8237649.
> >
> > Btw, which JDK version are you using?
> >
> > >
> > > I have a good experience overall, nice to see ZGC getting more and
> > > more mature.
> >
> > Good to hear. Thanks for the feedback!
> >
> > /Per
> >
> > >
> > > Cheers,
> > > Zoltan
> > >
> > > On Wed, Apr 1, 2020 at 9:15 AM Per Liden <per.liden at oracle.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> On 3/31/20 9:59 PM, Zoltan Baranyi wrote:
> > >>> Hi ZGC Team,
> > >>>
> > >>> I run benchmarks against our application using ZGC on heaps in the
> > >>> few-hundred-GB range. In the beginning everything goes smoothly, but
> > >>> eventually I experience very long JVM stalls, sometimes longer than
> > >>> one minute. According to the JVM log, reaching safepoints
> > >>> occasionally takes a very long time, matching the duration of the
> > >>> stalls I experience.
> > >>>
> > >>> After a few iterations, I started looking at uncommitting and learned
> > >>> that the way ZGC performs uncommitting - flushing the pages, punching
> > >>> holes, removing blocks from the backing file - can be expensive [1]
> > >>> when uncommitting tens or more than a hundred GB of memory. The
> > >>> trace-level heap logs confirmed that uncommitting blocks of this size
> > >>> takes many seconds. After disabling uncommitting, my benchmark runs
> > >>> without the huge stalls and the overall experience with ZGC is quite
> > >>> good.
> > >>>
> > >>> Since uncommitting is done asynchronously to the mutators, I expected
> > >>> it not to interfere with them. My understanding is that flushing,
> > >>> bookkeeping and uncommitting are done under a mutex [2], and
> > >>> contention on that can be the source of the stalls I see, such as
> > >>> when there is a demand to commit memory while uncommitting is taking
> > >>> place. Can you confirm whether this explanation makes sense to you?
> > >>> If so, is there a cure for this that I couldn't find? Like a time
> > >>> bound or a cap on the amount of memory that can be uncommitted in
> > >>> one go.
> > >>
> > >> Yes, uncommitting is relatively expensive. And it's also true that
> > >> there is a potential for lock contention affecting mutators. That can
> > >> be improved in various ways. Like you say, uncommitting in smaller
> > >> chunks, or possibly by releasing the lock while doing the actual
> > >> syscall.
> > >>
> > >> If you still want uncommit to happen, one thing to try is using large
> > >> pages (-XX:+UseLargePages), since committing/uncommitting large pages
> > >> is typically less expensive.
> > >>
> > >> This issue is on our radar, so we intend to improve this going
> > >> forward.
> > >>
> > >> cheers,
> > >> Per
> > >>
> > >>
> >
>