* Thomas Stüfe:
> ZGC, on Linux, seems only compatible with 2M pages. Seeing that ZGC is
> often used with very large heaps, is support for 1GB pages planned?
>
> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could
> be a speed boost for customers with gigantic heaps, as well as reduce
> their number of VMAs.
Is the number of VMAs really tied to hugepage support?
Indirectly. AFAIU, the number of VMAs is tied to the internal granule size ZGC uses to stitch together memory from the underlying memory layer. That granularity is 2M, I assume because large pages are 2M on the architectures relevant to ZGC; it seems hard-wired.
In long-running processes, I observe a dense interleaving of these mappings, so the kernel cannot merge neighboring regions and needs a separate VMA to represent each one.
I think ZGC could keep the number of VMAs down simply by processing mappings at a larger granularity.
There is a Fedora discussion under way to eliminate the kernel VMA limit
completely, but the kernel OOM handler isn't really compatible with
that. The current heuristics do not seem to pick the most appropriate
process if the kernel ends up with too much (unswappable?) memory used
due to an excessive count of VMAs, so I'm not sure that we're going to
change the default.
F39 proposal: Increase vm.max_map_count value (System-Wide Change proposal)
<https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/WVWHTLXSGZN4QMAE577ZFZX4ZI6YZF3A/>
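For reference, the limit under discussion can be inspected from userspace via standard Linux procfs (using the shell's own pid here purely as an example process):

```shell
# Inspect the current per-process mapping limit (upstream kernel default: 65530).
cat /proc/sys/vm/max_map_count

# Count the VMAs a given process is currently using (here: this shell).
wc -l < /proc/$$/maps

# To raise the limit system-wide: sudo sysctl -w vm.max_map_count=<new value>
```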
Interesting, thanks for the hint. I have seen us run into this limit several times in the past, and the error is often confusing (e.g. when creating a thread and mprotecting the guard pages, which may split the containing VMA in two; if that split fails, the mprotect fails with ENOMEM, which is not intuitive).
Thanks,
Florian