ZGC and 1G pages?
Hi,

ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned? Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.

Thanks,
Thomas
Hi Thomas,

On 2023-05-08 10:50, Thomas Stüfe wrote:
> Hi,
>
> ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned?

There's currently no plan to support 1GB explicit large pages.

> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.

One hurdle to getting 1GB pages to work is that we take the physical backing memory of multiple discontiguous 2MB heap regions and combine it into a larger contiguous memory region. To implement 1GB large page support, we need to figure out how to perform that detaching of physical memory from virtual memory when the 2MB regions reside on 1GB large pages.

StefanK

> Thanks,
> Thomas
Hi Stefan,

On Mon, May 8, 2023 at 11:08 AM Stefan Karlsson <stefan.karlsson@oracle.com> wrote:
> Hi Thomas,
>
> On 2023-05-08 10:50, Thomas Stüfe wrote:
>> Hi,
>>
>> ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned?
>
> There's currently no plan to support 1GB explicit large pages.
>
>> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.
>
> One hurdle to getting 1GB pages to work is that we take the physical backing memory of multiple discontiguous 2MB heap regions and combine it into a larger contiguous memory region. To implement 1GB large page support, we need to figure out how to perform that detaching of physical memory from virtual memory when the 2MB regions reside on 1GB large pages.

Thank you for clarifying! This is probably a naive question, but would increasing the ZPage size to 1GB be a valid option?

..Thomas

> StefanK
>
>> Thanks,
>> Thomas
On 2023-05-08 11:27, Thomas Stüfe wrote:
> Hi Stefan,
>
> On Mon, May 8, 2023 at 11:08 AM Stefan Karlsson <stefan.karlsson@oracle.com> wrote:
>> Hi Thomas,
>>
>> On 2023-05-08 10:50, Thomas Stüfe wrote:
>>> Hi,
>>>
>>> ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned?
>>
>> There's currently no plan to support 1GB explicit large pages.
>>
>>> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.
>>
>> One hurdle to getting 1GB pages to work is that we take the physical backing memory of multiple discontiguous 2MB heap regions and combine it into a larger contiguous memory region. To implement 1GB large page support, we need to figure out how to perform that detaching of physical memory from virtual memory when the 2MB regions reside on 1GB large pages.
>
> Thank you for clarifying! This is probably a naive question, but would increasing the ZPage size to 1GB be a valid option?

I see a few problems, but there are probably more lurking:

1) It would waste a significant amount of memory. We have a number of reasons why we keep a couple of active pages per worker and/or CPU.

2) Allocating large heap regions and initializing the associated data structures can take some time. A lot of that can be prevented by using -Xmx == -Xms and -XX:+AlwaysPreTouch.

3) I think it would break some of the address compression schemes we have. For example, take a look at ZForwardingEntry and its "From Object Index" part. That field is not wide enough to index small objects in 1GB pages.

So, I think it would be possible to do, but you would have to rewrite parts of the GC.

StefanK

> ..Thomas
>
>> StefanK
>>
>>> Thanks,
>>> Thomas
* Thomas Stüfe:

> ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned?
>
> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.

Is the number of VMAs really tied to hugepage support? I think ZGC could keep the number of VMAs down simply by processing mappings at a larger granularity.

There is a Fedora discussion under way to eliminate the kernel VMA limit completely, but the kernel OOM handler isn't really compatible with that. The current heuristics do not seem to pick the most appropriate process if the kernel ends up with too much (unswappable?) memory used due to an excessive count of VMAs, so I'm not sure that we're going to change the default.

F39 proposal: Increase vm.max_map_count value (System-Wide Change proposal)
<https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/WVWHTLXSGZN4QMAE577ZFZX4ZI6YZF3A/>

Thanks,
Florian
Hi Florian,

On Mon, May 8, 2023 at 11:15 AM Florian Weimer <fweimer@redhat.com> wrote:
> * Thomas Stüfe:
>
>> ZGC on Linux seems to be compatible only with 2M pages. Seeing that ZGC is often used with very large heaps, is support for 1GB pages planned?
>>
>> Especially if one disregards uncommitting (-XX:-ZUncommit), 1G pages could be a speed boost for customers with gigantic heaps, and would also reduce their number of VMAs.
>
> Is the number of VMAs really tied to hugepage support?

Indirectly. AFAIU, the number of VMAs is coupled to an internal granularity ZGC uses to stitch together memory from the underlying memory layer. That granularity is 2M, I assume because large pages are 2M on the architectures relevant to ZGC, and it seems hard-wired. In long-running processes, I observe a dense mix of these stitchings, so the kernel cannot fold neighboring regions and needs a separate VMA to represent each one.
> I think ZGC could keep the number of VMAs down simply by processing mappings at a larger granularity.
>
> There is a Fedora discussion under way to eliminate the kernel VMA limit completely, but the kernel OOM handler isn't really compatible with that. The current heuristics do not seem to pick the most appropriate process if the kernel ends up with too much (unswappable?) memory used due to an excessive count of VMAs, so I'm not sure that we're going to change the default.
>
> F39 proposal: Increase vm.max_map_count value (System-Wide Change proposal)
> < https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...

Interesting, thanks for the hint. We have run into this limit several times in the past, and the error is often confusing (e.g. when creating a thread and mprotecting the guard pages, which may split the containing VMA into two; if that fails, the mprotect fails with ENOMEM, which is not intuitive).
> Thanks,
> Florian
* Thomas Stüfe:

>> There is a Fedora discussion under way to eliminate the kernel VMA limit completely, but the kernel OOM handler isn't really compatible with that. The current heuristics do not seem to pick the most appropriate process if the kernel ends up with too much (unswappable?) memory used due to an excessive count of VMAs, so I'm not sure that we're going to change the default.
>>
>> F39 proposal: Increase vm.max_map_count value (System-Wide Change proposal)
>> <https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/WVWHTLXSGZN4QMAE577ZFZX4ZI6YZF3A/>
>
> Interesting, thanks for the hint. We have run into this limit several times in the past, and the error is often confusing (e.g. when creating a thread and mprotecting the guard pages, which may split the containing VMA into two; if that fails, the mprotect fails with ENOMEM, which is not intuitive).

That's not great. At this point, it may make sense to engage with the kernel mm developers to see if they have any ideas. Maybe a soft/hard limit (see setrlimit) would make sense: a low-ish default limit to prevent taking down the system if processes go wrong in an unexpected way, plus a way to raise it for processes that are known to need more. There could also be performance implications from having many VMAs, but the kernel developers would know.

Thanks,
Florian
participants (3)

- Florian Weimer
- Stefan Karlsson
- Thomas Stüfe