Containers facing OutOfMemory errors running with Generational ZGC
Hello,

First of all, congratulations on the great work with Generational ZGC. We are having great results.

We run our services on Kubernetes, and our standard container memory setup is to use 85% of total memory for the Java heap and 15% for everything else. So in a service running with 10 GB, 8.5 GB is allocated to the Java heap and 1.5 GB is left for off-heap memory. When running with G1, 15% of off-heap memory is generally enough. But with Generational ZGC, we see an increase in container out-of-memory errors. It's not a Java heap OutOfMemoryError; it's the container itself running out of memory.

I started to decrease the heap/off-heap ratio, which was enough in some cases. However, some services are already running with 45% of memory allocated off-heap and still suffering from these errors. Is it expected that Generational ZGC requires more off-heap memory? Our containers run with around 80 GB of memory each, so in some cases we are talking about more than 20 GB allocated off-heap in a pod running only the Java application.

Thanks!
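The failure mode described above can be illustrated with a minimal sketch: direct byte buffers allocate native memory that counts against the container's limit but not against -Xmx, so exhausting it triggers the container OOM killer rather than a Java heap OutOfMemoryError (class name and sizes here are illustrative, not from the thread):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 64 MiB of native memory, outside the Java heap: it counts against
        // the container's memory limit but not against -Xmx.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        System.out.println(direct.isDirect());  // true
        System.out.println(direct.capacity());  // 67108864
        // The native memory behind this buffer is reclaimed only after the
        // buffer becomes unreachable AND the GC processes its Cleaner
        // reference, so reclamation timing depends on GC heuristics.
    }
}
```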
Hi Luiz,

Glad to hear you are having great results with generational ZGC!

It's expected that generational ZGC at the moment uses about 3% more memory, due to its remembered sets. However, it's worth mentioning that we decided not to ship a young-generation reference processor in 21, because it seemed too risky. That means that if you have off-heap byte buffers that rely on the frequency of reference processing to free up off-heap memory, you might run into some trouble, as only old-generation GCs will trigger the cleanup.

It's a problem in general that the GC heuristics don't understand that your seemingly tiny object has a hidden off-heap cost of 2 GB or something. I can imagine that problem getting worse without a young-generation reference processor. If you don't want to rely on arbitrary GC heuristics for when your application's off-heap memory gets freed, I would recommend using the new Panama APIs, where you can just call close() and the memory is synchronously discarded.

Hope this helps!

/Erik

On 28 Aug 2023, at 13:04, Luiz Hespanha <luiz.hespanha@nubank.com.br> wrote:
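The Panama approach Erik recommends can be sketched with the java.lang.foreign API (final in JDK 22; a preview feature in JDK 21, requiring --enable-preview). A confined Arena ties off-heap memory to an explicit scope, so freeing it does not depend on GC reference processing:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class ArenaDemo {
    public static void main(String[] args) {
        // try-with-resources calls arena.close(), which frees the native
        // memory immediately and deterministically.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(1024 * 1024); // 1 MiB off-heap
            System.out.println(segment.byteSize()); // 1048576
        } // native memory released here, synchronously
    }
}
```

Unlike a direct ByteBuffer, whose native memory is reclaimed only when the GC happens to process its Cleaner, the arena's memory is gone as soon as close() returns.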
participants (2)
-
Erik Osterlund
-
Luiz Hespanha