jcmd VM.native_memory extremely large numbers when using ZGC
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Oct 28 15:30:19 UTC 2024
I don't see a problem.
Process has an RSS of 2.7 GB. The JVM, according to NMT, has ~7 GB
committed. That seems in line with a 6 GB heap.
On Mon, Oct 28, 2024 at 4:19 PM Marçal Perapoch Amadó <
marcal.perapoch at gmail.com> wrote:
> Hello again, Thomas.
>
> Attached are the NMT report we got from running our app
> with -XX:NativeMemoryTracking=detail (extracted with `jcmd <PID>
> VM.native_memory detail`), the GC log, and a screenshot of the `top` command.
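>
> For reference, the report was produced roughly like this ("app.jar" is
> just a placeholder for our actual launch command):
>
>     # start the JVM with detailed native memory tracking enabled
>     java -XX:NativeMemoryTracking=detail -jar app.jar
>     # dump the detailed report from the running process
>     jcmd <PID> VM.native_memory detail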
>
> Our application is running on K8s in Google Cloud Platform using openjdk
> version "21.0.4" 2024-07-16 LTS.
>
> Unfortunately, we could not get the System.map report because we are on
> Java 21.
>
> Please let me know if you need more information.
>
> Cheers,
> Marçal
>
>
> Missatge de Thomas Stüfe <thomas.stuefe at gmail.com> del dia dl., 28 d’oct.
> 2024 a les 11:11:
>
>> Hi Marçal,
>>
>> Too little information to say anything - I would need the NMT report,
>> possibly the jcmd System.map output, and possibly the GC log. I am also
>> not aware of any sizing
>> recommendations when switching from G1 to ZGC, but they probably exist and
>> the ZGC devs that normally frequent this ML know this stuff better than I
>> do.
>>
>> Cheers, Thomas
>>
>> On Mon, Oct 28, 2024 at 10:58 AM Marçal Perapoch Amadó <
>> marcal.perapoch at gmail.com> wrote:
>>
>>> Hey Thomas,
>>>
>>> Thanks a lot for your answer and the information you provided. I think
>>> you are right about generational ZGC not using multi-mapping (
>>> https://openjdk.org/jeps/439 - "No multi-mapped memory"). I also didn't
>>> know about the max heap size * 16, which does seem to match the
>>> numbers I was seeing on my machine. Good info, thanks again!
>>>
>>> > As in, Java OOMEs? OOM killer? Or the pod being killed from the pod
>>> management?
>>> Our canary pods using ZGC were OOM killed, yes. Our metrics also show the
>>> "container_memory_working_set_bytes" of the ZGC pods climbing above 20GB,
>>> even though they were configured with a max heap of 6GB.
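>>>
>>> (As far as I understand, that metric comes from the pod's cgroup
>>> accounting - roughly memory usage minus inactive file cache - so the raw
>>> numbers can be cross-checked from inside the container; the paths below
>>> assume cgroup v2:)
>>>
>>>     cat /sys/fs/cgroup/memory.current   # total memory charged to the cgroup
>>>     cat /sys/fs/cgroup/memory.stat      # breakdown, incl. inactive_file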
>>>
>>> Also, I forgot to mention (in case it helps) we are running:
>>> openjdk 21.0.4 2024-07-16 LTS
>>> OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
>>> OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed
>>> mode, sharing)
>>>
>>> Best,
>>> Marçal
>>>
>>>
>>> Missatge de Thomas Stüfe <thomas.stuefe at gmail.com> del dia dl., 28
>>> d’oct. 2024 a les 10:25:
>>>
>>>> Hi Marcal,
>>>>
>>>> likely a red herring - "reserved" should not matter unless you
>>>> artificially limit the address space size of the process (e.g. with ulimit
>>>> -v). And even then, ZGC should just work around this limit. Reserved is
>>>> just address space, and modern 64-bit OSes don't penalize you for
>>>> allocating large swathes of address space. It should not cost any real
>>>> memory.
>>>>
>>>> About the large number: AFAIK ZGC in generational mode does not do
>>>> multi-mapping anymore. Both Generational and Single Gen, however, do
>>>> over-allocate address space (max heap size * 16) - that number may be
>>>> smaller if capped by whatever is physically possible on the machine. It
>>>> does that because it rolls its own variant of physical-to-virtual memory
>>>> mapping, and needs room to maneuver. This is done to fight fragmentation
>>>> effects.
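>>>>
>>>> To put numbers on it: with -Xmx12g that is 12 GB * 16 = 192 GB of
>>>> reserved heap address space, and with -Xmx6g it would be 6 GB * 16 =
>>>> 96 GB - none of which is backed by physical memory until it is
>>>> actually committed.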
>>>>
>>>> If you want to know how much memory the process uses, the "committed"
>>>> numbers in NMT are a lot closer to the truth. They are not the truth,
>>>> however, since memory can be committed but still untouched and therefore
>>>> not live, for example when pre-committing with -Xmx==-Xms. In that case,
>>>> "committed" probably also overreports memory use.
>>>>
>>>> We are working on improving NMT; future versions will report the live
>>>> memory size too, if it can be cheaply obtained. The upcoming Java 24
>>>> also contains an improved variant of jcmd System.map, which tells you
>>>> the live size of each memory segment and, at the end, the actual live
>>>> size of all memory - at least on Linux.
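>>>>
>>>> Concretely, something along these lines (Linux; <PID> is a placeholder)
>>>> usually gives a reasonable picture:
>>>>
>>>>     jcmd <PID> VM.native_memory summary    # NMT committed per category
>>>>     jcmd <PID> System.map                  # per-mapping view, on JDKs that have it
>>>>     grep -i rss /proc/<PID>/smaps_rollup   # the kernel's view of resident memory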
>>>>
>>>> > our canary nodes were suddenly killed by OOM
>>>>
>>>> As in, Java OOMEs? OOM killer? Or the pod being killed from the pod
>>>> management?
>>>>
>>>> HTH,
>>>>
>>>> Cheers, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 28, 2024 at 9:11 AM Marçal Perapoch Amadó <
>>>> marcal.perapoch at gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>> First of all, congratulations on all the hard work with ZGC!
>>>>>
>>>>> TL;DR: Running a simple Java main with generational ZGC, NMT
>>>>> reports 221GB of reserved memory on a 32GB machine.
>>>>>
>>>>> *Context*: At my current company we're keen on switching from G1GC
>>>>> to ZGC because of its ability to maintain very low pause times. Our
>>>>> particular problem is that when we scale up our application, the new
>>>>> nodes receive so much traffic in such a short time that, even though a
>>>>> node is technically ready to accept traffic, the burst of new
>>>>> allocations puts a lot of pressure on G1, and that translates into
>>>>> multiple pauses of over a second. So we decided to give ZGC a try, and
>>>>> although the numbers for those pauses looked amazing, our canary nodes
>>>>> were suddenly killed by OOM.
>>>>> I've read about the ZGC multi-mapping technique and how it can trick
>>>>> the Linux kernel's memory accounting. I found this topic from this same
>>>>> mailing list particularly useful:
>>>>> https://mail.openjdk.org/pipermail/zgc-dev/2018-November/000511.html
>>>>> and also read about using the -XX:+UseLargePages flag. I even saw a
>>>>> mailing list thread about Kubernetes and containers having issues with
>>>>> ZGC here:
>>>>> https://mail.openjdk.org/pipermail/zgc-dev/2023-August/001259.html.
>>>>> However, despite this research, I have not been able to find a
>>>>> solution to the issue. So I decided to reproduce the problem locally for
>>>>> further investigation. Although my local environment is quite different
>>>>> from our live setup, I encountered the same high reserved memory behavior.
>>>>>
>>>>> I created a very simple Java application (just a Main that loops
>>>>> forever waiting for a number from the console and performs some
>>>>> allocations based on it - I don't think the details matter much).
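>>>>>
>>>>> Roughly along these lines (the allocation logic here is made up for
>>>>> illustration; only the overall shape matters):
>>>>>
>>>>>     import java.util.ArrayList;
>>>>>     import java.util.List;
>>>>>     import java.util.Scanner;
>>>>>
>>>>>     public class Main {
>>>>>         public static void main(String[] args) {
>>>>>             List<byte[]> retained = new ArrayList<>();
>>>>>             try (Scanner in = new Scanner(System.in)) {
>>>>>                 while (in.hasNextInt()) {
>>>>>                     int mb = in.nextInt(); // how many MB to allocate
>>>>>                     for (int i = 0; i < mb; i++) {
>>>>>                         retained.add(new byte[1024 * 1024]); // keep them reachable
>>>>>                     }
>>>>>                     System.out.println("retained " + retained.size() + " MB");
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>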
>>>>> I run my application with the following JVM args:
>>>>> -XX:+UseZGC
>>>>> -XX:+ZGenerational
>>>>> -Xms12g
>>>>> -Xmx12g
>>>>> -XX:NativeMemoryTracking=summary
>>>>> -Xlog:gc*:gc.log
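>>>>>
>>>>> i.e., launched roughly like this (Main being the sketch above; the
>>>>> -Xlog argument is quoted to avoid shell globbing):
>>>>>
>>>>>     java -XX:+UseZGC -XX:+ZGenerational -Xms12g -Xmx12g \
>>>>>          -XX:NativeMemoryTracking=summary '-Xlog:gc*:gc.log' Main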
>>>>>
>>>>> And that produces the following report on my MacBook Pro M2, 32GB.
>>>>>
>>>>> *Native Memory Tracking*:
>>>>> (Omitting categories weighing less than 1GB)
>>>>>
>>>>> Total: reserved=221GB, committed=12GB
>>>>>        malloc: 0GB #38256
>>>>>        mmap:   reserved=221GB, committed=12GB
>>>>>
>>>>> -  Java Heap (reserved=192GB, committed=12GB)
>>>>>              (mmap: reserved=192GB, committed=12GB, at peak)
>>>>>
>>>>> -      Class (reserved=1GB, committed=0GB)
>>>>>              (classes #2376)
>>>>>              (  instance classes #2142, array classes #234)
>>>>>              (mmap: reserved=1GB, committed=0GB, at peak)
>>>>>              (  Metadata: )
>>>>>              (    reserved=0GB, committed=0GB)
>>>>>              (    used=0GB)
>>>>>              (    waste=0GB =0.79%)
>>>>>              (  Class space:)
>>>>>              (    reserved=1GB, committed=0GB)
>>>>>              (    used=0GB)
>>>>>              (    waste=0GB =7.49%)
>>>>>
>>>>> -         GC (reserved=16GB, committed=0GB)
>>>>>              (mmap: reserved=16GB, committed=0GB, at peak)
>>>>>
>>>>> -    Unknown (reserved=12GB, committed=0GB)
>>>>>              (mmap: reserved=12GB, committed=0GB, peak=0GB)
>>>>>
>>>>> As you can see, it reports a total reserved of 221GB, which I find
>>>>> very confusing. I understand it is related to the multi-mapping
>>>>> technique, but my question is: how can I be sure how much memory my
>>>>> app is actually using if even jcmd gives me reports like this one?
>>>>>
>>>>> Also, launching the same application with G1 reports Total:
>>>>> reserved=14GB, committed=12GB.
>>>>>
>>>>> Sorry if this has already been reported/answered - I really tried to
>>>>> inform myself before taking up your time, but I do have the impression
>>>>> that I am missing something here.
>>>>>
>>>>> Could you please provide any insights or suggestions on what might be
>>>>> happening, or how we could mitigate this issue?
>>>>> If not jcmd, which tool/command would you recommend to measure
>>>>> the memory consumption? We’d greatly appreciate your advice on how to move
>>>>> forward.
>>>>>
>>>>> Thank you very much for your time and help!
>>>>>
>>>>>
>>>>> Marçal
>>>>>
>>>>