<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    On 2023-04-25 17:31, Thomas Stüfe wrote:<br>

    <blockquote type="cite" cite="mid:CAA-vtUwYHy7EPnsLH02VFvhV-6z5=Ht_V-Yb4nwe=7Vq-KVTmg@mail.gmail.com">

      

      <div dir="ltr">

        <div>Hi Stefan,</div>

        <div><br>

        </div>

        <div>thanks a lot for your answers. Wrt THPs, yes, it would be

          wise to use explicit huge pages. <br>

        </div>

        <div><br>

        </div>

        <div>Does the single ZUnmapper thread compete with all mutator

          threads for the page allocator? <br>

        </div>

      </div>

    </blockquote>

    <br>

    In most cases the mutator threads don't compete with the ZUnmapper

    thread (except for CPU time). However, if we need to allocate either

    a medium page or a large page, and we can't grow the heap more, and

    there's no large enough page in the page cache, then we gather a

    bunch of free pages from the page cache (i.e. page cache flushing)

    and "steal" the physical memory and assign it to a new virtual

    memory range of the required size. Then we put the flushed pages

    onto the unmap queue and let the ZUnmapper thread deal with it. So,

    the manipulation of the unmap queue uses a lock and that lock is

    what the mutator threads and the ZUnmapper thread compete for. I first thought

    that contention on this lock caused the issues we were seeing

    in our internal tests, but for us it seemed to be much more caused

    by the ZUnmapper thread not getting enough run time.<br>

    <br>

    If you start to see messages about "Page Cache Flushed: " in the gc

    logs then you know that we have run the path described above.<br>
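    <br>
    A minimal sketch of how one could check a GC log for that message. Only the "Page Cache Flushed: " text is taken from the above; the helper name and the sample log lines are made up for illustration, and the real line layout may differ.<br>
    <pre>
```python
from typing import Iterable

# The marker text is the one mentioned above; everything else about the
# log line layout is an assumption.
FLUSH_MARKER = "Page Cache Flushed: "


def count_page_cache_flushes(lines: Iterable[str]) -> int:
    """Count the log lines that contain the page cache flush marker."""
    return sum(1 for line in lines if FLUSH_MARKER in line)


# Hypothetical log lines, invented for this example only.
sample_log = [
    "[10.123s][info][gc,heap] GC(3) Page Cache Flushed: 128M",
    "[10.456s][info][gc] GC(3) Garbage Collection (Allocation Rate)",
    "[12.001s][info][gc,heap] GC(4) Page Cache Flushed: 64M",
]
flush_count = count_page_cache_flushes(sample_log)
print("lines with page cache flushes:", flush_count)
```
    </pre>
    For a real log, pass an open file object instead of the sample list.<br>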

    <br>

    StefanK<br>

    <br>

    <blockquote type="cite" cite="mid:CAA-vtUwYHy7EPnsLH02VFvhV-6z5=Ht_V-Yb4nwe=7Vq-KVTmg@mail.gmail.com">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Thanks, Thomas<br>

        </div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div><br>

        </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, Apr 25, 2023 at

          2:59 PM Stefan Karlsson &lt;<a href="mailto:stefan.karlsson@oracle.com" moz-do-not-send="true" class="moz-txt-link-freetext">stefan.karlsson@oracle.com</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div> <font face="monospace">Hi Thomas,</font><br>

            <br>

            <div>On 2023-04-25 09:58, Thomas Stüfe wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">Hi ZGC experts,<br>

                <br>

                I see a strangeness with one of our customers running

                JDK 17 with ZGC, THP enabled (always), and a large heap

                of 4.6TB. <br>

              </div>

            </blockquote>

            <br>

            Side-note: be careful about using THP and expecting good

            latencies, but if you do want to use THP with ZGC make sure

            to also change:<br>

            <code><br>

              /sys/kernel/mm/transparent_hugepage/shmem_enabled<br>

              <br>

              <a href="https://wiki.openjdk.org/display/zgc" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://wiki.openjdk.org/display/zgc</a><br>

            </code><br>
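            A small sketch for checking the current value of that file. It assumes the usual sysfs convention where all choices are listed and the active one is wrapped in square brackets; the function names and the sample string are illustrative only.<br>
            <pre>
```python
from pathlib import Path

SHMEM_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")


def active_setting(contents: str) -> str:
    """Return the bracketed (currently active) value from a sysfs choice list."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no bracketed value found")


def read_shmem_enabled(path: Path = SHMEM_ENABLED) -> str:
    """Read the file on a Linux machine and return the active value."""
    return active_setting(path.read_text())


# Illustrative file contents, not read from a real system.
example = active_setting("always within_size advise [never] deny force")
print(example)
```
            </pre>
            Which value to pick is described on the ZGC wiki linked above; the sketch only reports what is currently set.<br>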

            <blockquote type="cite">

              <div dir="ltr"><br>

                The number of VMAs exceeds 20 million. I try to

                understand whether that is normal or pathological.<br>

                <br>

                Looking at maps, I see millions of adjacent VMAs that

                point into the heap to different offsets:<br>

                <br>

                ```<br>

                15fc5f600000-15fc5f800000 rw-s 24630400000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                15fc5f800000-15fc5fa00000 rw-s 2504e600000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                15fc5fa00000-15fc5fc00000 rw-s 25330000000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                15fc5fc00000-15fc5fe00000 rw-s 26324200000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                15fc5fe00000-15fc60000000 rw-s 26f03a00000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                ```<br>

                <br>

                The different offsets prevent these mappings from being

                folded.<br>

                <br>

                The number of mappings surpasses what would be needed to

                map the heap. Almost all are 2MB mappings:<br>

                <br>

                Total number of mappings: 18634289<br>

                Number of 2MB mappings:        18529201<br>

                Per color: 6211420 / 6211429 / 6211439<br>

                <br>

                The total address space covered by these 2MB mappings is

                38TB. Taking into account the triple-mapping, we still

                map about 12TB per color. That far exceeds the necessary

                room for a 4.6TB heap.<br>

              </div>

            </blockquote>
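            <br>
            The arithmetic in the quoted text can be checked with a few lines. The mapping count and heap size are the ones quoted above; the sketch assumes that 2MB means 2 * 1024 * 1024 bytes and that TB means 10^12 bytes.<br>
            <pre>
```python
MAPPING_SIZE = 2 * 1024 * 1024  # one 2MB mapping, in bytes (assumed binary MB)
TWO_MB_MAPPINGS = 18529201      # number of 2MB mappings quoted above
COLORS = 3                      # triple-mapping: three views of the same heap
HEAP_TB = 4.6                   # configured heap size quoted above

total_bytes = TWO_MB_MAPPINGS * MAPPING_SIZE
total_tb = total_bytes / 10**12      # assumed decimal TB
per_color_tb = total_tb / COLORS
ratio_to_heap = per_color_tb / HEAP_TB

print(f"total: {total_tb:.1f} TB")
print(f"per color: {per_color_tb:.1f} TB")
print(f"per color / heap: {ratio_to_heap:.1f}x")
```
            </pre>
            Under those assumptions this comes out at roughly 38.9TB in total and just under 13TB per color, which is close to three times the 4.6TB heap.<br>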

            <br>

            ZGC reserves a larger address space for the heap than the

            given max heap size. This is done to make it easier to deal

            with large objects. There are some hints to the address

            space layout here:<br>

            <a href="https://github.com/openjdk/zgc/blob/5ea960728c5616373c986ae1343b44043c0db487/src/hotspot/cpu/x86/gc/z/zGlobals_x86.cpp" target="_blank" moz-do-not-send="true">https://github.com/openjdk/zgc/blob/5ea960728c5616373c986ae1343b44043c0db487/src/hotspot/cpu/x86/gc/z/zGlobals_x86.cpp</a><br>

            <br>

            <blockquote type="cite">

              <div dir="ltr"><br>

                Examining the mappings, I see that many offsets into the

                heap are mapped to multiple points, even discounting the

                triple mapping. For example, offset 105fe800000 is

                mapped six times per color, for a total of 18 times:<br>

                <br>

                13438de00000-13438e000000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                15bf79400000-15bf79600000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                165022800000-165022a00000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                16fdad200000-16fdad400000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                17b1b9600000-17b1b9800000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                1d9860000000-1d9860200000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                <br>

                23438de00000-23438e000000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                25bf79400000-25bf79600000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                265022800000-265022a00000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                26fdad200000-26fdad400000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                27b1b9600000-27b1b9800000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                2d9860000000-2d9860200000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                <br>

                43438de00000-43438e000000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                45bf79400000-45bf79600000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                465022800000-465022a00000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                46fdad200000-46fdad400000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                47b1b9600000-47b1b9800000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                4d9860000000-4d9860200000 rw-s 105fe800000 00:0f

                373323680               /memfd:java_heap.hugetlb

                (deleted)<br>

                <br>

              </div>

            </blockquote>
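            <br>
            A rough sketch of how such duplicates can be counted from a maps dump: group the heap mappings by file offset and count how often each offset shows up. The sample lines are copied from this thread; the function names are made up, and matching on "memfd:java_heap" in the path is an assumption based on the lines shown here.<br>
            <pre>
```python
from collections import Counter


def parse_maps_line(line: str):
    """Split one /proc/PID/maps line into (start, end, offset, path)."""
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) == 6 else ""
    return int(start_hex, 16), int(end_hex, 16), int(fields[2], 16), path


def count_heap_offsets(lines) -> Counter:
    """Count how many mappings point at each offset of the heap memfd."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _start, _end, offset, path = parse_maps_line(line)
        if "memfd:java_heap" in path:
            counts[offset] += 1
    return counts


sample = [
    "13438de00000-13438e000000 rw-s 105fe800000 00:0f 373323680               /memfd:java_heap.hugetlb (deleted)",
    "15bf79400000-15bf79600000 rw-s 105fe800000 00:0f 373323680               /memfd:java_heap.hugetlb (deleted)",
    "165022800000-165022a00000 rw-s 105fe800000 00:0f 373323680               /memfd:java_heap.hugetlb (deleted)",
    "15fc5f600000-15fc5f800000 rw-s 24630400000 00:0f 373323680               /memfd:java_heap.hugetlb (deleted)",
]
offset_counts = count_heap_offsets(sample)
for offset, count in offset_counts.most_common():
    print(f"offset {offset:x}: {count} mapping(s)")
```
            </pre>
            On a real dump, offsets with a count above the number of colors are the ones that are mapped more than once per color.<br>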

            <br>

            What I think happens here is that when we detach

            virtual-to-physical memory mappings we don't do it

            immediately; instead, the memory is handed over to a separate

            ZUnmapper thread. If that thread gets starved, typically

            because of an over-provisioned machine, then these mappings

            start to build up. You can see the ZUnmapper code here:<br>

            <a href="https://github.com/openjdk/zgc/blob/5ea960728c5616373c986ae1343b44043c0db487/src/hotspot/share/gc/z/zUnmapper.cpp" target="_blank" moz-do-not-send="true">https://github.com/openjdk/zgc/blob/5ea960728c5616373c986ae1343b44043c0db487/src/hotspot/share/gc/z/zUnmapper.cpp</a><br>

            <br>

            I recently looked into this and thought that the starvation

            happened because of how we take the lock for every ZPage we

            want to unmap. I prototyped a way to bulk fetch all pages,

            but that didn't seem to help. AFAICT, the big problem for us

            was still that the ZUnmapper thread was starved out. The

            prototype is here:<br>

            <a href="https://github.com/stefank/jdk/tree/zgc_generational_bulk_unmapper" target="_blank" moz-do-not-send="true">https://github.com/stefank/jdk/tree/zgc_generational_bulk_unmapper</a><br>
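            <br>
            To illustrate the difference between taking the lock per page and bulk fetching, here is a toy model. It is not the HotSpot code and not the prototype linked above; the class and function names are invented, and it only counts how often the consumer side takes the queue lock.<br>
            <pre>
```python
import threading
from collections import deque


class UnmapQueue:
    """Toy stand-in for a lock-protected queue of pages waiting to be unmapped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pages = deque()
        self.lock_acquisitions = 0

    def enqueue(self, page):
        with self._lock:
            self.lock_acquisitions += 1
            self._pages.append(page)

    def dequeue_one(self):
        """Take the lock once per page; returns None when the queue is empty."""
        with self._lock:
            self.lock_acquisitions += 1
            return self._pages.popleft() if self._pages else None

    def dequeue_all(self):
        """Take the lock once and fetch every queued page in bulk."""
        with self._lock:
            self.lock_acquisitions += 1
            pages = list(self._pages)
            self._pages.clear()
            return pages


def drain_one_by_one(queue):
    unmapped = []
    while True:
        page = queue.dequeue_one()
        if page is None:
            break
        unmapped.append(page)
    return unmapped


def consumer_lock_count(bulk: bool, page_count: int = 100) -> int:
    """Return how many times the consumer took the lock to drain the queue."""
    queue = UnmapQueue()
    for page in range(page_count):
        queue.enqueue(page)
    before = queue.lock_acquisitions
    if bulk:
        queue.dequeue_all()
    else:
        drain_one_by_one(queue)
    return queue.lock_acquisitions - before


per_page = consumer_lock_count(bulk=False)
bulk = consumer_lock_count(bulk=True)
print("per-page lock acquisitions:", per_page)
print("bulk lock acquisitions:", bulk)
```
            </pre>
            Bulk fetching cuts the number of lock acquisitions on the consumer side, but it does nothing for a consumer thread that is not scheduled in the first place, which matches what we saw.<br>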

            <br>

            You can actually see this problem if you monitor the

            amount of committed memory in the Java heap. When this

            happens the reported amount of committed memory increases

            and can even go past the max heap size. This is a bug

            caused by how we report our virtual memory to NMT. I created a

            bug for that:<br>

            <a href="https://bugs.openjdk.org/browse/JDK-8306841" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://bugs.openjdk.org/browse/JDK-8306841</a><br>

            <br>

            And a prototype:<br>

            <a href="https://github.com/stefank/jdk/tree/zgc_generational_fix_nmt_overcommit_reporting" target="_blank" moz-do-not-send="true">https://github.com/stefank/jdk/tree/zgc_generational_fix_nmt_overcommit_reporting</a><br>

            <br>

            <blockquote type="cite">

              <div dir="ltr">The ZGC Page table contains close to a

                million ZGC pages and looks okay for a heap of that

                size:<br>

                Small: 739175<br>

                Medium: 10160<br>

                Large:   65495<br>

                               -------<br>

                                814830<br>

                                <br>

                My question: is such a high number of mappings for ZGC

                normal?<br>

              </div>

            </blockquote>

            <br>

            A larger number of mappings is normal, but what you have

            above indicates some kind of performance issue with the

            system.<br>

            <br>

            Cheers,<br>

            StefanK<br>

            <br>

            <blockquote type="cite">

              <div dir="ltr"><br>

                Thank you for your time,<br>

                <br>

                Cheers, Thomas</div>

            </blockquote>

            <br>

          </div>

        </blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>