<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    On 2023-03-14 10:08, Evaristo José Camarero wrote:<br>
    <blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
      
      <div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
        sans-serif;font-size:13px;">
        <div dir="ltr" data-setdir="false">Thanks a lot, Stefan.</div>
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">I think this flag makes much
          more sense than playing with the GC parallel and concurrent
          thread counts!</div>
        <div><br>
        </div>
        <div dir="ltr" data-setdir="false">Some extra feedback that
          may be relevant:</div>
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">I reported that ZGC was using
          20% more CPU than G1 for the same workload, BUT that is not
          true in all cases. When I reduced the traffic sent to the
          Geode cluster, ZGC used around 10% more CPU than G1 for the
          same amount of traffic (21 cores versus 19 cores). G1 was
          configured with a 25 ms pause-time target, while ZGC, working
          concurrently, did an amazing job. This matters a lot in our
          system: the Apache Geode client API is blocking, so GC pauses
          are amplified by the rest of the system. When G1 pauses the
          application, we observe side effects such as clients
          expanding their connection pools.</div>
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">Average response latencies
          from the Geode cluster were on par with G1 or even better,
          and certainly more predictable.</div>
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">I can only say nice things.
          It is working fine, with no quality issues found in our use
          cases. It is NOT as frugal as G1 (as expected) BUT it looks
          much better than non-generational ZGC in our case. I will
          share more feedback when we run more tests, if you are
          interested.</div>
      </div>
    </blockquote>
    <br>
    Thanks for the extra feedback. Yes, please continue sharing your
    experience with using Generational ZGC. Feedback from the community
    helps us enhance the GC.<br>
    <br>
    We have just published a new set of EA builds with a few tweaks to
    the GC. See:<br>
    <a class="moz-txt-link-freetext" href="https://mail.openjdk.org/pipermail/zgc-dev/2023-March/001236.html">https://mail.openjdk.org/pipermail/zgc-dev/2023-March/001236.html</a><br>
    <br>
    <blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
      <div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
        sans-serif;font-size:13px;">
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">I saw that the JEP is now a
          Candidate (congrats to the ZGC team on the hard work).</div>
      </div>
    </blockquote>
    <br>
    Thanks!<br>
    StefanK<br>
    <br>
    <blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
      <div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
        sans-serif;font-size:13px;">
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">Regards,</div>
        <div dir="ltr" data-setdir="false"><br>
        </div>
        <div dir="ltr" data-setdir="false">Evaristo</div>
      </div>
      <div id="yahoo_quoted_9432393787" class="yahoo_quoted">
        <div style="font-family:'Helvetica Neue', Helvetica, Arial,
          sans-serif;font-size:13px;color:#26282a;">
          <div> On Tuesday, 14 March 2023, 09:14:14 CET, Stefan
            Karlsson <a class="moz-txt-link-rfc2396E" href="mailto:stefan.karlsson@oracle.com">&lt;stefan.karlsson@oracle.com&gt;</a> wrote: </div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div>
            <div id="yiv9226560069">
              <div> Hi Evaristo,<br clear="none">
                <br clear="none">
                Thanks for providing feedback on Generational ZGC. There
                is a JVM flag that can be used to force the JVM to
                assume a given number of cores:
                -XX:ActiveProcessorCount=&lt;N&gt;<br clear="none">
                <br clear="none">
                From the code:<br clear="none">
                <pre>product(int, ActiveProcessorCount, -1,                         \
        "Specify the CPU count the VM should use and report as active") \</pre>
                <br clear="none">
                I've never used it myself, but when I try it, I see
                that ZGC scales its worker threads accordingly. Maybe
                this flag could be useful for your use case.<br clear="none">
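A minimal sketch for checking what a running HotSpot JVM actually uses, assuming a HotSpot-based JDK: Runtime.availableProcessors() returns the CPU count the VM uses (the value -XX:ActiveProcessorCount overrides), and com.sun.management.HotSpotDiagnosticMXBean reads the effective values of the related flags. The class name CpuCountCheck is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the CPU count and GC thread flags the running JVM ended up with.
public class CpuCountCheck {

    // Returns the current value of a HotSpot VM flag as a string.
    // Throws IllegalArgumentException if this JVM has no flag with that name.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Honors -XX:ActiveProcessorCount=N when set; otherwise the detected (container-aware) count.
        System.out.println("availableProcessors  = " + Runtime.getRuntime().availableProcessors());
        // "-1" means the flag was not set on the command line.
        System.out.println("ActiveProcessorCount = " + vmOption("ActiveProcessorCount"));
        // Derived ergonomically from the CPU count unless set explicitly.
        System.out.println("ParallelGCThreads    = " + vmOption("ParallelGCThreads"));
        System.out.println("ConcGCThreads        = " + vmOption("ConcGCThreads"));
    }
}
```

For example, launching it as a single-file program with java -XX:+UseZGC -XX:ActiveProcessorCount=32 CpuCountCheck.java should report 32 available processors, and the GC thread counts the collector derived from that value.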
                <br clear="none">
                Thanks,<br clear="none">
                StefanK<br clear="none">
                <br clear="none">
                <div id="yiv9226560069yqt49029" class="yiv9226560069yqt9075008956">
                  <div class="yiv9226560069moz-cite-prefix">On
                    2023-03-14 08:26, Evaristo José Camarero wrote:<br clear="none">
                  </div>
                  <blockquote type="cite">
                    <div style="font-family:Helvetica Neue, Helvetica,
                      Arial, sans-serif;font-size:13px;" class="yiv9226560069ydp7e7fa21eyahoo-style-wrap">
                      <div dir="ltr">Thanks Peter,</div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr">This environment is NOT based on
                        AWS. It is deployed on a custom K8S flavor on
                        top of an OpenStack virtualization layer.</div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr">The system has some level of CPU
                        overprovisioning, which is the root cause of the
                        problem: application threads plus Gen ZGC threads
                        use more CPU than is available. We will repeat
                        the tests with more resources to avoid the
                        issue.</div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr">My previous question is more about
                        ZGC ergonomics, and about fully understanding the
                        best approach when deploying on Kubernetes.</div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr">I saw that Gen ZGC calculates the
                        number of runtime workers as CPU count * 0.6 and
                        the maximum number of concurrent workers per
                        generation as CPU count * 0.25. In a K8s pod you
                        can define a CPU limit (the maximum CPU
                        potentially available to the pod) and a CPU
                        request (the CPU reserved for the pod). The JVM
                        uses the CPU limit for its ergonomics. In our
                        case the two values diverge quite a lot (64 CPU
                        limit vs 32 CPU request), which makes a big
                        difference when ZGC decides the number of
                        workers. With G1 we usually tune
                        ParallelGCThreads and ConcGCThreads to adapt the
                        resources used by the GC. My assumption is that,
                        with Gen ZGC, these two parameters are again the
                        key ones for controlling the resources used by
                        the collector.</div>
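The ergonomics described above can be sketched as a rough model, not the JVM's own code. It assumes the 60% and 25% shares are rounded up to whole threads and ignores any other cap the JVM may apply (such as a heap-size-based limit, which should not matter for a 150 GB heap). ZgcWorkerEstimate and its methods are hypothetical names.

```java
// Rough model of the worker-count ergonomics described in this thread (assumption: round up).
public class ZgcWorkerEstimate {

    // Ceiling of cpus * percent / 100 in integer math, avoiding floating-point rounding surprises.
    static int shareOf(int cpus, int percent) {
        return (cpus * percent + 99) / 100;
    }

    // About 60% of the CPUs: the default described above for runtime (parallel) workers.
    static int parallelWorkers(int cpus) {
        return Math.max(1, shareOf(cpus, 60));
    }

    // About 25% of the CPUs: the default described above for concurrent workers per generation.
    static int concurrentWorkers(int cpus) {
        return Math.max(1, shareOf(cpus, 25));
    }

    public static void main(String[] args) {
        // Compare the CPU limit the JVM sees with the CPU request reserved for the pod.
        for (int cpus : new int[] {64, 32}) {
            System.out.printf("%d CPUs: parallel workers %d, concurrent workers per generation %d%n",
                    cpus, parallelWorkers(cpus), concurrentWorkers(cpus));
        }
    }
}
```

Under these assumptions, 64 CPUs give 39 parallel workers and 16 concurrent workers per generation, while 32 CPUs give 20 and 8. The 16 is consistent with the peak young-generation worker count in the "Adjusting Workers" log quoted further down.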
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr">In our next test we will use:</div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">ParallelGCThreads = CPU
                            request * 0.6</span></span><br clear="none">
                      </div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">ConcGCThreads = CPU
                            request * 0.25</span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;"><br clear="none">
                          </span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">This assumes the system
                            is sized so that it does not exceed its CPU
                            request.</span></span></div>
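The proposal above can be turned into command-line flags with a small sketch. It assumes the products are rounded up to whole threads (the proposal does not say which rounding is intended); GcThreadFlags and its methods are hypothetical names.

```java
// Hypothetical helper: turns a pod's CPU request into explicit GC thread flags, following the
// "CPU request * 0.6" / "CPU request * 0.25" rule proposed above. Rounding up is an assumption.
public class GcThreadFlags {

    // Ceiling of value * percent / 100 in integer arithmetic, never below 1 thread.
    static int share(int value, int percent) {
        return Math.max(1, (value * percent + 99) / 100);
    }

    // Sizes the thread counts from the CPU request instead of the CPU limit the JVM detects.
    static String explicitThreadFlags(int cpuRequest) {
        return "-XX:ParallelGCThreads=" + share(cpuRequest, 60)
                + " -XX:ConcGCThreads=" + share(cpuRequest, 25);
    }

    public static void main(String[] args) {
        int cpuRequest = 32; // the pod's CPU request in this thread
        System.out.println(explicitThreadFlags(cpuRequest));
    }
}
```

With a 32-CPU request this prints -XX:ParallelGCThreads=20 -XX:ConcGCThreads=8; rounding down would give 19 parallel threads instead.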
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;"><br clear="none">
                          </span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">Does this make sense? Any
                            other suggestions?</span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;"><br clear="none">
                          </span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">Regards,</span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;"><br clear="none">
                          </span></span></div>
                      <div dir="ltr"><span><span style="color:rgb(0, 0,
                            0);font-family:Helvetica Neue, Helvetica,
                            Arial, sans-serif;">Evaristo</span></span></div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div dir="ltr"><br clear="none">
                      </div>
                      <div><br clear="none">
                      </div>
                    </div>
                    <div id="yiv9226560069yahoo_quoted_8891239309" class="yiv9226560069yahoo_quoted">
                      <div style="font-family:'Helvetica Neue',
                        Helvetica, Arial,
                        sans-serif;font-size:13px;color:#26282a;">
                        <div> On Monday, 13 March 2023, 10:18:01
                          CET, Peter Booth <a rel="nofollow noopener
                            noreferrer" shape="rect" ymailto="mailto:peter_booth@me.com" target="_blank" href="mailto:peter_booth@me.com" class="yiv9226560069moz-txt-link-rfc2396E" moz-do-not-send="true">&lt;peter_booth@me.com&gt;</a>
                          wrote: </div>
                        <div><br clear="none">
                        </div>
                        <div><br clear="none">
                        </div>
                        <div>
                          <div id="yiv9226560069">
                            <div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069">The default
                                Geode heartbeat timeout interval is 5
                                seconds, which is an eternity.</div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069">Some
                                points/questions:</div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069">I’d recommend
                                using either Solarflare’s sysjitter tool
                                or Gil Tene’s jHiccup to quantify your
                                OS jitter.</div>
                              <div class="yiv9226560069">Can you capture
                                the output of vmstat 1 60?</div>
                              <div class="yiv9226560069">What kind of
                                EC2 instances are you using? Are they
                                RHEL?</div>
                              <div class="yiv9226560069">How much
                                physical RAM does each instance have?</div>
                              <div class="yiv9226560069">
                                <div class="yiv9226560069">Do you have
                                  THP enabled?</div>
                                <div class="yiv9226560069">What is the
                                  value of vm.min_free_kbytes?</div>
                              </div>
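jHiccup and sysjitter are separate tools and should be used as suggested. Purely to illustrate the idea behind jHiccup (a thread that repeatedly sleeps for a fixed interval and records how much later than expected it wakes up), here is a simplified Java sketch; it is not a replacement for either tool, and the class name and 1 ms interval are arbitrary choices for illustration.

```java
import java.util.concurrent.TimeUnit;

// Simplified illustration of the jHiccup idea (not jHiccup itself): sleep for a fixed
// interval and record how much longer than requested each wake-up took.
public class HiccupProbe {

    // Returns the largest observed overshoot, in microseconds, over the given number of samples.
    static long maxHiccupMicros(int samples, long intervalMicros) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                TimeUnit.MICROSECONDS.sleep(intervalMicros);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            // Overshoot reflects timer granularity, scheduling delay, or a JVM/OS pause.
            worst = Math.max(worst, elapsedMicros - intervalMicros);
        }
        return worst;
    }

    public static void main(String[] args) {
        // About 60 seconds of 1 ms samples, roughly the same window as "vmstat 1 60".
        long worst = maxHiccupMicros(60_000, 1_000);
        System.out.println("worst hiccup: " + worst + " us");
    }
}
```

A large worst-case value on an otherwise idle probe thread points at the platform (CPU starvation, throttling, or pauses) rather than at the application code.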
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              How often do you see missed heartbeats?
                              <div class="yiv9226560069">Over what length
                                of time do you see the Adjusting Workers
                                messages?</div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                              </div>
                              <div id="yiv9226560069yqt69951" class="yiv9226560069yqt5015505977">
                                <div class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                  <div><br class="yiv9226560069" clear="none">
                                    <blockquote type="cite" class="yiv9226560069">
                                      <div class="yiv9226560069">On Mar
                                        13, 2023, at 4:42 AM, Evaristo
                                        José Camarero &lt;<a rel="nofollow noopener
                                          noreferrer" shape="rect" ymailto="mailto:evaristojosec@yahoo.es" target="_blank" href="mailto:evaristojosec@yahoo.es" class="yiv9226560069
                                          yiv9226560069moz-txt-link-freetext
                                          moz-txt-link-freetext" moz-do-not-send="true">evaristojosec@yahoo.es</a>&gt;
                                        wrote:</div>
                                      <br class="yiv9226560069Apple-interchange-newline" clear="none">
                                      <div class="yiv9226560069">
                                        <div class="yiv9226560069">
                                          <div style="font-family:Helvetica
                                            Neue, Helvetica, Arial,
                                            sans-serif;font-size:13px;" class="yiv9226560069yahoo-style-wrap">
                                            <div dir="ltr" class="yiv9226560069">Hi
                                              there,</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">We
                                              are trying the latest ZGC
                                              EA build to test an Apache
                                              Geode cluster (a distributed
                                              key-value store with use
                                              cases similar to Apache
                                              Cassandra's).</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">We
                                              are using K8s, and each data
                                              node runs in a pod with a
                                              32-core CPU request (60-core
                                              limit) and a 150 GB heap.</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">
                                              <div class="yiv9226560069"><pre>- -Xmx152000m
- -Xms152000m
- -XX:+UseZGC
- -XX:SoftMaxHeapSize=136000m
- -XX:ZAllocationSpikeTolerance=4.0   // We have some spiky workloads periodically
- -XX:+UseNUMA</pre></div>
                                              <br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">ZGC
                                              is working great with regard
                                              to GC pauses, with no
                                              allocation stalls at all
                                              almost all of the time. We
                                              observe higher CPU
                                              utilization than with G1
                                              (around 20% for a heavy
                                              workload, using the flag <span class="yiv9226560069">-XX:ZAllocationSpikeTolerance=4.0,
                                                which may be making ZGC
                                                hungrier than needed; we
                                                will experiment further
                                                with this</span>).</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">BUT
                                              from time to time Geode
                                              cluster members miss
                                              heartbeats, and the Geode
                                              logic then shuts down the
                                              JVM of the node with missing
                                              heartbeats. We believe that
                                              the main problem could be
                                              CPU starvation, because a
                                              few seconds before this
                                              happens we observe ZGC using
                                              more workers to get the job
                                              done:<br class="yiv9226560069" clear="none">
                                              <br class="yiv9226560069" clear="none">
                                              <div class="yiv9226560069"><pre>[2023-03-12T18:29:38.980+0000] Adjusting Workers for Young Generation: 1 -&gt; 2
[2023-03-12T18:29:39.781+0000] Adjusting Workers for Young Generation: 1 -&gt; 3
[2023-03-12T18:29:40.181+0000] Adjusting Workers for Young Generation: 1 -&gt; 4
[2023-03-12T18:29:40.382+0000] Adjusting Workers for Young Generation: 1 -&gt; 5
[2023-03-12T18:29:40.582+0000] Adjusting Workers for Young Generation: 1 -&gt; 6
[2023-03-12T18:29:40.782+0000] Adjusting Workers for Young Generation: 1 -&gt; 7
[2023-03-12T18:29:40.882+0000] Adjusting Workers for Young Generation: 1 -&gt; 8
[2023-03-12T18:29:40.982+0000] Adjusting Workers for Young Generation: 1 -&gt; 10
[2023-03-12T18:29:41.083+0000] Adjusting Workers for Young Generation: 1 -&gt; 13
[2023-03-12T18:29:41.183+0000] Adjusting Workers for Young Generation: 1 -&gt; 16</pre></div>
                                              <br class="yiv9226560069" clear="none">
                                            </div>
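To quantify how quickly ZGC ramped up its young-generation workers in an excerpt like the one above, a minimal Java sketch can parse the timestamps and worker counts. It assumes each line starts with a bracketed timestamp in the format shown; real unified-logging output may carry extra decorators, which the pattern skips. WorkerRampSummary and its methods are hypothetical names.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes a burst of ZGC "Adjusting Workers" log lines.
public class WorkerRampSummary {

    // Timestamp format as it appears in the excerpt, e.g. 2023-03-12T18:29:38.980+0000.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // The first [...] group is taken as the timestamp; any other decorators are skipped by ".*".
    static final Pattern LINE = Pattern.compile(
            "^\\[([^\\]]+)\\].*Adjusting Workers for Young Generation: (\\d+) -> (\\d+)");

    // Returns {milliseconds between first and last matching line, highest worker count seen}.
    static long[] summarize(String[] lines) {
        OffsetDateTime first = null, last = null;
        long maxWorkers = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not an "Adjusting Workers" line
            }
            OffsetDateTime ts = OffsetDateTime.parse(m.group(1), TS);
            if (first == null) {
                first = ts;
            }
            last = ts;
            maxWorkers = Math.max(maxWorkers, Long.parseLong(m.group(3)));
        }
        long elapsedMs = (first == null) ? 0 : Duration.between(first, last).toMillis();
        return new long[] {elapsedMs, maxWorkers};
    }

    public static void main(String[] args) {
        String[] lines = {
            "[2023-03-12T18:29:38.980+0000] Adjusting Workers for Young Generation: 1 -> 2",
            "[2023-03-12T18:29:41.183+0000] Adjusting Workers for Young Generation: 1 -> 16",
        };
        long[] s = summarize(lines);
        System.out.println("ramp lasted " + s[0] + " ms, peak young workers = " + s[1]);
    }
}
```

For the excerpt above (18:29:38.980 to 18:29:41.183), this reports a ramp of 2203 ms up to 16 young-generation workers.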
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">As
                                              mentioned, we are using K8s
                                              with pods that have a
                                              32-core CPU request and a
                                              60-core limit (and it is
                                              also true that our K8s
                                              worker nodes are close to
                                              their limit in CPU
                                              utilization). At startup,
                                              ZGC assumes that the machine
                                              has 60 cores (as logged).
                                              What is the best way to hint
                                              ZGC to tune itself for a
                                              host with 32 cores (the
                                              60-core limit is basically
                                              there just to avoid
                                              throttling by K8s)? Is the
                                              ParallelGCThreads flag the
                                              way to do it? Any other
                                              thoughts?</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">Regards,</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                            <div dir="ltr" class="yiv9226560069">Evaristo</div>
                                            <div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </blockquote>
                                  </div>
                                  <br class="yiv9226560069" clear="none">
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br clear="none">
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>