Thanks a lot Stefan.

I think this flag makes much more sense than playing with GC parallel and concurrent threads!

Some extra feedback that may be relevant:

I reported that ZGC was using 20% more CPU than G1 for the same workload, BUT that was not entirely true in all cases. I decided to decrease the traffic received by the Geode cluster, and for the same amount of traffic ZGC was using around 10% more CPU than G1 (21 cores versus 19 cores). G1 was configured to target 25 ms pause times, while concurrent ZGC was doing an amazing job. This has great benefits in our system, because the Apache Geode client API is blocking, and GC pauses are somehow amplified by the rest of the system. So when G1 is pausing the app, we observe collateral effects like clients expanding connection pools and similar things.

Average response latencies provided by the Geode cluster were aligned with G1 or even better, and certainly more predictable.

I can only say nice things. It is working fine; no quality issues found in our use cases. It is NOT as frugal as G1 (expected), BUT it looks much better than non-generational ZGC in our case. I will share more feedback when we run more tests, if you are interested.

I saw that the JEP is now a candidate (congrats to the ZGC team for the hard work).

Regards,

Evaristo
</div><div id="yahoo_quoted_9432393787" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
En martes, 14 de marzo de 2023, 09:14:14 CET, Stefan Karlsson <stefan.karlsson@oracle.com> escribió:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv9226560069"><div>
Hi Evaristo,<br clear="none">
<br clear="none">
Thanks for providing feedback on Generational ZGC. There is a JVM
flag that can be used to force the JVM to assume a given number of
cores: -XX:ActiveProcessorCount=<N><br clear="none">
<br clear="none">
From the code:<br clear="none">
product(int, ActiveProcessorCount,
-1, \<br clear="none">
"Specify the CPU count the VM should use and report as
active") \<br clear="none">
<br clear="none">
I've personally never used it before, but I see that when I try it
that ZGC scales the worker threads accordingly. Maybe this flag
could be useful for your use-case.<br clear="none">
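For example, a quick way to sanity-check that the JVM picked up the override (a sketch, not from Stefan's original message; 32 here would be the pod's CPU request):

  java -XX:ActiveProcessorCount=32 -XX:+PrintFlagsFinal -version | grep ActiveProcessorCount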
<br clear="none">
Thanks,<br clear="none">
StefanK<br clear="none">
<br clear="none">
<div id="yiv9226560069yqt49029" class="yiv9226560069yqt9075008956"><div class="yiv9226560069moz-cite-prefix">On 2023-03-14 08:26, Evaristo José
Camarero wrote:<br clear="none">
</div>
<blockquote type="cite">
<div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;" class="yiv9226560069ydp7e7fa21eyahoo-style-wrap">
<div dir="ltr">Thanks Peter,</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">This environment is NOT based
on AWS. It is deployed on a custom K8S flavor on top of an
OpenStack virtualization layer.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">The system has some level of
CPU overprovisioning, so that is the root cause of the
problem. App threads + Gen ZGC threads are using more CPU than
available. We will repeat the tests with more resources to
avoid the issue.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">My previous question is more
related with ZGC ergonomics, and to fully understand the best
approach when deploying in Kubernetes.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">I saw that Gen ZGC is
calculating Runtime workers using: Nº CPU * 0,6 and max
concurrent workers per generation using Nº CPU * 0,25. In a
K8s POD you can define CPU limit (max CPU potentially
available for the POD) and CPU request (CPU booked for your
POD). The JVM is considering the CPU limit for ergonomics
implementation. In our case both values diverge quite a lot
(64 CPUs limit vs 32 CPU request), and makes a big difference
when ZGC decides number of workers. Usually with G1 we tune
number of ParallelGCThreads and ConcGCThreads in order to
adapt GC resources. My assumptions with Gen ZGC is that again
both parameters are the key ones to control used resources by
the collector.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">In our next test we will use:</div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">ParallelGCThreads = CPU request * 0,6</span></span><br clear="none">
</div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">ConcGCThreads = CPU request * 0,25</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">Under the assumption that system is
dimension to non surpass the request CPU usage</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;"><br clear="none">
</span></span></div>
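Concretely, for our 32-CPU request that works out to roughly the following (the rounding is my assumption, not necessarily what the JVM does internally):

  # 32 * 0.6 = 19.2 -> 19 parallel workers; 32 * 0.25 = 8 concurrent workers
  java -XX:+UseZGC -XX:ParallelGCThreads=19 -XX:ConcGCThreads=8 -version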
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">Does it make sense? Any other
suggestion?</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">Regards,</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0, 0);font-family:Helvetica Neue, Helvetica, Arial, sans-serif;">Evaristo</span></span></div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr"><br clear="none">
</div>
<div><br clear="none">
</div>
</div>
<div id="yiv9226560069yahoo_quoted_8891239309" class="yiv9226560069yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div> En lunes, 13 de marzo de 2023, 10:18:01 CET, Peter Booth
<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:peter_booth@me.com" target="_blank" href="mailto:peter_booth@me.com" class="yiv9226560069moz-txt-link-rfc2396E"><peter_booth@me.com></a> escribió: </div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
The default Geode heartbeat timeout interval is 5 seconds, which is an eternity.

Some points/questions:

- I'd recommend using either Solarflare's sysjitter tool or Gil Tene's jHiccup to quantify your OS jitter.
- Can you capture the output of vmstat 1 60?
- What kind of EC2 instances are you using? Are they RHEL?
- How much physical RAM does each instance have?
- Do you have THP enabled?
- What is the value of vm.min_free_kbytes?
- How often do you see missed heartbeats?
- For what length of time do you see the Adjusting Workers messages?
<div id="yiv9226560069yqt69951" class="yiv9226560069yqt5015505977">
<div class="yiv9226560069"><br clear="none" class="yiv9226560069">
<div><br clear="none" class="yiv9226560069">
<blockquote type="cite" class="yiv9226560069">
<div class="yiv9226560069">On Mar 13, 2023, at
4:42 AM, Evaristo José Camarero <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:evaristojosec@yahoo.es" target="_blank" href="mailto:evaristojosec@yahoo.es" class="yiv9226560069 yiv9226560069moz-txt-link-freetext">evaristojosec@yahoo.es</a>>
wrote:</div>
<br clear="none" class="yiv9226560069Apple-interchange-newline">
<div class="yiv9226560069">
<div class="yiv9226560069">
<div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;" class="yiv9226560069yahoo-style-wrap">
<div dir="ltr" class="yiv9226560069">Hi
there,</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">We
are trying latest ZGC EAB for testing an
Apache Geode Cluster (distributed Key
Value store with similar use cases that
Apache Cassandra)</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">We
are using K8s, and we have PODs with 32
cores request (limit with 60 cores) per
data node with 150GB heap per node.</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">
<div class="yiv9226560069">
<div class="yiv9226560069"> -
-Xmx152000m</div>
<div class="yiv9226560069"> -
-Xms152000m</div>
<div class="yiv9226560069"> -
-XX:+UseZGC</div>
<div class="yiv9226560069"> -
-XX:SoftMaxHeapSize=136000m</div>
<div class="yiv9226560069"><span style="white-space:pre-wrap;" class="yiv9226560069"> </span> -
-XX:ZAllocationSpikeTolerance=4.0
// We have some spiky workloads
periodically </div>
<div class="yiv9226560069"> -
-XX:+UseNUMA<br clear="none" class="yiv9226560069">
</div>
</div>
<br clear="none" class="yiv9226560069">
</div>
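Assembled into a single launch line, that is roughly the following (a sketch; the jar name is a placeholder, not our actual entry point):

  java -Xms152000m -Xmx152000m \
       -XX:+UseZGC \
       -XX:SoftMaxHeapSize=136000m \
       -XX:ZAllocationSpikeTolerance=4.0 \
       -XX:+UseNUMA \
       -jar geode-server.jar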
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">ZGC
is working great in regard GC pauses
with no allocation stalls at all during
almost all the time. We observe higher
CPU utilization that G1 (Around 20% for
a heavy workload using flag <span class="yiv9226560069">-XX:ZAllocationSpikeTolerance=4.0
that maybe is making ZGC more hungry
than needed. We will play further with
this</span>)</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">BUT
from time to time we see that Geode
Clusters are missing heartbeats between
and Geode logic shutdowns the JVM of the
node with missing heartbeats. We believe
that the main problem could be CPU
starvation, because some seconds before
this is happening we observe ZGC to use
more workers for making the job done<br clear="none" class="yiv9226560069">
<br clear="none" class="yiv9226560069">
<div class="yiv9226560069"><span dir="ltr" class="yiv9226560069ydp64d0df88ui-provider yiv9226560069ydp64d0df88h yiv9226560069ydp64d0df88r yiv9226560069ydp64d0df88k yiv9226560069ydp64d0df88u yiv9226560069ydp64d0df88ag yiv9226560069ydp64d0df88d yiv9226560069ydp64d0df88ab yiv9226560069ydp64d0df88n yiv9226560069ydp64d0df88x yiv9226560069ydp64d0df88g yiv9226560069ydp64d0df88q yiv9226560069ydp64d0df88aj yiv9226560069ydp64d0df88ae yiv9226560069ydp64d0df88j yiv9226560069ydp64d0df88t yiv9226560069ydp64d0df88c yiv9226560069ydp64d0df88m yiv9226560069ydp64d0df88ah yiv9226560069ydp64d0df88w yiv9226560069ydp64d0df88ac yiv9226560069ydp64d0df88f yiv9226560069ydp64d0df88p yiv9226560069ydp64d0df88z yiv9226560069ydp64d0df88i yiv9226560069ydp64d0df88ak yiv9226560069ydp64d0df88s yiv9226560069ydp64d0df88ve yiv9226560069ydp64d0df88b yiv9226560069ydp64d0df88af yiv9226560069ydp64d0df88l yiv9226560069ydp64d0df88v yiv9226560069ydp64d0df88e yiv9226560069ydp64d0df88o yiv9226560069ydp64d0df88ai yiv9226560069ydp64d0df88y">[2023-03-12T18:29:38.980+0000]
Adjusting Workers for Young
Generation: 1 -> 2<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:39.781+0000]
Adjusting Workers for Young
Generation: 1 -> 3<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.181+0000]
Adjusting Workers for Young
Generation: 1 -> 4<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.382+0000]
Adjusting Workers for Young
Generation: 1 -> 5<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.582+0000]
Adjusting Workers for Young
Generation: 1 -> 6<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.782+0000]
Adjusting Workers for Young
Generation: 1 -> 7<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.882+0000]
Adjusting Workers for Young
Generation: 1 -> 8<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:40.982+0000]
Adjusting Workers for Young
Generation: 1 -> 10<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:41.083+0000]
Adjusting Workers for Young
Generation: 1 -> 13<br clear="none" class="yiv9226560069">
[2023-03-12T18:29:41.183+0000]
Adjusting Workers for Young
Generation: 1 -> 16</span></div>
<br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">As
commented we are using K8S with PODs
that have 32 cores for request and 60
cores for limit (and it also true that
our K8s workers are close to the limit
in CPU utilization). ZGC is assuming on
booting that machine has 60 cores (as
logged). What is the best way to
configure the ZGC to provide a hint to
be tuned for a host with 32 cores
(basically the 60 cores limit is just to
avoid K8s produced throttling)? Is it
using ParallelGCThreads flag? Any other
thoughts?</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">Regards,</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
<div dir="ltr" class="yiv9226560069">Evaristo</div>
<div dir="ltr" class="yiv9226560069"><br clear="none" class="yiv9226560069">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="none" class="yiv9226560069">
</div>
</div>
</div>
</div>
</div>
</div>
</div>