<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
On 2023-03-14 10:08, Evaristo José Camarero wrote:<br>
<blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
<div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
sans-serif;font-size:13px;">
<div dir="ltr" data-setdir="false">Thanks a lot Stefan.</div>
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">I think this flag makes much
more sense than playing with the GC parallel and concurrent
thread counts!</div>
<div><br>
</div>
<div dir="ltr" data-setdir="false">Some extra feedback that
may be relevant.</div>
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">I reported that ZGC was using
20% more CPU than G1 for the same workload, but that was not
true in all cases. I decreased the traffic
received by the Geode cluster, and for the same amount of
traffic ZGC was using around 10% more CPU than G1 (21
cores versus 19 cores). G1 was configured to target 25 ms
pause times, while the concurrent ZGC was doing an amazing job.
This has great benefits in our system, because the Apache Geode
client API is blocking and GC pauses are amplified
by the rest of the system. So when G1 pauses the app, we
observe collateral effects such as clients expanding their connection
pools.</div>
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">Average response latencies
provided by the Geode cluster were on par with G1 or even
better, and certainly more predictable.</div>
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">I can only say nice things.
It is working fine, with no quality issues found in our use
cases. It is not as frugal as G1 (as expected), but it looks much
better than non-generational ZGC in our case. I will share
more feedback when we run more tests, if you are interested.</div>
</div>
</blockquote>
<br>
Thanks for the extra feedback. Yes, please continue sharing your
experience with using Generational ZGC. Feedback from the community
helps us enhance the GC.<br>
<br>
We have just published a new set of EA builds with a few tweaks to
the GC. See:<br>
<a class="moz-txt-link-freetext" href="https://mail.openjdk.org/pipermail/zgc-dev/2023-March/001236.html">https://mail.openjdk.org/pipermail/zgc-dev/2023-March/001236.html</a><br>
<br>
<blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
<div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
sans-serif;font-size:13px;">
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">I saw that the JEP is now a
Candidate (congrats to the ZGC team for the hard work).</div>
</div>
</blockquote>
<br>
Thanks!<br>
StefanK<br>
<br>
<blockquote type="cite" cite="mid:1960585974.2767326.1678784915451@mail.yahoo.com">
<div class="ydp19485138yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial,
sans-serif;font-size:13px;">
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">Regards,</div>
<div dir="ltr" data-setdir="false"><br>
</div>
<div dir="ltr" data-setdir="false">Evaristo</div>
</div>
<div id="yahoo_quoted_9432393787" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial,
sans-serif;font-size:13px;color:#26282a;">
<div> On Tuesday, March 14, 2023, 09:14:14 CET, Stefan
Karlsson <a class="moz-txt-link-rfc2396E" href="mailto:stefan.karlsson@oracle.com">&lt;stefan.karlsson@oracle.com&gt;</a> wrote: </div>
<div><br>
</div>
<div><br>
</div>
<div>
<div id="yiv9226560069">
<div> Hi Evaristo,<br clear="none">
<br clear="none">
Thanks for providing feedback on Generational ZGC. There
is a JVM flag that can be used to force the JVM to
assume a given number of cores:
-XX:ActiveProcessorCount=&lt;N&gt;<br clear="none">
<br clear="none">
From the code:<br clear="none">
product(int, ActiveProcessorCount,
-1, \<br clear="none">
"Specify the CPU count the VM should use and
report as active") \<br clear="none">
<br clear="none">
I've personally never used it before, but when I
try it I see that ZGC scales the worker threads
accordingly. Maybe this flag could be useful for your
use case.<br clear="none">
<br clear="none">
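<div>A minimal sketch (in Python, purely illustrative) of deriving
the flag value from a pod's CPU request rather than its limit. The
helper names are hypothetical, and rounding a fractional request up
to a whole core is an assumption of this sketch; only the
-XX:ActiveProcessorCount flag itself comes from the message above.
The quantity strings follow the Kubernetes convention in which a
trailing "m" means millicores.</div>
<pre>
```python
def cpu_request_to_cores(quantity):
    """Convert a Kubernetes CPU quantity ("32", "0.5", "32000m") to whole cores.

    Fractional values are rounded up and the result is at least 1
    (both are assumptions of this sketch).
    """
    if quantity.endswith("m"):
        millicores = int(quantity[:-1])
    else:
        # Plain core counts may be fractional, e.g. "0.5".
        millicores = round(float(quantity) * 1000)
    # Integer ceiling division avoids floating-point surprises.
    return max(1, (millicores + 999) // 1000)


def active_processor_flag(quantity):
    """Build the JVM flag for a given CPU request (hypothetical helper)."""
    return f"-XX:ActiveProcessorCount={cpu_request_to_cores(quantity)}"


if __name__ == "__main__":
    # 32 cores is the CPU request mentioned in this thread.
    print(active_processor_flag("32"))
```
</pre>
<div>The resulting string could then be appended to the JVM options
of the data node, so that the ergonomics are sized from the request
instead of the limit.</div>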
Thanks,<br clear="none">
StefanK<br clear="none">
<br clear="none">
<div id="yiv9226560069yqt49029" class="yiv9226560069yqt9075008956">
<div class="yiv9226560069moz-cite-prefix">On
2023-03-14 08:26, Evaristo José Camarero wrote:<br clear="none">
</div>
<blockquote type="cite">
<div style="font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;font-size:13px;" class="yiv9226560069ydp7e7fa21eyahoo-style-wrap">
<div dir="ltr">Thanks Peter,</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">This environment is NOT based on
AWS. It is deployed on a custom K8S flavor on
top of an OpenStack virtualization layer.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">The system has some level of CPU
overprovisioning, so that is the root cause of
the problem. App threads + Gen ZGC threads are
using more CPU than available. We will repeat
the tests with more resources to avoid the
issue.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">My previous question is more
about ZGC ergonomics, and about
understanding the best approach when deploying on
Kubernetes.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">I saw that Gen ZGC calculates
the number of runtime workers as number of CPUs * 0.6, and the maximum
number of concurrent workers per generation as number of CPUs *
0.25. In a K8s pod you can define a CPU limit (the maximum
CPU potentially available to the pod) and a CPU
request (the CPU reserved for your pod). The JVM
uses the CPU limit for its ergonomics.
In our case the two values diverge
quite a lot (64 CPUs limit vs 32 CPUs request),
which makes a big difference when ZGC decides the
number of workers. With G1 we usually tune
ParallelGCThreads and ConcGCThreads in
order to adapt the GC resources. My assumption
is that with Gen ZGC these two parameters are again the
key ones for controlling the resources used by the
collector.</div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr">In our next test we will use:</div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">ParallelGCThreads = CPU
request * 0.6</span></span><br clear="none">
</div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">ConcGCThreads = CPU
request * 0.25</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">This is under the assumption
that the system is dimensioned so as not to exceed the
requested CPU.</span></span></div>
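<div dir="ltr">As a rough illustration of the arithmetic above, here is a
small Python sketch that applies the two shares (0.6 and 0.25) to a CPU
count. The function name is hypothetical, the shares are taken from this
message rather than read from the JVM, and rounding up is an assumption of
the sketch; the JVM's own rounding and any additional caps may differ.</div>
<pre>
```python
# Shares quoted in the message, expressed as integer percentages so the
# arithmetic stays exact.
PARALLEL_PERCENT = 60
CONCURRENT_PERCENT = 25


def gc_thread_counts(cpus):
    """Return (ParallelGCThreads, ConcGCThreads) for a given CPU count.

    Values are rounded up and are at least 1 (assumptions of this sketch).
    """
    parallel = max(1, (cpus * PARALLEL_PERCENT + 99) // 100)
    concurrent = max(1, (cpus * CONCURRENT_PERCENT + 99) // 100)
    return parallel, concurrent


if __name__ == "__main__":
    # The request and limit values mentioned in this message.
    for label, cpus in (("request", 32), ("limit", 64)):
        parallel, concurrent = gc_thread_counts(cpus)
        print(f"{label} ({cpus} CPUs): "
              f"-XX:ParallelGCThreads={parallel} -XX:ConcGCThreads={concurrent}")
```
</pre>
<div dir="ltr">With these assumptions, a 32 CPU request gives 20 and 8
threads, while a 64 CPU limit gives 39 and 16, which shows how much the
choice between request and limit changes the sizing.</div>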
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">Does it make sense? Any
other suggestion?</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">Regards,</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;"><br clear="none">
</span></span></div>
<div dir="ltr"><span><span style="color:rgb(0, 0,
0);font-family:Helvetica Neue, Helvetica,
Arial, sans-serif;">Evaristo</span></span></div>
<div dir="ltr"><br clear="none">
</div>
<div dir="ltr"><br clear="none">
</div>
<div><br clear="none">
</div>
</div>
<div id="yiv9226560069yahoo_quoted_8891239309" class="yiv9226560069yahoo_quoted">
<div style="font-family:'Helvetica Neue',
Helvetica, Arial,
sans-serif;font-size:13px;color:#26282a;">
<div> On Monday, March 13, 2023, 10:18:01
CET, Peter Booth <a rel="nofollow noopener
noreferrer" shape="rect" ymailto="mailto:peter_booth@me.com" target="_blank" href="mailto:peter_booth@me.com" class="yiv9226560069moz-txt-link-rfc2396E" moz-do-not-send="true">&lt;peter_booth@me.com&gt;</a>
wrote: </div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div>
<div id="yiv9226560069">
<div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069">The default
Geode heartbeat timeout interval is 5
seconds, which is an eternity.</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069">Some
points/questions:</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069">I’d recommend
using either Solarflare’s sysjitter tool
or Gil Tene’s jhiccup to quantify your
OS jitter</div>
<div class="yiv9226560069">Can you capture
the output of vmstat 1 60 ?</div>
<div class="yiv9226560069">What kind of
EC2 instances are you using? Are they
RHEL?</div>
<div class="yiv9226560069">How much
physical RAM does each instance have?</div>
<div class="yiv9226560069">
<div class="yiv9226560069">Do you have
THP enabled?</div>
<div class="yiv9226560069">What is the
value of vm.min_free_kbytes ?</div>
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
How often do you see missed heartbeats?
<div class="yiv9226560069">Over what length of
time do you see the Adjusting Workers
messages?</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div id="yiv9226560069yqt69951" class="yiv9226560069yqt5015505977">
<div class="yiv9226560069"><br class="yiv9226560069" clear="none">
<div><br class="yiv9226560069" clear="none">
<blockquote type="cite" class="yiv9226560069">
<div class="yiv9226560069">On Mar
13, 2023, at 4:42 AM, Evaristo
José Camarero &lt;<a rel="nofollow noopener
noreferrer" shape="rect" ymailto="mailto:evaristojosec@yahoo.es" target="_blank" href="mailto:evaristojosec@yahoo.es" class="yiv9226560069
yiv9226560069moz-txt-link-freetext
moz-txt-link-freetext" moz-do-not-send="true">evaristojosec@yahoo.es</a>&gt;
wrote:</div>
<br class="yiv9226560069Apple-interchange-newline" clear="none">
<div class="yiv9226560069">
<div class="yiv9226560069">
<div style="font-family:Helvetica
Neue, Helvetica, Arial,
sans-serif;font-size:13px;" class="yiv9226560069yahoo-style-wrap">
<div dir="ltr" class="yiv9226560069">Hi
there,</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">We
are trying the latest ZGC EA build
to test an Apache
Geode cluster (a distributed
key-value store with
use cases similar to
Apache Cassandra).</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">We
are using K8s, and we have
pods with a 32-core request
(and a 60-core limit) per
data node, with a 150 GB heap
per node.</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">
<div class="yiv9226560069">
<div class="yiv9226560069">
- -Xmx152000m</div>
<div class="yiv9226560069">
- -Xms152000m</div>
<div class="yiv9226560069">
- -XX:+UseZGC</div>
<div class="yiv9226560069">
-
-XX:SoftMaxHeapSize=136000m</div>
<div class="yiv9226560069"><span style="white-space:pre-wrap;" class="yiv9226560069"> </span> -
-XX:ZAllocationSpikeTolerance=4.0 // We have some spiky workloads
periodically </div>
<div class="yiv9226560069">
- -XX:+UseNUMA<br class="yiv9226560069" clear="none">
</div>
</div>
<br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">ZGC
is working great with regard to
GC pauses, with no
allocation stalls
almost all of the
time. We observe higher
CPU utilization than G1
(around 20% for a heavy
workload, using the flag <span class="yiv9226560069">-XX:ZAllocationSpikeTolerance=4.0,
which may be making ZGC
hungrier than needed.
We will experiment further
with this</span>).</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">But
from time to time we see
Geode cluster members
missing heartbeats between
nodes, and the Geode logic shuts down
the JVM of the node with the
missing heartbeats. We
believe that the main
problem could be CPU
starvation, because a few
seconds before this
happens we observe ZGC
using more workers to
get the job done:<br class="yiv9226560069" clear="none">
<br class="yiv9226560069" clear="none">
<div class="yiv9226560069"><span dir="ltr" class="yiv9226560069ydp64d0df88ui-provider
yiv9226560069ydp64d0df88h yiv9226560069ydp64d0df88r
yiv9226560069ydp64d0df88k
yiv9226560069ydp64d0df88u yiv9226560069ydp64d0df88ag
yiv9226560069ydp64d0df88d
yiv9226560069ydp64d0df88ab yiv9226560069ydp64d0df88n
yiv9226560069ydp64d0df88x
yiv9226560069ydp64d0df88g yiv9226560069ydp64d0df88q
yiv9226560069ydp64d0df88aj
yiv9226560069ydp64d0df88ae yiv9226560069ydp64d0df88j
yiv9226560069ydp64d0df88t
yiv9226560069ydp64d0df88c yiv9226560069ydp64d0df88m
yiv9226560069ydp64d0df88ah
yiv9226560069ydp64d0df88w yiv9226560069ydp64d0df88ac
yiv9226560069ydp64d0df88f
yiv9226560069ydp64d0df88p yiv9226560069ydp64d0df88z
yiv9226560069ydp64d0df88i
yiv9226560069ydp64d0df88ak yiv9226560069ydp64d0df88s
yiv9226560069ydp64d0df88ve
yiv9226560069ydp64d0df88b yiv9226560069ydp64d0df88af
yiv9226560069ydp64d0df88l
yiv9226560069ydp64d0df88v yiv9226560069ydp64d0df88e
yiv9226560069ydp64d0df88o
yiv9226560069ydp64d0df88ai yiv9226560069ydp64d0df88y">[2023-03-12T18:29:38.980+0000]
Adjusting Workers for
Young Generation: 1
-> 2<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:39.781+0000] Adjusting Workers for Young Generation: 1
-> 3<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.181+0000] Adjusting Workers for Young Generation: 1
-> 4<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.382+0000] Adjusting Workers for Young Generation: 1
-> 5<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.582+0000] Adjusting Workers for Young Generation: 1
-> 6<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.782+0000] Adjusting Workers for Young Generation: 1
-> 7<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.882+0000] Adjusting Workers for Young Generation: 1
-> 8<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:40.982+0000] Adjusting Workers for Young Generation: 1
-> 10<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:41.083+0000] Adjusting Workers for Young Generation: 1
-> 13<br class="yiv9226560069" clear="none">
[2023-03-12T18:29:41.183+0000] Adjusting Workers for Young Generation: 1
-> 16</span></div>
<br class="yiv9226560069" clear="none">
</div>
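<div dir="ltr" class="yiv9226560069">To quantify how quickly the workers
ramp up, a log excerpt like the one above could be summarized with a short
script. This is only a sketch in Python: the function name is hypothetical,
the pattern assumes the exact line layout shown in this message, and only
the first and last lines of the excerpt are embedded as sample input.</div>
<pre>
```python
import re
from datetime import datetime

# Matches lines such as:
# [2023-03-12T18:29:38.980+0000] Adjusting Workers for Young Generation: 1 -> 2
PATTERN = re.compile(
    r"\[([^\]]+)\] Adjusting Workers for Young Generation: (\d+) -> (\d+)"
)

SAMPLE_LOG = """\
[2023-03-12T18:29:38.980+0000] Adjusting Workers for Young Generation: 1 -> 2
[2023-03-12T18:29:41.183+0000] Adjusting Workers for Young Generation: 1 -> 16
"""


def worker_ramp(log_text):
    """Return (elapsed_seconds, max_workers) over all matching lines, or None."""
    events = []
    for match in PATTERN.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
        events.append((stamp, int(match.group(3))))
    if not events:
        return None
    elapsed = (events[-1][0] - events[0][0]).total_seconds()
    return elapsed, max(workers for _, workers in events)


if __name__ == "__main__":
    print(worker_ramp(SAMPLE_LOG))
```
</pre>
<div dir="ltr" class="yiv9226560069">For the two sample lines, the ramp
from 2 to 16 young-generation workers spans roughly 2.2 seconds.</div>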
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">As
mentioned, we are using K8s
with pods that have a 32-core
request and a 60-core
limit (and it
is also true that our K8s
workers are close to the
limit in CPU utilization).
At startup, ZGC assumes
that the machine has 60 cores
(as logged). What is the
best way to configure
ZGC so that it is
tuned for a host with
32 cores (the 60-core
limit is basically just there to
avoid K8s
throttling)? Is it the
ParallelGCThreads flag?
Any other thoughts?</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">Regards,</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
<div dir="ltr" class="yiv9226560069">Evaristo</div>
<div dir="ltr" class="yiv9226560069"><br class="yiv9226560069" clear="none">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="yiv9226560069" clear="none">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="none">
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>