<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Kirk,<div class=""><br class=""></div><div class="">A comment from me embedded below.</div><div class=""><br class=""></div><div class="">charlie</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 1, 2017, at 3:42 PM, Kirk Pepperdine <<a href="mailto:kirk@kodewerk.com" class="">kirk@kodewerk.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">So, never mind all this.. I scrubbed all of the noise out of the log and what I see here is a Heap After GC chart that shows heap completely filled. Unless the Full GC cleaned things up, this VM isn’t going any further. Heap fills very quickly.<div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class=""><blockquote type="cite" class=""><br class=""></blockquote></div><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" class="">
<br class="">
    The remembered set is the data structure that keeps track of pointers from
    other regions into the region we are operating on. If there are
    pointers from old regions to objects in the young regions, we
    cannot collect those objects.<br class="">
    During a gc pause, the live objects are evacuated to either the
    survivor or old gen regions. Then we need to update the remembered
    sets for the objects that moved. That is what I was referring to.<br class="">
    It is part of object copy.<br class="">
<br class="">
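    To make this concrete, here is a toy Java sketch of the idea. It is not
    HotSpot's implementation: G1 tracks references per card with sparse, fine
    and coarse tables, while this model keeps one bit per referencing region.
    The class <code class="">ToyRememberedSet</code> and its methods are invented for
    illustration, and evacuation is simplified to moving a whole region's
    contents at once.<br class="">
<pre class="">
```java
import java.util.BitSet;

// Toy model only (hypothetical names, not HotSpot code): one BitSet per
// region records which other regions hold a pointer into that region.
final class ToyRememberedSet {
    private final BitSet[] incoming;

    ToyRememberedSet(int regionCount) {
        incoming = new BitSet[regionCount];
        for (int i = 0; i != regionCount; i++) {
            incoming[i] = new BitSet(regionCount);
        }
    }

    // Record a pointer from an object in fromRegion to an object in toRegion.
    // Pointers inside a single region do not need tracking.
    void recordReference(int fromRegion, int toRegion) {
        if (fromRegion != toRegion) {
            incoming[toRegion].set(fromRegion);
        }
    }

    // Regions that must be scanned for roots before toRegion can be collected.
    BitSet regionsPointingInto(int toRegion) {
        return (BitSet) incoming[toRegion].clone();
    }

    // Simplified evacuation: everything in oldRegion moves to newRegion,
    // so every entry that mentions oldRegion has to be rewritten.
    void moveReferences(int oldRegion, int newRegion) {
        if (oldRegion == newRegion) {
            return;
        }
        // Pointers into the moved objects now point into newRegion.
        incoming[newRegion].or(incoming[oldRegion]);
        incoming[newRegion].clear(newRegion);
        incoming[oldRegion].clear();
        // Pointers held by the moved objects now originate in newRegion.
        for (int r = 0; r != incoming.length; r++) {
            if (incoming[r].get(oldRegion)) {
                incoming[r].clear(oldRegion);
                if (r != newRegion) {
                    incoming[r].set(newRegion);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyRememberedSet rs = new ToyRememberedSet(4);
        rs.recordReference(3, 0); // old region 3 points into young region 0
        System.out.println("Pointing into 0: " + rs.regionsPointingInto(0));
        rs.moveReferences(0, 1);  // evacuate region 0 into survivor region 1
        System.out.println("Pointing into 1: " + rs.regionsPointingInto(1));
    }
}
```
</pre>
    Even in this toy model every evacuation has to rewrite entries in other
    regions' sets, which is the kind of bookkeeping that gets charged to
    object copy.<br class="">
    <br class="">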
    As Kirk mentioned, the parallelism of the gc threads seems fine.
    Charlie Hunt suggested that maybe the object graph is very deep, so that
    the gc threads cannot steal work and end up spinning.<br class=""></div></blockquote></div></div></div></div></blockquote><div class=""><br class=""></div>I’d like to hear more from Charlie on this point, because if the parallelism is OK then I would conclude that the workload is well balanced, implying that there shouldn’t be any work stealing. I am prepared to be wrong on this, but that said, the number of attempts to shut down suggests that there was something going on at the tail end of the collection, and maybe work stealing was far too granular? No idea.</div></div></div></div></blockquote><div><br class=""></div>I haven’t had a chance to look at the logs in detail. That said, high termination times generally suggest that one or more of the GC threads are in the termination protocol, waiting for one or more other GC threads to finish their work. Hence my thought that there may be some part of the object graph that is really deep and is keeping one GC thread occupied much longer than the other GC threads.</div><div><br class=""></div><div>Btw, thanks for jumping in and offering a detailed analysis … a full heap followed by a Full GC is not good, especially if it didn’t make much space available. :-|</div><div><br class=""></div><div>Ooh, before I forget, did I see Nezih mention that THP (Transparent Huge Pages) was enabled? 
If that’s the case, I would disable it.</div><div><br class=""></div><div>thanks,</div><div><br class=""></div><div>charlie</div><div><br class=""></div><div><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class="">Kind regards,</div><div class="">Kirk</div><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" class="">
<br class="">
Some things we can try:<br class="">
    -XX:MaxGCPauseMillis=200 is the default. Maybe increase it to 400.<br class="">
    Maybe also try to reduce -XX:ParallelGCThreads.<br class="">
<br class="">
<pre style="box-sizing:border-box;font-family:sfmono-regular,consolas,"liberation mono",menlo,courier,monospace;font-size:13.6px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;font-stretch:normal;line-height:1.45;word-wrap:normal;padding:16px;overflow:auto;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;margin-top:0px;margin-bottom:0px" class=""><code style="box-sizing:border-box;font-family:sfmono-regular,consolas,"liberation mono",menlo,courier,monospace;font-size:13.6px;padding:0px;margin:0px;background:transparent;border-radius:3px;word-break:normal;white-space:pre-wrap;border:0px;display:inline;overflow:visible;line-height:inherit;word-wrap:normal" class="">-XX:+G1SummarizeRSetStats
-XX:<wbr class="">G1SummarizeRSetStatsPeriod=10</code></pre>
<br class="">
    Printing those statistics is very expensive, and we already know
    there is a lot of coarsening, so for now we can set
    G1SummarizeRSetStatsPeriod=100.<br class="">
<br class="">
The RSet footprint is so big, increasing <span style="font-family:georgia;font-size:18px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline" class="">-XX:G1RSetRegionEntries
will get rid of coarsening but make the memory footprint bigger.
We can delay this for now.<br class="">
<br class="">
What is your hardware/software configuration? cpu/memory/cores?
There is no swapping, right?<br class="">
<br class="">
Thanks<span class="gmail-HOEnZb"><font color="#888888" class=""><br class="">
Jenny<br class="">
<br class="">
</font></span></span><div class=""><div class="gmail-h5">
<div class="gmail-m_2983664130649143190moz-cite-prefix">On 04/27/2017 08:52 PM, nezih yigitbasi
wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">My first question is incorrect actually, so I am
giving a better example to rephrase my question. At time
"2017-04-26T03:12:24.259-0700" there is a young GC that
took 35.47 s, where object copy took 28983.4 ms. In that event I
see the following log:
<div class=""><br class="">
</div>
<div class="">
<div class=""> [Eden: 7168.0M(7168.0M)->0.0B(7168.<wbr class="">0M) Survivors:
1024.0M->1024.0M Heap: 153.4G(160.0G)->149.4G(160.0G)<wbr class="">]</div>
<div class=""><br class="">
</div>
            <div class="">My interpretation of this is that ~4GB of garbage was
              collected from the heap in total, and since eden usage goes
              down by ~7GB, ~3GB of eden was live objects.
              Is this interpretation correct? If it is, how come copying
              ~3GB takes ~29s? In your answer you said "most of the
              object copy time is dealing with the Remember Set"; can you
              please give some details about what kind of operations on
              rsets are done during the object copy phase, and can we see
              that from these logs?</div>
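            <div class=""><br class="">
            </div>
            <div class="">The arithmetic behind that question can be sketched in a
              few lines of Java. The figures come from the log line above;
              the class and method names are made up for illustration, and
              the result is only a rough estimate because it ignores
              objects copied between survivor regions.</div>
<pre class="">
```java
// Hypothetical helper, written only to spell out the arithmetic above.
final class YoungGcArithmetic {
    // Heap space freed by the pause, in GB.
    static double freedGb(double heapBeforeGb, double heapAfterGb) {
        return heapBeforeGb - heapAfterGb;
    }

    // Eden was emptied completely, so whatever was not freed from the
    // heap must have been copied out of eden as live data.
    static double survivedGb(double edenBeforeGb, double freedGb) {
        return edenBeforeGb - freedGb;
    }

    // Effective copy rate in MB/s for the object copy phase.
    static double copyRateMbPerSec(double copiedGb, double objectCopyMs) {
        return copiedGb * 1024.0 / (objectCopyMs / 1000.0);
    }

    public static void main(String[] args) {
        double freed = freedGb(153.4, 149.4);                 // Heap: 153.4G to 149.4G
        double survived = survivedGb(7168.0 / 1024.0, freed); // Eden: 7168.0M to 0.0B
        double rate = copyRateMbPerSec(survived, 28983.4);    // object copy: 28983.4 ms
        System.out.printf("freed=%.1f GB, survived=%.1f GB, copy rate=%.0f MB/s%n",
                freed, survived, rate);
    }
}
```
</pre>
            <div class="">With these inputs the estimate comes out at roughly
              100 MB/s, which is why the copy itself looks far too slow to
              be explained by the volume of live data alone.</div>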
<div class=""><br class="">
</div>
<div class="">Thanks again,</div>
<div class="">Nezih</div>
<div class=""><br class="">
</div>
</div>
</div>
<div class="gmail_extra"><br class="">
<div class="gmail_quote">2017-04-27 17:20 GMT-07:00 nezih
yigitbasi <span dir="ltr" class=""><<a href="mailto:nezihyigitbasi@gmail.com" target="_blank" class="">nezihyigitbasi@gmail.com</a>></span>:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr" class="">Thanks for the suggestions. We use the
default pause time. And here is our entire set of JVM
args: <a href="https://gist.github.com/nezihyigitbasi/04f5fdb9c32ac56097011819e20602d8" target="_blank" class="">https://gist.github.com/<wbr class="">nezihyigitbasi/04f5fdb9c32ac56<wbr class="">097011819e20602d8</a>
<div class=""><br class="">
</div>
<div class="">I have some followup questions:<br class="">
<div class="">
<div class=""><br class="">
</div>
                <div class="">- In one case the object copy took 39406.8 ms.
                  Even if the remembered set is ~30GB, isn't this too
                  slow (that's less than 1GB/s of data)?</div>
<div class="">- Is there any way to reduce the rset overhead?</div>
                <div class="">- My initial thought when I saw the high object
                  copy times was that some sort of contention might
                  explain such low throughput during the copy.
                  Although it may not be the case here, I just wonder
                  whether there is a way to see the amount of
                  contention from the gc logs?</div>
<span class="gmail-m_2983664130649143190HOEnZb"><font color="#888888" class="">
<div class=""><br class="">
</div>
<div class="">Nezih</div>
</font></span>
<div class="">
<div class="gmail-m_2983664130649143190h5">
<div class=""><br class="">
</div>
<div class="">
<div class="">
<div class="gmail_extra"><br class="">
<div class="gmail_quote">2017-04-27 16:58
GMT-07:00 Jenny Zhang <span dir="ltr" class=""><<a href="mailto:yu.zhang@oracle.com" target="_blank" class="">yu.zhang@oracle.com</a>></span>:<br class="">
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,
                              Nezih,<br class="">
<br class="">
                              It seems this workload is very heavy on
                              the remembered set. It uses about 31G of
                              native memory for the RSet (old gen) and
                              there is still coarsening.<br class="">
<br class="">
                              What is your pause time goal? The default
(200ms) might be too small for you. Can
you increase that so G1 can increase the
young gen size? Since there is not much
promotion, I guess most of the object
copy time is dealing with the Remember
Set.<br class="">
<br class="">
                              There are other things you can try, like
                              increasing G1RSetRegionEntries, but the
                              memory footprint will be bigger.<br class="">
<br class="">
                              So if you can, I suggest increasing the
                              pause time goal first.<br class="">
<br class="">
Thanks<span class="gmail-m_2983664130649143190m_8770557251507851110gmail-HOEnZb"><font color="#888888" class=""><br class="">
<br class="">
Jenny</font></span>
<div class="gmail-m_2983664130649143190m_8770557251507851110gmail-HOEnZb">
<div class="gmail-m_2983664130649143190m_8770557251507851110gmail-h5"><br class="">
<br class="">
<br class="">
<br class="">
On 4/27/2017 9:22 AM, nezih
yigitbasi wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi,<br class="">
We see huge object copy times (and
relatively high termination times)
during young GCs in our production
system running on Java
1.8.0_112-b15. You can find the GC
logs here: <a href="https://gist.github.com/nezihyigitbasi/1f7a92da7860908a611cb1197bd8626b" rel="noreferrer" target="_blank" class="">https://gist.github.com/nezihy<wbr class="">igitbasi/1f7a92da7860908a611cb<wbr class="">1197bd8626b</a><br class="">
<br class="">
The young GC times start going
high after the timestamp
"2017-04-26T03:07:22.164-0700"<wbr class="">.<br class="">
<br class="">
                                    I would appreciate it if you could give
                                    some details about:<br class="">
- what goes into the "Object Copy"
phase during young GCs and how we
can reduce it.<br class="">
- why we see high Termination
times and what we can do about it<br class="">
<br class="">
Thanks,<br class="">
Nezih<br class="">
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br class="">
</div></div></div>
</blockquote></div><br class=""></div></div>
</div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>