<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Thanks, Michal for providing us with the logs. John (Cuthbertson)
and I had a great discussion yesterday pertaining to your G1 log and
we think we understand what's going on with your workload and have a
experiment(s) plan for you. John is going to send in more details
(with explanation). Your workload's pattern is very interesting to
me/us and I have started a "work folder" for it, so that I can
document our experiment plan and the results and keep track of the
progress.<br>
<br>
Thanks for sharing your logs!<br>
<br>
-Monica<br>
<br>
On 2/1/2013 2:12 AM, Michal Frajt wrote:
<blockquote
cite="mid:MHJ85D$88D20B287E57CB3DA1590E5ACC6F1D19@frajt.eu"
type="cite">
<div class="xam_msg_class">
<div>
<div>Sorry, forgot to attach the archive in the last message. </div>
<div> </div>
<div><span>Od</span><span>:
<a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-dev-bounces@openjdk.java.net">hotspot-gc-dev-bounces@openjdk.java.net</a></span></div>
<div><span>Komu</span><span>: <a class="moz-txt-link-abbreviated" href="mailto:monica.beckwith@oracle.com">monica.beckwith@oracle.com</a></span></div>
<div><span>Kopie</span><span>: <a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-dev@openjdk.java.net">hotspot-gc-dev@openjdk.java.net</a></span></div>
<div><span>Datum</span><span>: Fri, 1 Feb 2013 09:05:58 +0100</span></div>
<div><span>Předmet</span><span>: Re: CMS vs G1 - Scan RS very
long</span></div>
<br>
<div class="xam_msg_class">
<div class="xam_msg_class">
<div>
<div>> Hi Monica / John,</div>
<div>> </div>
<div>> Find the new archive which correctly contains
both the CMS and G1 log files. Yesterday I
somehow managed to overwrite the CMS log file when
creating the tar archive. </div>
<div>> </div>
<div>> Node2 - running CMS</div>
<div>> mmreutersservernode2_20130131105521_gc.log<br>
> mmreutersservernode2_jinfo.txt</div>
<div>> </div>
<div>> Node3 - running G1<br>
> mmreutersservernode3_20130131105525_gc.log<br>
> mmreutersservernode3_jinfo.txt</div>
<div>> </div>
<div>> Regards,</div>
<div>> Michal </div>
<div>> </div>
<div><span>> Od</span><span>: "Monica Beckwith"
<a class="moz-txt-link-abbreviated" href="mailto:monica.beckwith@oracle.com">monica.beckwith@oracle.com</a></span></div>
<div><span>> Komu</span><span>: "John Cuthbertson"
<a class="moz-txt-link-abbreviated" href="mailto:john.cuthbertson@oracle.com">john.cuthbertson@oracle.com</a></span></div>
<div><span>> Kopie</span><span>:
<a class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-dev@openjdk.java.net">hotspot-gc-dev@openjdk.java.net</a>,"Michal Frajt"
<a class="moz-txt-link-abbreviated" href="mailto:michal@frajt.eu">michal@frajt.eu</a></span></div>
<div><span>> Datum</span><span>: Thu, 31 Jan 2013
12:44:46 -0600</span></div>
<div><span>> Předmet</span><span>: Re: CMS vs G1 -
Scan RS very long</span></div>
<br>
<div class="xam_msg_class">> > Hi Michal (and
John),<br>
> > <br>
> > I did look at Michal's log files (BTW, both
are the same G1 logs). And I can confirm that scan RS
is the issue... here's the plot:<br>
> > <img alt=""
src="cid:part1.05060004.07020808@oracle.com"
_djrealurl="" height="360" width="914"><br>
> > <br>
> > The plot above only shows the max obj copy
times (for all pauses), max RS Scan times and Parallel
Time. So yes, scan RS is the culprit. <br>
> > Also, looking at your logs, it seems that
for mixedGCs the reclaimable bytes don't cross 6%. So
can you please try increasing the G1HeapWastePercent
to 10? (the latest builds will have 10 as the default
value).<br>
> > <br>
> > Please let me know if that improves your RT.<br>
> > <br>
> > -Monica<br>
> > <br>
> > <br>
> > On 1/31/2013 11:56 AM, John Cuthbertson
wrote:
<blockquote cite="mid:510AB061.6010403@oracle.com"
type="cite">Hi Michal, <br>
> > <br>
> > I haven't looked at the logs yet but from
your description it sounds like a large number of
the RSets have been coarsened and/or the fine grain
tables have gotten very dense. <br>
> > <br>
> > The RSet for a heap region tracks incoming
references to objects in that region. There are
three levels on granularity: sparse, fine, and
coarse. The sparse and fine entries are encoded in
open hash tables based upon the region containing
the references. As the number of references from
region A that point into region B increase, the
number of cards in the hash table entry for region A
increase (it's actually a bitmap with one bit per
card) and as the number of regions that contain
references that point into B increase, the number of
region entries in fine grain table increase. Once we
run out of space in the fine grain table (i.e. we
can't add another bitmap for another region) we
evict one of the densest region bitmaps and say
"coarsen" that entry. When we have a coarsened entry
we have to search the entire referencing region
looking for incoming references compared to
searching specific cards with the sparse and fine
entries. <br>
> > <br>
> > I'll take a look at your logs to see if I
can confirm. <br>
> > <br>
> > JohnC <br>
> > <br>
> > <br>
> > On 1/31/2013 6:12 AM, Michal Frajt wrote:
<br>
> >
<blockquote type="cite">Hi all, <br>
> > <br>
> > After the iCMS got officially deprecated
we decided to compare the G1 collector with our
best tuned (i)CMS setup. Unfortunately we are not
able to make the G1 young collection running any
closer to the ParNew. Actually we wanted to
compare the G1 concurrent marking STW pauses with
the CMS initial-mark and remark STW pauses but
already incredibly long running G1 young
collections are unacceptable for us. <br>
> > <br>
> > We were able to recognize that the very
long G1 young collections are caused by the
scanning remembered sets. There is not much
documentation about G1 internals but we were able
to understand that the size of the remembered sets
is related to the amount of mutating references
from old regions (cards) to young regions.
Unfortunately all our applications mutate
permanently thousands references from old objects
to young objects. <br>
> > <br>
> > We are testing with the latest OpenJDK7u
extended by the 7189971 patch and
CMSTriggerInterval implementation. The attached GC
log files represent two very equal applications
processing very similar data sets, one running the
G1, second running the CMS collector. The
OpenJDK7u has an extra output of _pending_cards
(when G1TraceConcRefinement activated) which
somehow relates to the remembered sets size. <br>
> > <br>
> > Young Comparison (both 128m, survivor
ratio 5, max tenuring 15) <br>
> > CMS - invoked every ~20 sec, avg. stop
60ms <br>
> > G1 - invoked every ~16 sec, avg. stop
410ms !!! <br>
> > <br>
> > It there anything what could help us to
reduce the Scan RS time or the G1 is simply not
targeted for applications mutating heavily old
region objects? <br>
> > <br>
> > CMS parameters <br>
> > -Xmx8884m -Xms2048m -XX:NewSize=128m
-XX:MaxNewSize=128m -XX:PermSize=128m
-XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15
-XX:CMSMarkStackSize=8M
-XX:CMSMarkStackSizeMax=32M
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:CMSWaitDuration=60000
-XX:+CMSScavengeBeforeRemark
-XX:CMSTriggerInterval=600000 -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2 <br>
> > <br>
> > G1 parameters (mind MaxNewSize not
specified) <br>
> > -Xmx8884m -Xms2048m -XX:NewSize=128m
-XX:PermSize=128m -XX:SurvivorRatio=5
-XX:MaxTenuringThreshold=15 -XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1MixedGCCountTarget=16
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 <br>
> > <br>
> > G1 log file GC young pause <br>
> > [GC pause (young) [G1Ergonomics (CSet
Construction) start choosing CSet, _pending_cards:
23697, predicted base time: 32.88 ms, remaining <br>
> > time: 167.12 ms, target pause time:
200.00 ms] <br>
> > [Parallel Time: 389.8 ms, GC Workers: 8]
<br>
> > >>>> <br>
> > [Scan RS (ms): Min: 328.8, Avg:
330.4, Max: 332.6, Diff: 3.8, Sum: 2642.9] <br>
> > <<<< <br>
> > [Eden: 119.0M(119.0M)->0.0B(118.0M)
Survivors: 9216.0K->10.0M Heap:
1801.6M(2048.0M)->1685.7M(2048.0M)] <br>
> > <br>
> > Regards, <br>
> > Michal <br>
> > <br>
> > <br>
> > </blockquote>
<br>
> > </blockquote>
<br>
> >
<div class="moz-signature">> > -- <br>
> > <a moz-do-not-send="true"
href="http://www.oracle.com/" target="_blank"
_djrealurl="http://www.oracle.com/"><img
alt="Oracle"
src="cid:part2.07060601.09070600@oracle.com"
_djrealurl="" border="0" height="26" width="114"></a><br>
> > <font color="#666666" face="Verdana,
Arial, Helvetica, sans-serif" size="2">Monica
Beckwith | Java Performance Engineer<br>
> > VOIP: <a
href="tel:+1%20512%20401%201274"
_djrealurl="tel:+1%20512%20401%201274"
moz-do-not-send="true">+1 512 401 1274</a> <br>
> > Texas </font><br>
> > <a moz-do-not-send="true"
href="http://www.oracle.com/commitment"
target="_blank"
_djrealurl="http://www.oracle.com/commitment"><img
alt="Green Oracle"
src="cid:part5.09000504.01050906@oracle.com"
_djrealurl="" align="absMiddle" border="0"
height="28" width="44"></a> <font
color="#4b7d42" face="Verdana, Arial, Helvetica,
sans-serif" size="1">Oracle is committed to
developing practices and products that help
protect the environment</font> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<div class="moz-signature">-- <br>
<a href="http://www.oracle.com" target="_blank"><img
src="cid:part7.07060205.07000209@oracle.com" alt="Oracle"
border="0" height="26" width="114"></a><br>
<font color="#666666" face="Verdana, Arial, Helvetica, sans-serif"
size="2">Monica Beckwith | Java Performance Engineer<br>
VOIP: <a href="tel:+1%20512%20401%201274"
moz-do-not-send="true">+1 512 401 1274</a>
<br>
Texas </font>
<br>
<a href="http://www.oracle.com/commitment" target="_blank"><img
src="cid:part10.01080905.02000304@oracle.com" alt="Green
Oracle" align="absmiddle" border="0" height="28" width="44"></a>
<font color="#4B7D42" face="Verdana, Arial, Helvetica, sans-serif"
size="1">Oracle is committed to developing practices and
products that help protect the environment</font>
<!-- This signature was generated by the MyDesktop Oracle Business Signature utility version 4.0 -->
</div>
</body>
</html>