CMS vs G1 - Scan RS very long

Monica Beckwith monica.beckwith at oracle.com
Fri Feb 1 15:38:46 UTC 2013


Thanks, Michal, for providing us with the logs. John (Cuthbertson) and I 
had a great discussion yesterday about your G1 log; we think we 
understand what's going on with your workload and have an experiment 
plan for you. John is going to send more details (with an explanation). 
Your workload's pattern is very interesting to us, and I have started a 
"work folder" for it so that I can document the experiment plan and the 
results and keep track of the progress.

Thanks for sharing your logs!

-Monica

On 2/1/2013 2:12 AM, Michal Frajt wrote:
> Sorry, forgot to attach the archive in the last message.
> From: hotspot-gc-dev-bounces at openjdk.java.net
> To: monica.beckwith at oracle.com
> Cc: hotspot-gc-dev at openjdk.java.net
> Date: Fri, 1 Feb 2013 09:05:58 +0100
> Subject: Re: CMS vs G1 - Scan RS very long
>
> > Hi Monica / John,
> >
> > Find the new archive, which correctly contains both the CMS and G1 
> > log files. Yesterday I somehow managed to overwrite the CMS log 
> > file when creating the tar archive.
> >
> > Node2 - running CMS
> > mmreutersservernode2_20130131105521_gc.log
> > mmreutersservernode2_jinfo.txt
> >
> > Node3 - running G1
> > mmreutersservernode3_20130131105525_gc.log
> > mmreutersservernode3_jinfo.txt
> >
> > Regards,
> > Michal
> >
> > From: "Monica Beckwith" monica.beckwith at oracle.com
> > To: "John Cuthbertson" john.cuthbertson at oracle.com
> > Cc: hotspot-gc-dev at openjdk.java.net, "Michal Frajt" michal at frajt.eu
> > Date: Thu, 31 Jan 2013 12:44:46 -0600
> > Subject: Re: CMS vs G1 - Scan RS very long
>
> > > Hi Michal (and John),
> > >
> > > I did look at Michal's log files (BTW, both are the same G1 logs), 
> > > and I can confirm that Scan RS is the issue... here's the plot:
> > >
> > > [plot attached as an image; not reproduced in the text archive]
> > >
> > > The plot above only shows the max obj copy times (for all pauses), 
> > > max Scan RS times, and Parallel Time. So yes, Scan RS is the culprit.
> > > Also, looking at your logs, it seems that for mixed GCs the 
> > > reclaimable bytes don't cross 6%. So can you please try increasing 
> > > G1HeapWastePercent to 10? (The latest builds will have 10 as the 
> > > default value.)
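> > >
> > > For illustration, with the G1 options you posted below this would 
> > > just mean adding the one flag, e.g.:
> > >
> > >   -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:PermSize=128m
> > >   -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:+UseG1GC
> > >   -XX:MaxGCPauseMillis=200 -XX:G1MixedGCCountTarget=16
> > >   -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2
> > >   -XX:G1HeapWastePercent=10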
> > >
> > > Please let me know if that improves your RT.
> > >
> > > -Monica
> > >
> > >
> > > On 1/31/2013 11:56 AM, John Cuthbertson wrote:
>> Hi Michal,
>>
>> I haven't looked at the logs yet, but from your description it 
>> sounds like a large number of the RSets have been coarsened and/or 
>> the fine-grain tables have gotten very dense.
>>
>> The RSet for a heap region tracks incoming references to objects 
>> in that region. There are three levels of granularity: sparse, fine, 
>> and coarse. The sparse and fine entries are encoded in open hash 
>> tables based upon the region containing the references. As the number 
>> of references from region A that point into region B increases, the 
>> number of cards in the hash table entry for region A increases (it's 
>> actually a bitmap with one bit per card), and as the number of regions 
>> that contain references that point into B increases, the number of 
>> region entries in the fine-grain table increases. Once we run out of 
>> space in the fine-grain table (i.e. we can't add another bitmap for 
>> another region), we evict one of the densest region bitmaps and 
>> "coarsen" that entry. When we have a coarsened entry we have to search 
>> the entire referencing region looking for incoming references, instead 
>> of searching specific cards as we do for the sparse and fine entries.
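>>
>> To make that concrete, here is a minimal, purely illustrative Java 
>> sketch of one region's RSet with just the fine and coarse levels (the 
>> real implementation is HotSpot C++; the limits and names are invented 
>> and the sparse level is omitted):
>>
>>   import java.util.BitSet;
>>   import java.util.HashMap;
>>   import java.util.HashSet;
>>   import java.util.Map;
>>   import java.util.Set;
>>
>>   // Toy model of one heap region's remembered set (not HotSpot code).
>>   class RegionRSet {
>>       static final int MAX_FINE_ENTRIES = 4;   // invented limit
>>       static final int CARDS_PER_REGION = 512; // invented card count
>>
>>       // fine level: referencing region -> bitmap with one bit per card
>>       final Map<Integer, BitSet> fine = new HashMap<>();
>>       // coarse level: referencing regions whose bitmaps were evicted;
>>       // Scan RS must search these regions in their entirety
>>       final Set<Integer> coarse = new HashSet<>();
>>
>>       void addReference(int fromRegion, int fromCard) {
>>           if (coarse.contains(fromRegion)) return; // already coarsened
>>           BitSet cards = fine.get(fromRegion);
>>           if (cards == null) {
>>               if (fine.size() == MAX_FINE_ENTRIES) {
>>                   coarsenDensest();               // make room by evicting
>>               }
>>               cards = new BitSet(CARDS_PER_REGION);
>>               fine.put(fromRegion, cards);
>>           }
>>           cards.set(fromCard);                    // remember just this card
>>       }
>>
>>       private void coarsenDensest() {
>>           int densest = -1, max = -1;
>>           for (Map.Entry<Integer, BitSet> e : fine.entrySet()) {
>>               if (e.getValue().cardinality() > max) {
>>                   max = e.getValue().cardinality();
>>                   densest = e.getKey();
>>               }
>>           }
>>           fine.remove(densest);
>>           coarse.add(densest);
>>       }
>>   }
>>
>> The more referencing regions end up in the coarse set, the more whole 
>> regions Scan RS has to walk, which is where the time would go.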
>>
>> I'll take a look at your logs to see if I can confirm.
>>
>> JohnC
>>
>>
>> On 1/31/2013 6:12 AM, Michal Frajt wrote:
>>
>>> Hi all,
>>>
>>> After iCMS got officially deprecated, we decided to compare the G1 
>>> collector with our best-tuned (i)CMS setup. Unfortunately, we have 
>>> not been able to get the G1 young collections running anywhere close 
>>> to ParNew. We actually wanted to compare the G1 concurrent-marking 
>>> STW pauses with the CMS initial-mark and remark STW pauses, but the 
>>> incredibly long-running G1 young collections are already unacceptable 
>>> for us.
>>>
>>> We were able to determine that the very long G1 young collections 
>>> are caused by scanning the remembered sets. There is not much 
>>> documentation about G1 internals, but we were able to understand 
>>> that the size of the remembered sets is related to the number of 
>>> mutating references from old regions (cards) to young regions. 
>>> Unfortunately, all our applications permanently mutate thousands of 
>>> references from old objects to young objects.
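>>>
>>> For illustration only (invented names, not our real code), the 
>>> pattern looks roughly like this:
>>>
>>>   import java.util.HashMap;
>>>   import java.util.Map;
>>>
>>>   class QuoteCache {
>>>       static class Quote {
>>>           final double price;
>>>           Quote(double price) { this.price = price; }
>>>       }
>>>
>>>       // Long-lived map: survives many young GCs and ends up in an
>>>       // old region.
>>>       private final Map<String, Quote> latest = new HashMap<>();
>>>
>>>       // Every update allocates a fresh (young) Quote and stores it
>>>       // into the old map, creating a new old-region -> young-region
>>>       // reference that the remembered sets must track.
>>>       void onUpdate(String instrument, double price) {
>>>           latest.put(instrument, new Quote(price));
>>>       }
>>>   }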
>>>
>>> We are testing with the latest OpenJDK7u extended with the 7189971 
>>> patch and a CMSTriggerInterval implementation. The attached GC log 
>>> files represent two nearly identical applications processing very 
>>> similar data sets, one running the G1 collector and the second 
>>> running the CMS collector. Our OpenJDK7u build has an extra output 
>>> of _pending_cards (when G1TraceConcRefinement is activated), which 
>>> somehow relates to the remembered set size.
>>>
>>> Young Comparison (both 128m, survivor ratio 5, max tenuring 15)
>>> CMS - invoked every ~20 sec, avg. stop 60ms
>>> G1 - invoked every ~16 sec, avg. stop 410ms !!!
>>>
>>> Is there anything that could help us reduce the Scan RS time, or is 
>>> G1 simply not targeted at applications that heavily mutate 
>>> old-region objects?
>>>
>>> CMS parameters
>>> -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:MaxNewSize=128m 
>>> -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 
>>> -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M 
>>> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled 
>>> -XX:CMSWaitDuration=60000 -XX:+CMSScavengeBeforeRemark 
>>> -XX:CMSTriggerInterval=600000 -XX:+UseParNewGC 
>>> -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2
>>>
>>> G1 parameters (note: MaxNewSize not specified)
>>> -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:PermSize=128m 
>>> -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:+UseG1GC 
>>> -XX:MaxGCPauseMillis=200 -XX:G1MixedGCCountTarget=16 
>>> -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2
>>>
>>> G1 log file GC young pause
>>> [GC pause (young) [G1Ergonomics (CSet Construction) start choosing 
>>> CSet, _pending_cards: 23697, predicted base time: 32.88 ms, 
>>> remaining time: 167.12 ms, target pause time: 200.00 ms]
>>> [Parallel Time: 389.8 ms, GC Workers: 8]
>>> >>>>
>>>      [Scan RS (ms): Min: 328.8, Avg: 330.4, Max: 332.6, Diff: 3.8, 
>>>      Sum: 2642.9]
>>> <<<<
>>> [Eden: 119.0M(119.0M)->0.0B(118.0M) Survivors: 9216.0K->10.0M 
>>> Heap: 1801.6M(2048.0M)->1685.7M(2048.0M)]
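>>>
>>> (For scale: 8 workers x ~330.4 ms each gives the reported Sum of 
>>> 2642.9 ms, so each worker spends roughly 330 of the 389.8 ms 
>>> Parallel Time just in Scan RS.)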
>>>
>>> Regards,
>>> Michal
>>>
>>
>

-- 
Oracle <http://www.oracle.com>
Monica Beckwith | Java Performance Engineer
VOIP: +1 512 401 1274
Texas
Green Oracle <http://www.oracle.com/commitment> Oracle is committed to 
developing practices and products that help protect the environment

