CMS vs G1 - Scan RS very long

Sat Feb 2 09:47:10 UTC 2013

Hi Ramki,

I'm not sure what type of profiling your looking at nor do I have a clue about your workload. That said, what I've been seeing is that taking steps to reducing safe-pointing seems to have a impact on improving hardware saturation.

Regards,
Kirk

On 2013-02-02, at 10:32 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> I've throttled down the concurrency of my batch task in baseline and specimen to leave lots of idle cores -- the design center for G1.
> Presumably the end-to-end overhead I am measuring is not because there isn't available cpu, but rather because of worse cache behaviour, or remote memory access, or inline wtrte-barrier overhead, or G1's barriers preventing code optimization, or something of that nature. I am yet to profile my run sufficiently to get a good read on what's contributing to the slow-down when running with G1.
> 
> -- ramki
> 
> On Sat, Feb 2, 2013 at 1:23 AM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> Hi Ramki,
> 
> There is an obvious question.. and that is, if there is oodles and oodles of free CPU, what can't your batch process get to it?
> 
> Regards,
> Kirk
> 
> On 2013-02-02, at 10:14 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> 
>> Thanks for the interesting discussion.
>> 
>> I've just started playing around with G1 and have found in very, very preliminary initial measurements with some specialized workloads, and without any G1 tuning other than to set the min and max heap size, that although G1 pauses (max and mean) and computed GC overhead are much lower and much more consistent than CMS (and of course ParallelOld), that the end-to-end batch processing latency (even with oodles of free cpu) is much higher (to the extent of almost 20%). I'll see if, for a workload that is a mix of interactive and longer running tasks, whether this translates into better interactivity for the shorter, more interactive tasks, or whether the apparent overheads of G1 translate into no gains for interactive tasks as well.
>> 
>> Is there any similar experience wrt long-running batch tasks (with plenty of idle concurrent cpu) out there?
>> 
>> I need to go look at the Hunt/Beckwith J1/G1 talk to see if there's any low-hanging fruit to harvest wrt G1, as well as try and characterize its performance penalty a bit more carefully.
>> 
>> -- ramki
>> 
>> On Fri, Feb 1, 2013 at 9:15 AM, Michal Frajt <michal at frajt.eu> wrote:
>> Monica,
>> 
>>  
>> 
>> Today morning I sent an email with the correct log files. The attachment size was over the limit, it is still waiting for an approval to be added to the posting list.
>> 
>>  
>> 
>> Some points
>> 
>> -          we compare G1 with CMS-ParNew (not iCMS, even we are using iCMS everywhere else)
>> 
>> -          currently we evaluate the young gen processing only
>> 
>> -          now we set the G1RSetRegionEntries to 1024, it reduced Scan RS time but it is still 3 times slower than ParNew
>> 
>> -          we do set NewSize to 128 for both the G1 and ParNew to keep young collection invoked on same amount of garbage
>> 
>> -          setting G1 NewSize to 16GB and the target time to 60ms is able to keep the track with the ParNew but the G1 young gen is invoked 3 times more frequently as adapted Eden size is around 30MB (ParNew has 128MB-survivor spaces)
>> 
>> -          I will be able to provide you the new log files and more details on Tuesday
>> 
>>  
>> 
>> Thanks,
>> 
>> Michal
>> 
>>  
>> 
>> From: Monica Beckwith [mailto:monica.beckwith at oracle.com] 
>> Sent: Friday, February 01, 2013 4:39 PM
>> To: Michal Frajt
>> Cc: hotspot-gc-dev at openjdk.java.net; John Cuthbertson
>> Subject: Re: CMS vs G1 - Scan RS very long
>> 
>>  
>> 
>> Thanks, Michal for providing us with the logs. John (Cuthbertson) and I had a great discussion yesterday pertaining to your G1 log and we think we understand what's going on with your workload and have a experiment(s) plan for you. John is going to send in more details (with explanation). Your workload's pattern is very interesting to me/us and I have started a "work folder" for it, so that I can document our experiment plan and the results and keep track of the progress.
>> 
>> Thanks for sharing your logs!
>> 
>> -Monica
>> 
>> On 2/1/2013 2:12 AM, Michal Frajt wrote:
>> 
>> Sorry, forgot to attach the archive in the last message. 
>> 
>>  
>> 
>> Od: hotspot-gc-dev-bounces at openjdk.java.net
>> 
>> Komu: monica.beckwith at oracle.com
>> 
>> Kopie: hotspot-gc-dev at openjdk.java.net
>> 
>> Datum: Fri, 1 Feb 2013 09:05:58 +0100
>> 
>> Předmet: Re: CMS vs G1 - Scan RS very long
>> 
>>  
>> 
>> > Hi Monica / John,
>> 
>> >  
>> 
>> > Find the new archive which correctly contains both the CMS and G1 log files. Yesterday I somehow managed to overwrite the CMS log file when creating the tar archive. 
>> 
>> >   
>> 
>> > Node2 - running CMS
>> 
>> > mmreutersservernode2_20130131105521_gc.log
>> > mmreutersservernode2_jinfo.txt
>> 
>> >  
>> 
>> > Node3 - running G1
>> > mmreutersservernode3_20130131105525_gc.log
>> > mmreutersservernode3_jinfo.txt
>> 
>> >  
>> 
>> > Regards,
>> 
>> > Michal 
>> 
>> >  
>> 
>> > Od: "Monica Beckwith" monica.beckwith at oracle.com
>> 
>> > Komu: "John Cuthbertson" john.cuthbertson at oracle.com
>> 
>> > Kopie: hotspot-gc-dev at openjdk.java.net,"Michal Frajt" michal at frajt.eu
>> 
>> > Datum: Thu, 31 Jan 2013 12:44:46 -0600
>> 
>> > Předmet: Re: CMS vs G1 - Scan RS very long
>> 
>>  
>> 
>> > > Hi Michal (and John),
>> > > 
>> > > I did look at Michal's log files (BTW, both are the same G1 logs). And I can confirm that scan RS is the issue... here's the plot:
>> > > 
>> > > 
>> > > The plot above only shows the max obj copy times (for all pauses), max RS Scan times and Parallel Time. So yes, scan RS is the culprit. 
>> > > Also, looking at your logs, it seems that for mixedGCs the reclaimable bytes don't cross 6%. So can you please try increasing the G1HeapWastePercent to 10? (the latest builds will have 10 as the default value).
>> > > 
>> > > Please let me know if that improves your RT.
>> > > 
>> > > -Monica
>> > > 
>> > > 
>> > > On 1/31/2013 11:56 AM, John Cuthbertson wrote:
>> 
>> Hi Michal, 
>> > > 
>> > > I haven't looked at the logs yet but from your description it sounds like a large number of the RSets have been coarsened and/or the fine grain tables have gotten very dense. 
>> > > 
>> > > The RSet for a heap region tracks incoming references to objects in that region. There are three levels on granularity: sparse, fine, and coarse. The sparse and fine entries are encoded in open hash tables based upon the region containing the references. As the number of references from region A that point into region B increase, the number of cards in the hash table entry for region A increase (it's actually a bitmap with one bit per card) and as the number of regions that contain references that point into B increase, the number of region entries in fine grain table increase. Once we run out of space in the fine grain table (i.e. we can't add another bitmap for another region) we evict one of the densest region bitmaps and say "coarsen" that entry. When we have a coarsened entry we have to search the entire referencing region looking for incoming references compared to searching specific cards with the sparse and fine entries. 
>> > > 
>> > > I'll take a look at your logs to see if I can confirm. 
>> > > 
>> > > JohnC 
>> > > 
>> > > 
>> > > On 1/31/2013 6:12 AM, Michal Frajt wrote: 
>> > >
>> 
>> Hi all, 
>> > > 
>> > > After the iCMS got officially deprecated we decided to compare the G1 collector with our best tuned (i)CMS setup. Unfortunately we are not able to make the G1 young collection running any closer to the ParNew. Actually we wanted to compare the G1 concurrent marking STW pauses with the CMS initial-mark and remark STW pauses but already incredibly long running G1 young collections are unacceptable for us. 
>> > > 
>> > > We were able to recognize that the very long G1 young collections are caused by the scanning remembered sets. There is not much documentation about G1 internals but we were able to understand that the size of the remembered sets is related to the amount of mutating references from old regions (cards) to young regions. Unfortunately all our applications mutate permanently thousands references from old objects to young objects. 
>> > > 
>> > > We are testing with the latest OpenJDK7u extended by the 7189971 patch and CMSTriggerInterval implementation. The attached GC log files represent two very equal applications processing very similar data sets, one running the G1, second running the CMS collector. The OpenJDK7u has an extra output of _pending_cards (when G1TraceConcRefinement activated) which somehow relates to the remembered sets size. 
>> > > 
>> > > Young Comparison (both 128m, survivor ratio 5, max tenuring 15) 
>> > > CMS - invoked every ~20 sec, avg. stop 60ms 
>> > > G1 - invoked every ~16 sec, avg. stop 410ms !!! 
>> > > 
>> > > It there anything what could help us to reduce the Scan RS time or the G1 is simply not targeted for applications mutating heavily old region objects? 
>> > > 
>> > > CMS parameters 
>> > > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSWaitDuration=60000 -XX:+CMSScavengeBeforeRemark -XX:CMSTriggerInterval=600000 -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2 
>> > > 
>> > > G1 parameters (mind MaxNewSize not specified) 
>> > > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1MixedGCCountTarget=16 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 
>> > > 
>> > > G1 log file GC young pause 
>> > > [GC pause (young) [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 23697, predicted base time: 32.88 ms, remaining 
>> > > time: 167.12 ms, target pause time: 200.00 ms] 
>> > > [Parallel Time: 389.8 ms, GC Workers: 8] 
>> > >      >>>> 
>> > >      [Scan RS (ms): Min: 328.8, Avg: 330.4, Max: 332.6, Diff: 3.8, Sum: 2642.9] 
>> > >      <<<< 
>> > > [Eden: 119.0M(119.0M)->0.0B(118.0M) Survivors: 9216.0K->10.0M Heap: 1801.6M(2048.0M)->1685.7M(2048.0M)] 
>> > > 
>> > > Regards, 
>> > > Michal 
>> > > 
>> > > 
>> > >
>> 
>> 
>> > >
>> 
>> 
>> > >
>> 
>> > > -- 
>> > > 
>> > > Monica Beckwith | Java Performance Engineer
>> > > VOIP: +1 512 401 1274 
>> > > Texas 
>> > > Oracle is committed to developing practices and products that help protect the environment
>> 
>>  
>> 
>> -- 
>> <image001.gif>
>> 
>> 
>> Monica Beckwith | Java Performance Engineer
>> VOIP: +1 512 401 1274 
>> Texas 
>> <image002.gif>Oracle is committed to developing practices and products that help protect the environment
>> 
>> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130202/73965f99/attachment.htm>