CMS vs G1 - Scan RS very long
Srinivas Ramakrishna
ysr1729 at gmail.com
Sat Feb 2 09:32:15 UTC 2013
I've throttled down the concurrency of my batch task in baseline and
specimen to leave lots of idle cores -- the design center for G1.
Presumably the end-to-end overhead I am measuring is not because there
isn't available CPU, but rather because of worse cache behaviour, remote
memory accesses, inline write-barrier overhead, G1's barriers preventing
code optimizations, or something of that nature. I have yet to profile my
run sufficiently to get a good read on what's contributing to the
slow-down when running with G1.
-- ramki
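As a purely illustrative aside on the inline write-barrier point above: in
G1 every reference store in application code is wrapped by a pre-barrier
(SATB logging, active during concurrent marking) and a post-barrier (card
dirtying plus refinement queueing for cross-region stores), whereas CMS and
ParallelOld only dirty a card-table entry. The snippet below is ordinary
Java with comments that only approximate what the JIT adds around the
store; it is a sketch, not HotSpot code, and details vary by JVM version.

class Node {
    Node next;   // reference field: stores to it go through the GC barriers
    int value;   // primitive field: stores to it take no barrier
}

class BarrierCostSketch {
    static void relink(Node a, Node b) {
        // a.next = b is compiled to roughly:
        //   1. pre-barrier (SATB): if concurrent marking is active, log the
        //      previous value of a.next
        //   2. the reference store itself
        //   3. post-barrier: if a and b live in different regions and b is
        //      not null, dirty the card covering a.next and enqueue it for
        //      concurrent refinement
        // Under CMS/ParallelOld the same store only dirties a card-table
        // byte, which is one reason inlined G1 barriers can show up in
        // end-to-end batch time even when the GC pauses themselves are
        // shorter.
        a.next = b;

        a.value = 42;   // no barrier for primitive stores
    }
}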
On Sat, Feb 2, 2013 at 1:23 AM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> Hi Ramki,
>
> There is an obvious question, and that is: if there are oodles and oodles
> of free CPU, why can't your batch process get to it?
>
> Regards,
> Kirk
>
> On 2013-02-02, at 10:14 AM, Srinivas Ramakrishna <ysr1729 at gmail.com>
> wrote:
>
> Thanks for the interesting discussion.
>
> I've just started playing around with G1 and have found, in very
> preliminary initial measurements with some specialized workloads and
> without any G1 tuning other than setting the min and max heap size, that
> although G1 pauses (max and mean) and computed GC overhead are much lower
> and much more consistent than with CMS (and of course ParallelOld), the
> end-to-end batch processing latency (even with oodles of free CPU) is much
> higher (to the extent of almost 20%). I'll see whether, for a workload
> that is a mix of interactive and longer-running tasks, this translates
> into better interactivity for the shorter, more interactive tasks, or
> whether the apparent overheads of G1 mean no gains for interactive tasks
> either.
>
> Is there any similar experience wrt long-running batch tasks (with plenty
> of idle concurrent cpu) out there?
>
> I need to go look at the Hunt/Beckwith J1/G1 talk to see if there's any
> low-hanging fruit to harvest wrt G1, as well as try and characterize its
> performance penalty a bit more carefully.
>
> -- ramki
>
> On Fri, Feb 1, 2013 at 9:15 AM, Michal Frajt <michal at frajt.eu> wrote:
>
>> Monica,
>>
>> This morning I sent an email with the correct log files. The attachment
>> size was over the limit, so it is still waiting for approval before it is
>> added to the mailing list.
>>
>> Some points:
>>
>> - we compare G1 with CMS-ParNew (not iCMS, even though we are using
>>   iCMS everywhere else)
>> - currently we evaluate the young gen processing only
>> - we now set G1RSetRegionEntries to 1024; it reduced the Scan RS time,
>>   but it is still 3 times slower than ParNew
>> - we set NewSize to 128m for both G1 and ParNew to keep the young
>>   collection invoked on the same amount of garbage
>> - setting the G1 NewSize to 16GB and the target pause time to 60ms is
>>   able to keep pace with ParNew, but the G1 young gen is invoked 3 times
>>   more frequently, as the adapted Eden size is around 30MB (ParNew has
>>   128MB minus survivor spaces)
>> - I will be able to provide the new log files and more details on
>>   Tuesday
>>
>> Thanks,
>> Michal
>>
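Purely as an illustration of the settings mentioned in the points above
(heap and survivor sizing taken from the parameter listings further down
this thread; not a recommendation, and the 60ms pause target is only the
value Michal experimented with), the combined G1 command line might look
like:

-Xms2048m -Xmx8884m -XX:+UseG1GC -XX:NewSize=128m -XX:SurvivorRatio=5
-XX:MaxTenuringThreshold=15 -XX:MaxGCPauseMillis=60
-XX:G1RSetRegionEntries=1024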
>>
>> From: Monica Beckwith [mailto:monica.beckwith at oracle.com]
>> Sent: Friday, February 01, 2013 4:39 PM
>> To: Michal Frajt
>> Cc: hotspot-gc-dev at openjdk.java.net; John Cuthbertson
>> Subject: Re: CMS vs G1 - Scan RS very long
>>
>>
>> Thanks, Michal, for providing us with the logs. John (Cuthbertson) and I
>> had a great discussion yesterday about your G1 log, and we think we
>> understand what's going on with your workload and have an experiment plan
>> for you. John is going to send more details (with an explanation). Your
>> workload's pattern is very interesting to me/us, and I have started a
>> "work folder" for it so that I can document our experiment plan and the
>> results and keep track of the progress.
>>
>> Thanks for sharing your logs!
>>
>> -Monica
>>
>> On 2/1/2013 2:12 AM, Michal Frajt wrote:
>>
>> Sorry, forgot to attach the archive in the last message.
>>
>> From: hotspot-gc-dev-bounces at openjdk.java.net
>> To: monica.beckwith at oracle.com
>> Cc: hotspot-gc-dev at openjdk.java.net
>> Date: Fri, 1 Feb 2013 09:05:58 +0100
>> Subject: Re: CMS vs G1 - Scan RS very long
>>
>>
>> > Hi Monica / John,
>> >
>> > Find the new archive, which correctly contains both the CMS and G1 log
>> files. Yesterday I somehow managed to overwrite the CMS log file when
>> creating the tar archive.
>> >
>> > Node2 - running CMS
>> > mmreutersservernode2_20130131105521_gc.log
>> > mmreutersservernode2_jinfo.txt
>> >
>> > Node3 - running G1
>> > mmreutersservernode3_20130131105525_gc.log
>> > mmreutersservernode3_jinfo.txt
>> >
>> > Regards,
>> > Michal
>> >
>> > From: "Monica Beckwith" monica.beckwith at oracle.com
>> > To: "John Cuthbertson" john.cuthbertson at oracle.com
>> > Cc: hotspot-gc-dev at openjdk.java.net, "Michal Frajt" michal at frajt.eu
>> > Date: Thu, 31 Jan 2013 12:44:46 -0600
>> > Subject: Re: CMS vs G1 - Scan RS very long
>>
>>
>> > > Hi Michal (and John),
>> > >
>> > > I did look at Michal's log files (BTW, both are the same G1 logs).
>> And I can confirm that Scan RS is the issue... here's the plot:
>> > >
>> > > [inline plot image not preserved in this text archive]
>> > >
>> > > The plot above only shows the max obj copy times (for all pauses),
>> max Scan RS times and Parallel Time. So yes, Scan RS is the culprit.
>> > > Also, looking at your logs, it seems that for mixed GCs the
>> reclaimable bytes don't cross 6%. So can you please try increasing
>> G1HeapWastePercent to 10? (The latest builds will have 10 as the default
>> value.)
>> > >
>> > > Please let me know if that improves your RT.
>> > >
>> > > -Monica
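For clarity, Monica's suggestion amounts to adding one flag to the existing
G1 command line (illustrative only):

-XX:G1HeapWastePercent=10

Roughly speaking, this is the percentage of reclaimable heap that G1 is
willing to leave uncollected; with it raised to 10, mixed collections whose
candidate old regions would reclaim less than that are not scheduled.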
>> > >
>> > >
>> > > On 1/31/2013 11:56 AM, John Cuthbertson wrote:
>>
>> Hi Michal,
>> > >
>> > > I haven't looked at the logs yet but from your description it sounds
>> like a large number of the RSets have been coarsened and/or the fine grain
>> tables have gotten very dense.
>> > >
>> > > The RSet for a heap region tracks incoming references to objects in
>> that region. There are three levels of granularity: sparse, fine, and
>> coarse. The sparse and fine entries are encoded in open hash tables based
>> upon the region containing the references. As the number of references
>> from region A that point into region B increases, the number of cards in
>> the hash table entry for region A increases (it's actually a bitmap with
>> one bit per card), and as the number of regions that contain references
>> pointing into B increases, the number of region entries in the fine-grain
>> table increases. Once we run out of space in the fine-grain table (i.e.
>> we can't add another bitmap for another region) we evict one of the
>> densest region bitmaps and, so to speak, "coarsen" that entry. When we
>> have a coarsened entry we have to search the entire referencing region
>> looking for incoming references, compared to searching specific cards
>> with the sparse and fine entries.
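Since there is little written documentation of this structure, the
following is an illustration-only Java model of the three granularities
John describes. It is not HotSpot code (the real implementation lives in
heapRegionRemSet.cpp/hpp); the constants are invented, and MAX_FINE_ENTRIES
only stands in for what G1RSetRegionEntries controls.

import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy per-region remembered set: records which cards of which other
// regions may contain references into this region.
class RegionRSetModel {
    static final int CARDS_PER_REGION = 2048;       // invented, not HotSpot's value
    static final int SPARSE_CARDS_PER_REGION = 4;   // invented
    static final int MAX_FINE_ENTRIES = 128;        // stand-in for G1RSetRegionEntries

    // sparse: per referencing region, a short list of card indices
    private final Map<Integer, int[]> sparse = new HashMap<Integer, int[]>();
    // fine: per referencing region, a bitmap with one bit per card
    private final Map<Integer, BitSet> fine = new HashMap<Integer, BitSet>();
    // coarse: one bit per referencing region; Scan RS must walk the whole region
    private final BitSet coarse = new BitSet();

    void addReference(int fromRegion, int fromCard) {
        if (coarse.get(fromRegion)) {
            return;                                  // already coarsened, nothing finer to record
        }
        BitSet cards = fine.get(fromRegion);
        if (cards != null) {
            cards.set(fromCard);                     // fine entry: just set the card's bit
            return;
        }
        int[] few = sparse.get(fromRegion);
        if (few == null) {
            few = new int[0];
        }
        if (few.length < SPARSE_CARDS_PER_REGION) {  // still fits in the sparse table
            int[] grown = Arrays.copyOf(few, few.length + 1);
            grown[few.length] = fromCard;
            sparse.put(fromRegion, grown);
            return;
        }
        // Sparse entry overflowed: promote it to a fine bitmap, possibly
        // evicting (coarsening) the densest fine entry to make room.
        if (fine.size() == MAX_FINE_ENTRIES) {
            coarsenDensestEntry();
        }
        BitSet promoted = new BitSet(CARDS_PER_REGION);
        for (int c : few) {
            promoted.set(c);
        }
        promoted.set(fromCard);
        sparse.remove(fromRegion);
        fine.put(fromRegion, promoted);
    }

    // Once an entry is coarsened, the card information for that referencing
    // region is lost: a young pause has to scan the entire region instead
    // of a handful of cards, which is where very long Scan RS times come
    // from.
    private void coarsenDensestEntry() {
        int densest = -1;
        int maxCards = -1;
        for (Map.Entry<Integer, BitSet> e : fine.entrySet()) {
            if (e.getValue().cardinality() > maxCards) {
                maxCards = e.getValue().cardinality();
                densest = e.getKey();
            }
        }
        fine.remove(densest);
        coarse.set(densest);
    }
}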
>> > >
>> > > I'll take a look at your logs to see if I can confirm.
>> > >
>> > > JohnC
>> > >
>> > >
>> > > On 1/31/2013 6:12 AM, Michal Frajt wrote:
>>
>> Hi all,
>> > >
>> > > After iCMS was officially deprecated, we decided to compare the G1
>> collector with our best-tuned (i)CMS setup. Unfortunately we are not able
>> to get the G1 young collection running anywhere close to ParNew. We
>> actually wanted to compare the G1 concurrent-marking STW pauses with the
>> CMS initial-mark and remark STW pauses, but the already incredibly
>> long-running G1 young collections are unacceptable for us.
>> > >
>> > > We were able to recognize that the very long G1 young collections
>> are caused by scanning the remembered sets. There is not much
>> documentation about G1 internals, but we were able to understand that the
>> size of the remembered sets is related to the amount of mutating
>> references from old regions (cards) to young regions. Unfortunately, all
>> our applications permanently mutate thousands of references from old
>> objects to young objects.
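For concreteness, here is a hypothetical sketch (class names and sizes are
invented) of the access pattern described above: long-lived objects whose
reference fields are constantly re-pointed at freshly allocated young
objects. Each such store is an old-to-young reference that G1 must track in
the remembered sets and later visit during Scan RS.

import java.util.Random;

public class OldToYoungChurn {
    // Long-lived holders; after a few collections they are promoted and
    // live in old regions.
    static final class Holder {
        Object payload;
    }

    public static void main(String[] args) {
        Holder[] longLived = new Holder[1000000];
        for (int i = 0; i < longLived.length; i++) {
            longLived[i] = new Holder();
        }
        Random rnd = new Random(42);
        while (true) {
            // Store a reference to a brand-new (young) object into an old
            // object: exactly the permanent old-to-young mutation that
            // keeps the remembered sets of the young regions large.
            longLived[rnd.nextInt(longLived.length)].payload = new byte[64];
        }
    }
}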
>> > >
>> > > We are testing with the latest OpenJDK7u, extended by the 7189971
>> patch and a CMSTriggerInterval implementation. The attached GC log files
>> represent two nearly identical applications processing very similar data
>> sets, one running the G1 collector and the second running the CMS
>> collector. This OpenJDK7u build has an extra output of _pending_cards
>> (when G1TraceConcRefinement is activated) which somehow relates to the
>> remembered set size.
>> > >
>> > > Young Comparison (both 128m, survivor ratio 5, max tenuring 15)
>> > > CMS - invoked every ~20 sec, avg. stop 60ms
>> > > G1 - invoked every ~16 sec, avg. stop 410ms !!!
>> > >
>> > > Is there anything that could help us reduce the Scan RS time, or is
>> G1 simply not targeted at applications that heavily mutate old-region
>> objects?
>> > >
>> > > CMS parameters
>> > > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:MaxNewSize=128m
>> -XX:PermSize=128m -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15
>> -XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseConcMarkSweepGC
>> -XX:+CMSClassUnloadingEnabled -XX:CMSWaitDuration=60000
>> -XX:+CMSScavengeBeforeRemark -XX:CMSTriggerInterval=600000 -XX:+UseParNewGC
>> -XX:ParallelGCThreads=8 -XX:ParallelCMSThreads=2
>> > >
>> > > G1 parameters (mind MaxNewSize not specified)
>> > > -Xmx8884m -Xms2048m -XX:NewSize=128m -XX:PermSize=128m
>> -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 -XX:+UseG1GC
>> -XX:MaxGCPauseMillis=200 -XX:G1MixedGCCountTarget=16
>> -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2
>> > >
>> > > G1 log file GC young pause
>> > > [GC pause (young) [G1Ergonomics (CSet Construction) start choosing
>> CSet, _pending_cards: 23697, predicted base time: 32.88 ms, remaining
>> > > time: 167.12 ms, target pause time: 200.00 ms]
>> > > [Parallel Time: 389.8 ms, GC Workers: 8]
>> > > >>>>
>> > > [Scan RS (ms): Min: 328.8, Avg: 330.4, Max: 332.6, Diff: 3.8,
>> Sum: 2642.9]
>> > > <<<<
>> > > [Eden: 119.0M(119.0M)->0.0B(118.0M) Survivors: 9216.0K->10.0M Heap:
>> 1801.6M(2048.0M)->1685.7M(2048.0M)]
>> > >
>> > > Regards,
>> > > Michal
>> > >
>> > >
>>
>> > > --
>> > > Monica Beckwith | Java Performance Engineer
>> > > VOIP: +1 512 401 1274
>> > > Texas
>> > > Oracle is committed to developing practices and products that help
>> protect the environment
>>
>>
>> --
>> Monica Beckwith | Java Performance Engineer
>> VOIP: +1 512 401 1274
>> Texas
>> Oracle is committed to developing practices and products that help
>> protect the environment
>>
>>
>
>