Number of CMS collections and the JMM GC counters

Y. S. Ramakrishna y.s.ramakrishna at oracle.com
Fri Jul 29 01:08:09 UTC 2011


I filed: 7072527 CMS: JMM GC counters overcount in some cases

On 07/28/11 17:29, Y. S. Ramakrishna wrote:
> Hi Kevin --
> 
> thanks for jumping on this! More inline below ...
> 
> On 07/28/11 09:33, Krystal Mok wrote:
>> Hi Kevin,
>>
>> Thank you for taking care of this, and it's good to see the problem is 
>> verified.
>>
>> I think whether or not the suggested fix is sufficient depends on what 
>> paths can reach CMSCollector::do_compaction_work(). If all paths that 
>> can reach CMSCollector::do_compaction_work() come from 
>> GenCollectedHeap::do_collection(), then the fix should be good to go. 
>> Otherwise it'll need a better workaround.
>>
>> I believe all concurrent mode failures/interrupts (which includes the 
>> System.gc() case) does come from GenCollectedHeap::do_collection(), 
>> but I'm not exactly sure about this, could anybody please clarify on it?
> 
> Yes, i believe this is indeed the case, and my browsing of the
> code using cscope seemed to confirm that belief.
> 
> More below ...
> 
>>
>> Regards,
>> Kris Mok
>>
>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.walls at oracle.com 
>> <mailto:kevin.walls at oracle.com>> wrote:
>>
>>     __
>>     Hi --
>>
>>     6580448 was marked as a duplicate of 6581734, which fixed the fact
>>     that CMS collections were just not counted at all - with CMS, only a
>>     stop the world full gc would be counted in the stats.
>>
>>     But looks like you're right...  Here is a quick variation of the
>>     testcase from 6581734 which shows the same thing, and this verifies
>>     the same, and is solved by ExplicitGCInvokesConcurrent.  If there is
>>     no other feedback I can test if the removal of the
>>     TraceCMSMemoryManagerStats() call in
>>     CMSCollector::do_compaction_work is all we need...
> 
> 
> Kevin, yes, it would be great if you could verify this and push the fix.
> I am not sure if the push would need to wait for the signing of OCA
> from Kris, but best to check with Those Who Would Know Such Things.
> 
> Since the original CR has been closed, i'll open one momentarily and
> can make you RE (if that's OK with you). I'll be happy to serve as
> reviewer of the change.
> 
> As regards the jstat counter reporting two pauses per concurrent
> CMS cycle, I am of two minds on what the original intention
> was. I'd have originally regarded the double increment as a
> bug, but as you state it is really two pauses, even if part of
> a single cycle. And it makes sense to count them as two. I
> agree that this should be documented and left alone, given
> how long we have had this behaviour, and the alternative
> (of counting cycles, rather than pauses) may be no better
> (or arguably worse). There's actually an open CR for this
> which we can redirect into a CR to update the relevant documentation.
> 
> -- ramki
> 
>>
>>     Regards
>>     Kevin
>>
>>
>>     /*
>>      * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>     reserved.
>>      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>      *
>>      * This code is free software; you can redistribute it and/or 
>> modify it
>>      * under the terms of the GNU General Public License version 2 
>> only, as
>>      * published by the Free Software Foundation.
>>      *
>>      * This code is distributed in the hope that it will be useful, but
>>     WITHOUT
>>      * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>>      * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License
>>      * version 2 for more details (a copy is included in the LICENSE
>>     file that
>>      * accompanied this code).
>>      *
>>      * You should have received a copy of the GNU General Public License
>>     version
>>      * 2 along with this work; if not, write to the Free Software
>>     Foundation,
>>      * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>      *
>>      * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>     94065 USA
>>      * or visit www.oracle.com <http://www.oracle.com> if you need
>>     additional information or have any
>>      * questions.
>>      */
>>
>>     /*
>>      * @test TestFullGCount.java
>>      * @bug
>>      * @summary
>>      * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>      *
>>      */
>>     import java.util.*;
>>     import java.lang.management.*;
>>
>>
>>
>>     public class TestFullGCCount {
>>
>>     private String poolName = "CMS";
>>     private String collectorName = "ConcurrentMarkSweep";
>>
>>     public static void main(String [] args) {
>>
>>         TestFullGCCount t = null;
>>         if (args.length==2) {
>>             t = new TestFullGCCount(args[0], args[1]);
>>         } else {
>>             System.out.println("Defaulting to monitor CMS pool and
>>     collector.");
>>             t = new TestFullGCCount();
>>         }
>>         t.run();
>>
>>     }
>>
>>     public TestFullGCCount(String pool, String collector) {
>>         poolName = pool;
>>         collectorName = collector;
>>     }
>>     public TestFullGCCount() {
>>     }
>>
>>     public void run() {
>>
>>         int count = 0;
>>         int iterations = 20;
>>         long counts[] = new long[iterations];
>>         boolean diffAlways2 = true; // assume we will fail
>>
>>         for (int i=0; i<iterations; i++) {
>>             System.gc();
>>             counts[i] = checkStats();
>>             if (i>0) {
>>                 if (counts[i] - counts[i-1] != 2) {
>>                     diffAlways2 = false;
>>                 }
>>             }
>>         }
>>         if (diffAlways2) {
>>             throw new RuntimeException("FAILED: difference in count is
>>     always 2.");
>>         }
>>         System.out.println("Passed.");
>>     }
>>
>>     private long checkStats() {
>>         long count = 0;
>>         List<MemoryPoolMXBean> pools =
>>     ManagementFactory.getMemoryPoolMXBeans();
>>         List<GarbageCollectorMXBean> collectors =
>>     ManagementFactory.getGarbageCollectorMXBeans();
>>         for (int i=0; i<collectors.size(); i++) {
>>             GarbageCollectorMXBean collector = collectors.get(i);
>>             String name = collector.getName();
>>             if (name.contains(collectorName)) {
>>                 System.out.println(name + ": collection count = "
>>                                    + collector.getCollectionCount());
>>                 count = collector.getCollectionCount();
>>             }
>>         }
>>         return count;
>>
>>     }
>>
>>     }
>>
>>
>>     On 27/07/11 17:12, Krystal Mok wrote:
>>>     Hi all,
>>>
>>>     I've been looking at a strange inconsistency of full GC count
>>>     recorded by jvmstat and JMM counters. I'd like to know which ones
>>>     of the following behaviors are by design, which ones are bugs, and
>>>     which ones are just my misunderstanding. I apologize for making a
>>>     short story long...
>>>
>>>     =====================================================
>>>
>>>     The counters involved:
>>>
>>>     * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>     track of the number of pauses occured as a result of a major
>>>     collection. It is used by utilities such as jstat as the source of
>>>     "FGC" (full collection count), and the old gen collection count in
>>>     Visual GC. It's updated by an TraceCollectorStats object.
>>>     * A JMM counter, GCMemoryManager::_num_collections, keeps track of
>>>     the number of collections that have ended. This counter is used as
>>>     HotSpot's implementation of the JMX
>>>     GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>     either a TraceMemoryManagerStats object or a
>>>     TraceCMSMemoryManagerStats object.
>>>
>>>     To show the situation, I've made a screenshot of a VisualVM and a
>>>     JConsole running side by side, both are monitoring the VisualVM's
>>>     GC stats:
>>>     
>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png 
>>>
>>>     (I'll upload the screenshot to somewhere else if anybody can't 
>>> see it)
>>>     The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>     In the screenshot, Visual GC reports that the old gen collection
>>>     count is 20, while JConsole reports 10.
>>>
>>>     I see that there was this bug:
>>>     6580448: CMS: Full GC collection count mismatch between
>>>     GarbageCollectorMXBean and jvmstat (VisualGC)
>>>     I don't think the current implementation has a bug in the sense
>>>     that the two counters don't report the same number.
>>>
>>>     This behavior seems reasonable, but the naming of the value in
>>>     these tools are confusing: both tools say "collections", but
>>>     apparently the number in Visual GC means "number of pauses" where
>>>     as the number in JConsole means "number of collection cycles".
>>>     It'd be great if the difference could be documented somewhere, if
>>>     that's the intended behavior.
>>>
>>>     And then the buggy behavior. Code demo posted on gist:
>>>     https://gist.github.com/1106263
>>>     Starting from JDK6u23, when using CMS without
>>>     ExplicitGCInvokesConcurrent, System.gc() (or
>>>     Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>     make the JMM GC counter increment by 2 per invocation, while the
>>>     jvmstat counter is only incremented by 1. I believe the latter is
>>>     correct and the former needs some fixing.
>>>
>>>     =====================================================
>>>
>>>     My understanding of the behavior shown above:
>>>
>>>     1. The concurrent GC part:
>>>
>>>     There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>     initial mark phase, and one in the final remark phase.
>>>     To trigger a concurrent GC cycle, the CMS thread wakes up
>>>     periodically to see if it shouldConcurrentCollect(), and trigger a
>>>     cycle when the predicate returned true, or goes back to sleep if
>>>     the predicate returned false. The whole concurrent GC cycle
>>>     doesn't go through GenCollectedHeap::do_collection().
>>>
>>>     The jvmstat counter for old gen pauses is updated in
>>>     CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>     covers both pause phases.
>>>
>>>     The JMM counter, however, is updated in the concurrent sweep
>>>     phase, CMSCollector::sweep(bool asynch), if there was no
>>>     concurrent mode failure; or it is updated in
>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case
>>>     of a bailout due to concurrent mode failure (advertised as so in
>>>     the code comments). So that's an increment by 1 per concurrent GC
>>>     cycle, which does reflect the "number of collection cycles", fair
>>>     enough.
>>>
>>>     So far so good.
>>>
>>>     2. The System.gc() part:
>>>
>>>     Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>     stop-the-world full GC, which consists of only one pause, so
>>>     "number of pauses" would equal "number of collections" in this case.
>>>     It will go into GenCollectedHeap::do_collection(); both the
>>>     jvmstat and the JMM GC counter gets incremented by 1 here,
>>>
>>>     TraceCollectorStats tcs(_gens[i]->counters());
>>>     TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>
>>>     But, drilling down into:
>>>     _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>
>>>     That'll eventually go into:
>>>     CMSCollector::acquire_control_and_collect(bool full, bool
>>>     clear_all_soft_refs)
>>>
>>>     System.gc() is user requested so that'll go further into
>>>     mark-sweep-compact:
>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>     And here, it increments the JMM GC counter again (remember it was
>>>     in the concurrent GC path too, to handle bailouts), even though
>>>     this is still in the same collection. This leads to the "buggy
>>>     behavior" mentioned earlier.
>>>
>>>     The JMM GC counter wasn't added to CMS until this fix got in:
>>>     6581734: CMS Old Gen's collection usage is zero after GC which is
>>>     incorrect
>>>
>>>     The code added to CMSCollector::do_compaction_work() works fine in
>>>     the concurrent GC path, but interacts badly with the original
>>>     logic in GenCollectedHeap::do_collection().
>>>
>>>     =====================================================
>>>
>>>     I thought all concurrent mode failures/interrupts come from
>>>     GenCollectedHeap::do_collection(). If that's the case, then it
>>>     seems unnecessary to update the JMM GC counter in
>>>     CMSCollector::do_compaction_work(), simply removing it should fix
>>>     the problem.
>>>
>>>     With that, I'd purpose the following (XS) change: (diff against 
>>> HS20)
>>>
>>>     diff -r f0f676c5a2c6
>>>     
>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>
>>>     ---
>>>     
>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>
>>>          Tue Mar 15 19:30:16 2011 -0700
>>>     +++
>>>     
>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>
>>>          Thu Jul 28 00:02:41 2011 +0800
>>>     @@ -2022,9 +2022,6 @@
>>>                                                     
>>> _intra_sweep_estimate.padded_average());
>>>        }
>>>
>>>     -  {
>>>     -    TraceCMSMemoryManagerStats();
>>>     -  }
>>>        GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>          ref_processor(), clear_all_soft_refs);
>>>        #ifdef ASSERT
>>>
>>>     The same goes for the changes in:
>>>     7036199: Adding a notification to the implementation of
>>>     GarbageCollectorMXBeans
>>>
>>>     =====================================================
>>>
>>>     P.S. Is there an "official" name for the counters that I referred
>>>     to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
>>>     HSPERFDATA?
>>
>>



More information about the hotspot-gc-dev mailing list