Review request: 7072527 CMS: JMM GC counters overcount in some cases

Kevin Walls kevin.walls at oracle.com
Wed Aug 3 04:56:38 PDT 2011


Thanks Ramki, and thanks for chasing up and clarifying the OCA point.

Kevin



On 01/08/11 19:35, Ramki Ramakrishna wrote:
> Looks good to me; ship it!
>
> -- ramki
>
> On 7/29/2011 4:13 PM, Kevin Walls wrote:
>>
>> Actually I missed the other comment that we could go further and 
>> remove the unneeded constructor.
>>
>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/
>>
>> Thanks
>> Kevin
>>
>> On 29/07/11 17:35, Kevin Walls wrote:
>>> Hi --
>>>
>>> I got this ready in a webrev:
>>>
>>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>>>
>>> The new test is passing, as are the old ones. 8-)
>>>
>>> Also, this change means that a CMS cycle ending in concurrent mode 
>>> failure now counts as one collection.  One could argue that either 
>>> way (should it be 2 collections?) - but I'm thinking if we are 
>>> counting "completed" collections then we are now counting correctly!
>>>
>>> Thanks
>>> Kevin
>>>
>>>
>>> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>>>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>>>
>>>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>>>> Hi Kevin --
>>>>>
>>>>> thanks for jumping on this! More inline below ...
>>>>>
>>>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>>>> Hi Kevin,
>>>>>>
>>>>>> Thank you for taking care of this, and it's good to see the 
>>>>>> problem is verified.
>>>>>>
>>>>>> I think whether or not the suggested fix is sufficient depends on 
>>>>>> what paths can reach CMSCollector::do_compaction_work(). If all 
>>>>>> paths that can reach CMSCollector::do_compaction_work() come from 
>>>>>> GenCollectedHeap::do_collection(), then the fix should be good to 
>>>>>> go. Otherwise it'll need a better workaround.
>>>>>>
>>>>>> I believe all concurrent mode failures/interrupts (which includes 
>>>>>> the System.gc() case) does come from 
>>>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about 
>>>>>> this, could anybody please clarify on it?
>>>>>
>>>>> Yes, i believe this is indeed the case, and my browsing of the
>>>>> code using cscope seemed to confirm that belief.
>>>>>
>>>>> More below ...
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Kris Mok
>>>>>>
>>>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls 
>>>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>>>
>>>>>>     __
>>>>>>     Hi --
>>>>>>
>>>>>>     6580448 was marked as a duplicate of 6581734, which fixed the 
>>>>>> fact
>>>>>>     that CMS collections were just not counted at all - with CMS, 
>>>>>> only a
>>>>>>     stop the world full gc would be counted in the stats.
>>>>>>
>>>>>>     But looks like you're right...  Here is a quick variation of the
>>>>>>     testcase from 6581734 which shows the same thing, and this 
>>>>>> verifies
>>>>>>     the same, and is solved by ExplicitGCInvokesConcurrent.  If 
>>>>>> there is
>>>>>>     no other feedback I can test if the removal of the
>>>>>>     TraceCMSMemoryManagerStats() call in
>>>>>>     CMSCollector::do_compaction_work is all we need...
>>>>>
>>>>>
>>>>> Kevin, yes, it would be great if you could verify this and push 
>>>>> the fix.
>>>>> I am not sure if the push would need to wait for the signing of OCA
>>>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>>>
>>>>> Since the original CR has been closed, i'll open one momentarily and
>>>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>>>> reviewer of the change.
>>>>>
>>>>> As regards the jstat counter reporting two pauses per concurrent
>>>>> CMS cycle, I am of two minds on what the original intention
>>>>> was. I'd have originally regarded the double increment as a
>>>>> bug, but as you state it is really two pauses, even if part of
>>>>> a single cycle. And it makes sense to count them as two. I
>>>>> agree that this should be documented and left alone, given
>>>>> how long we have had this behaviour, and the alternative
>>>>> (of counting cycles, rather than pauses) may be no better
>>>>> (or arguably worse). There's actually an open CR for this
>>>>> which we can redirect into a CR to update the relevant documentation.
>>>>>
>>>>> -- ramki
>>>>>
>>>>>>
>>>>>>     Regards
>>>>>>     Kevin
>>>>>>
>>>>>>
>>>>>>     /*
>>>>>>      * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>>>>     reserved.
>>>>>>      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>>>>      *
>>>>>>      * This code is free software; you can redistribute it and/or 
>>>>>> modify it
>>>>>>      * under the terms of the GNU General Public License version 
>>>>>> 2 only, as
>>>>>>      * published by the Free Software Foundation.
>>>>>>      *
>>>>>>      * This code is distributed in the hope that it will be 
>>>>>> useful, but
>>>>>>     WITHOUT
>>>>>>      * ANY WARRANTY; without even the implied warranty of 
>>>>>> MERCHANTABILITY or
>>>>>>      * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General 
>>>>>> Public License
>>>>>>      * version 2 for more details (a copy is included in the LICENSE
>>>>>>     file that
>>>>>>      * accompanied this code).
>>>>>>      *
>>>>>>      * You should have received a copy of the GNU General Public 
>>>>>> License
>>>>>>     version
>>>>>>      * 2 along with this work; if not, write to the Free Software
>>>>>>     Foundation,
>>>>>>      * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>>>>      *
>>>>>>      * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>>>>     94065 USA
>>>>>>      * or visit www.oracle.com <http://www.oracle.com> if you need
>>>>>>     additional information or have any
>>>>>>      * questions.
>>>>>>      */
>>>>>>
>>>>>>     /*
>>>>>>      * @test TestFullGCount.java
>>>>>>      * @bug
>>>>>>      * @summary
>>>>>>      * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>>>>      *
>>>>>>      */
>>>>>>     import java.util.*;
>>>>>>     import java.lang.management.*;
>>>>>>
>>>>>>
>>>>>>
>>>>>>     public class TestFullGCCount {
>>>>>>
>>>>>>     private String poolName = "CMS";
>>>>>>     private String collectorName = "ConcurrentMarkSweep";
>>>>>>
>>>>>>     public static void main(String [] args) {
>>>>>>
>>>>>>         TestFullGCCount t = null;
>>>>>>         if (args.length==2) {
>>>>>>             t = new TestFullGCCount(args[0], args[1]);
>>>>>>         } else {
>>>>>>             System.out.println("Defaulting to monitor CMS pool and
>>>>>>     collector.");
>>>>>>             t = new TestFullGCCount();
>>>>>>         }
>>>>>>         t.run();
>>>>>>
>>>>>>     }
>>>>>>
>>>>>>     public TestFullGCCount(String pool, String collector) {
>>>>>>         poolName = pool;
>>>>>>         collectorName = collector;
>>>>>>     }
>>>>>>     public TestFullGCCount() {
>>>>>>     }
>>>>>>
>>>>>>     public void run() {
>>>>>>
>>>>>>         int count = 0;
>>>>>>         int iterations = 20;
>>>>>>         long counts[] = new long[iterations];
>>>>>>         boolean diffAlways2 = true; // assume we will fail
>>>>>>
>>>>>>         for (int i=0; i<iterations; i++) {
>>>>>>             System.gc();
>>>>>>             counts[i] = checkStats();
>>>>>>             if (i>0) {
>>>>>>                 if (counts[i] - counts[i-1] != 2) {
>>>>>>                     diffAlways2 = false;
>>>>>>                 }
>>>>>>             }
>>>>>>         }
>>>>>>         if (diffAlways2) {
>>>>>>             throw new RuntimeException("FAILED: difference in 
>>>>>> count is
>>>>>>     always 2.");
>>>>>>         }
>>>>>>         System.out.println("Passed.");
>>>>>>     }
>>>>>>
>>>>>>     private long checkStats() {
>>>>>>         long count = 0;
>>>>>>         List<MemoryPoolMXBean> pools =
>>>>>>     ManagementFactory.getMemoryPoolMXBeans();
>>>>>>         List<GarbageCollectorMXBean> collectors =
>>>>>>     ManagementFactory.getGarbageCollectorMXBeans();
>>>>>>         for (int i=0; i<collectors.size(); i++) {
>>>>>>             GarbageCollectorMXBean collector = collectors.get(i);
>>>>>>             String name = collector.getName();
>>>>>>             if (name.contains(collectorName)) {
>>>>>>                 System.out.println(name + ": collection count = "
>>>>>>                                    + 
>>>>>> collector.getCollectionCount());
>>>>>>                 count = collector.getCollectionCount();
>>>>>>             }
>>>>>>         }
>>>>>>         return count;
>>>>>>
>>>>>>     }
>>>>>>
>>>>>>     }
>>>>>>
>>>>>>
>>>>>>     On 27/07/11 17:12, Krystal Mok wrote:
>>>>>>>     Hi all,
>>>>>>>
>>>>>>>     I've been looking at a strange inconsistency of full GC count
>>>>>>>     recorded by jvmstat and JMM counters. I'd like to know which 
>>>>>>> ones
>>>>>>>     of the following behaviors are by design, which ones are 
>>>>>>> bugs, and
>>>>>>>     which ones are just my misunderstanding. I apologize for 
>>>>>>> making a
>>>>>>>     short story long...
>>>>>>>
>>>>>>>     =====================================================
>>>>>>>
>>>>>>>     The counters involved:
>>>>>>>
>>>>>>>     * A jvmstat counter named "sun.gc.collector.1.invocations" 
>>>>>>> keeps
>>>>>>>     track of the number of pauses occured as a result of a major
>>>>>>>     collection. It is used by utilities such as jstat as the 
>>>>>>> source of
>>>>>>>     "FGC" (full collection count), and the old gen collection 
>>>>>>> count in
>>>>>>>     Visual GC. It's updated by an TraceCollectorStats object.
>>>>>>>     * A JMM counter, GCMemoryManager::_num_collections, keeps 
>>>>>>> track of
>>>>>>>     the number of collections that have ended. This counter is 
>>>>>>> used as
>>>>>>>     HotSpot's implementation of the JMX
>>>>>>>     GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>>>>     either a TraceMemoryManagerStats object or a
>>>>>>>     TraceCMSMemoryManagerStats object.
>>>>>>>
>>>>>>>     To show the situation, I've made a screenshot of a VisualVM 
>>>>>>> and a
>>>>>>>     JConsole running side by side, both are monitoring the 
>>>>>>> VisualVM's
>>>>>>>     GC stats:
>>>>>>>     
>>>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png 
>>>>>>>
>>>>>>>     (I'll upload the screenshot to somewhere else if anybody 
>>>>>>> can't see it)
>>>>>>>     The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>>>>     In the screenshot, Visual GC reports that the old gen 
>>>>>>> collection
>>>>>>>     count is 20, while JConsole reports 10.
>>>>>>>
>>>>>>>     I see that there was this bug:
>>>>>>>     6580448: CMS: Full GC collection count mismatch between
>>>>>>>     GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>>>>     I don't think the current implementation has a bug in the sense
>>>>>>>     that the two counters don't report the same number.
>>>>>>>
>>>>>>>     This behavior seems reasonable, but the naming of the value in
>>>>>>>     these tools are confusing: both tools say "collections", but
>>>>>>>     apparently the number in Visual GC means "number of pauses" 
>>>>>>> where
>>>>>>>     as the number in JConsole means "number of collection cycles".
>>>>>>>     It'd be great if the difference could be documented 
>>>>>>> somewhere, if
>>>>>>>     that's the intended behavior.
>>>>>>>
>>>>>>>     And then the buggy behavior. Code demo posted on gist:
>>>>>>>     https://gist.github.com/1106263
>>>>>>>     Starting from JDK6u23, when using CMS without
>>>>>>>     ExplicitGCInvokesConcurrent, System.gc() (or
>>>>>>>     Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>>>>     make the JMM GC counter increment by 2 per invocation, while 
>>>>>>> the
>>>>>>>     jvmstat counter is only incremented by 1. I believe the 
>>>>>>> latter is
>>>>>>>     correct and the former needs some fixing.
>>>>>>>
>>>>>>>     =====================================================
>>>>>>>
>>>>>>>     My understanding of the behavior shown above:
>>>>>>>
>>>>>>>     1. The concurrent GC part:
>>>>>>>
>>>>>>>     There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>>>>     initial mark phase, and one in the final remark phase.
>>>>>>>     To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>>>>     periodically to see if it shouldConcurrentCollect(), and 
>>>>>>> trigger a
>>>>>>>     cycle when the predicate returned true, or goes back to 
>>>>>>> sleep if
>>>>>>>     the predicate returned false. The whole concurrent GC cycle
>>>>>>>     doesn't go through GenCollectedHeap::do_collection().
>>>>>>>
>>>>>>>     The jvmstat counter for old gen pauses is updated in
>>>>>>>     CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>>>>     covers both pause phases.
>>>>>>>
>>>>>>>     The JMM counter, however, is updated in the concurrent sweep
>>>>>>>     phase, CMSCollector::sweep(bool asynch), if there was no
>>>>>>>     concurrent mode failure; or it is updated in
>>>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs) 
>>>>>>> in case
>>>>>>>     of a bailout due to concurrent mode failure (advertised as 
>>>>>>> so in
>>>>>>>     the code comments). So that's an increment by 1 per 
>>>>>>> concurrent GC
>>>>>>>     cycle, which does reflect the "number of collection cycles", 
>>>>>>> fair
>>>>>>>     enough.
>>>>>>>
>>>>>>>     So far so good.
>>>>>>>
>>>>>>>     2. The System.gc() part:
>>>>>>>
>>>>>>>     Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>>>>     stop-the-world full GC, which consists of only one pause, so
>>>>>>>     "number of pauses" would equal "number of collections" in 
>>>>>>> this case.
>>>>>>>     It will go into GenCollectedHeap::do_collection(); both the
>>>>>>>     jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>>>
>>>>>>>     TraceCollectorStats tcs(_gens[i]->counters());
>>>>>>>     TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>>>
>>>>>>>     But, drilling down into:
>>>>>>>     _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>>>
>>>>>>>     That'll eventually go into:
>>>>>>>     CMSCollector::acquire_control_and_collect(bool full, bool
>>>>>>>     clear_all_soft_refs)
>>>>>>>
>>>>>>>     System.gc() is user requested so that'll go further into
>>>>>>>     mark-sweep-compact:
>>>>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>>>     And here, it increments the JMM GC counter again (remember 
>>>>>>> it was
>>>>>>>     in the concurrent GC path too, to handle bailouts), even though
>>>>>>>     this is still in the same collection. This leads to the "buggy
>>>>>>>     behavior" mentioned earlier.
>>>>>>>
>>>>>>>     The JMM GC counter wasn't added to CMS until this fix got in:
>>>>>>>     6581734: CMS Old Gen's collection usage is zero after GC 
>>>>>>> which is
>>>>>>>     incorrect
>>>>>>>
>>>>>>>     The code added to CMSCollector::do_compaction_work() works 
>>>>>>> fine in
>>>>>>>     the concurrent GC path, but interacts badly with the original
>>>>>>>     logic in GenCollectedHeap::do_collection().
>>>>>>>
>>>>>>>     =====================================================
>>>>>>>
>>>>>>>     I thought all concurrent mode failures/interrupts come from
>>>>>>>     GenCollectedHeap::do_collection(). If that's the case, then it
>>>>>>>     seems unnecessary to update the JMM GC counter in
>>>>>>>     CMSCollector::do_compaction_work(), simply removing it 
>>>>>>> should fix
>>>>>>>     the problem.
>>>>>>>
>>>>>>>     With that, I'd purpose the following (XS) change: (diff 
>>>>>>> against HS20)
>>>>>>>
>>>>>>>     diff -r f0f676c5a2c6
>>>>>>>     
>>>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>>
>>>>>>>     ---
>>>>>>>     
>>>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>>
>>>>>>>          Tue Mar 15 19:30:16 2011 -0700
>>>>>>>     +++
>>>>>>>     
>>>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>>>>
>>>>>>>          Thu Jul 28 00:02:41 2011 +0800
>>>>>>>     @@ -2022,9 +2022,6 @@
>>>>>>>                                                     
>>>>>>> _intra_sweep_estimate.padded_average());
>>>>>>>        }
>>>>>>>
>>>>>>>     -  {
>>>>>>>     -    TraceCMSMemoryManagerStats();
>>>>>>>     -  }
>>>>>>>        GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>>>>          ref_processor(), clear_all_soft_refs);
>>>>>>>        #ifdef ASSERT
>>>>>>>
>>>>>>>     The same goes for the changes in:
>>>>>>>     7036199: Adding a notification to the implementation of
>>>>>>>     GarbageCollectorMXBeans
>>>>>>>
>>>>>>>     =====================================================
>>>>>>>
>>>>>>>     P.S. Is there an "official" name for the counters that I 
>>>>>>> referred
>>>>>>>     to as "jvmstat counters" above? Is it just jvmstat, or 
>>>>>>> PerfData or
>>>>>>>     HSPERFDATA?
>>>>>>
>>>>>>
>>>
>>



More information about the serviceability-dev mailing list