Review request: 7072527 CMS: JMM GC counters overcount in some cases

Kevin Walls kevin.walls at oracle.com
Fri Jul 29 16:35:52 UTC 2011


Hi --

I got this ready in a webrev:

http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/

The new test is passing, as are the old ones. 8-)

Also, this change means that a CMS cycle ending in concurrent mode 
failure now counts as one collection.  One could argue that either way 
(should it be 2 collections?) - but I'm thinking if we are counting 
"completed" collections then we are now counting correctly!

Thanks
Kevin


On 29/07/11 02:08, Y. S. Ramakrishna wrote:
> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>
> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>> Hi Kevin --
>>
>> thanks for jumping on this! More inline below ...
>>
>> On 07/28/11 09:33, Krystal Mok wrote:
>>> Hi Kevin,
>>>
>>> Thank you for taking care of this, and it's good to see the problem 
>>> is verified.
>>>
>>> I think whether or not the suggested fix is sufficient depends on 
>>> what paths can reach CMSCollector::do_compaction_work(). If all 
>>> paths that can reach CMSCollector::do_compaction_work() come from 
>>> GenCollectedHeap::do_collection(), then the fix should be good to 
>>> go. Otherwise it'll need a better workaround.
>>>
>>> I believe all concurrent mode failures/interrupts (which includes 
>>> the System.gc() case) does come from 
>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about 
>>> this, could anybody please clarify on it?
>>
>> Yes, i believe this is indeed the case, and my browsing of the
>> code using cscope seemed to confirm that belief.
>>
>> More below ...
>>
>>>
>>> Regards,
>>> Kris Mok
>>>
>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls 
>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>
>>>     __
>>>     Hi --
>>>
>>>     6580448 was marked as a duplicate of 6581734, which fixed the fact
>>>     that CMS collections were just not counted at all - with CMS, 
>>> only a
>>>     stop the world full gc would be counted in the stats.
>>>
>>>     But looks like you're right...  Here is a quick variation of the
>>>     testcase from 6581734 which shows the same thing, and this verifies
>>>     the same, and is solved by ExplicitGCInvokesConcurrent.  If 
>>> there is
>>>     no other feedback I can test if the removal of the
>>>     TraceCMSMemoryManagerStats() call in
>>>     CMSCollector::do_compaction_work is all we need...
>>
>>
>> Kevin, yes, it would be great if you could verify this and push the fix.
>> I am not sure if the push would need to wait for the signing of OCA
>> from Kris, but best to check with Those Who Would Know Such Things.
>>
>> Since the original CR has been closed, i'll open one momentarily and
>> can make you RE (if that's OK with you). I'll be happy to serve as
>> reviewer of the change.
>>
>> As regards the jstat counter reporting two pauses per concurrent
>> CMS cycle, I am of two minds on what the original intention
>> was. I'd have originally regarded the double increment as a
>> bug, but as you state it is really two pauses, even if part of
>> a single cycle. And it makes sense to count them as two. I
>> agree that this should be documented and left alone, given
>> how long we have had this behaviour, and the alternative
>> (of counting cycles, rather than pauses) may be no better
>> (or arguably worse). There's actually an open CR for this
>> which we can redirect into a CR to update the relevant documentation.
>>
>> -- ramki
>>
>>>
>>>     Regards
>>>     Kevin
>>>
>>>
>>>     /*
>>>      * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>     reserved.
>>>      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>      *
>>>      * This code is free software; you can redistribute it and/or 
>>> modify it
>>>      * under the terms of the GNU General Public License version 2 
>>> only, as
>>>      * published by the Free Software Foundation.
>>>      *
>>>      * This code is distributed in the hope that it will be useful, but
>>>     WITHOUT
>>>      * ANY WARRANTY; without even the implied warranty of 
>>> MERCHANTABILITY or
>>>      * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>>> License
>>>      * version 2 for more details (a copy is included in the LICENSE
>>>     file that
>>>      * accompanied this code).
>>>      *
>>>      * You should have received a copy of the GNU General Public 
>>> License
>>>     version
>>>      * 2 along with this work; if not, write to the Free Software
>>>     Foundation,
>>>      * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>      *
>>>      * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>     94065 USA
>>>      * or visit www.oracle.com <http://www.oracle.com> if you need
>>>     additional information or have any
>>>      * questions.
>>>      */
>>>
>>>     /*
>>>      * @test TestFullGCount.java
>>>      * @bug
>>>      * @summary
>>>      * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>      *
>>>      */
>>>     import java.util.*;
>>>     import java.lang.management.*;
>>>
>>>
>>>
>>>     public class TestFullGCCount {
>>>
>>>     private String poolName = "CMS";
>>>     private String collectorName = "ConcurrentMarkSweep";
>>>
>>>     public static void main(String [] args) {
>>>
>>>         TestFullGCCount t = null;
>>>         if (args.length==2) {
>>>             t = new TestFullGCCount(args[0], args[1]);
>>>         } else {
>>>             System.out.println("Defaulting to monitor CMS pool and
>>>     collector.");
>>>             t = new TestFullGCCount();
>>>         }
>>>         t.run();
>>>
>>>     }
>>>
>>>     public TestFullGCCount(String pool, String collector) {
>>>         poolName = pool;
>>>         collectorName = collector;
>>>     }
>>>     public TestFullGCCount() {
>>>     }
>>>
>>>     public void run() {
>>>
>>>         int count = 0;
>>>         int iterations = 20;
>>>         long counts[] = new long[iterations];
>>>         boolean diffAlways2 = true; // assume we will fail
>>>
>>>         for (int i=0; i<iterations; i++) {
>>>             System.gc();
>>>             counts[i] = checkStats();
>>>             if (i>0) {
>>>                 if (counts[i] - counts[i-1] != 2) {
>>>                     diffAlways2 = false;
>>>                 }
>>>             }
>>>         }
>>>         if (diffAlways2) {
>>>             throw new RuntimeException("FAILED: difference in count is
>>>     always 2.");
>>>         }
>>>         System.out.println("Passed.");
>>>     }
>>>
>>>     private long checkStats() {
>>>         long count = 0;
>>>         List<MemoryPoolMXBean> pools =
>>>     ManagementFactory.getMemoryPoolMXBeans();
>>>         List<GarbageCollectorMXBean> collectors =
>>>     ManagementFactory.getGarbageCollectorMXBeans();
>>>         for (int i=0; i<collectors.size(); i++) {
>>>             GarbageCollectorMXBean collector = collectors.get(i);
>>>             String name = collector.getName();
>>>             if (name.contains(collectorName)) {
>>>                 System.out.println(name + ": collection count = "
>>>                                    + collector.getCollectionCount());
>>>                 count = collector.getCollectionCount();
>>>             }
>>>         }
>>>         return count;
>>>
>>>     }
>>>
>>>     }
>>>
>>>
>>>     On 27/07/11 17:12, Krystal Mok wrote:
>>>>     Hi all,
>>>>
>>>>     I've been looking at a strange inconsistency of full GC count
>>>>     recorded by jvmstat and JMM counters. I'd like to know which ones
>>>>     of the following behaviors are by design, which ones are bugs, and
>>>>     which ones are just my misunderstanding. I apologize for making a
>>>>     short story long...
>>>>
>>>>     =====================================================
>>>>
>>>>     The counters involved:
>>>>
>>>>     * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>>     track of the number of pauses occured as a result of a major
>>>>     collection. It is used by utilities such as jstat as the source of
>>>>     "FGC" (full collection count), and the old gen collection count in
>>>>     Visual GC. It's updated by an TraceCollectorStats object.
>>>>     * A JMM counter, GCMemoryManager::_num_collections, keeps track of
>>>>     the number of collections that have ended. This counter is used as
>>>>     HotSpot's implementation of the JMX
>>>>     GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>     either a TraceMemoryManagerStats object or a
>>>>     TraceCMSMemoryManagerStats object.
>>>>
>>>>     To show the situation, I've made a screenshot of a VisualVM and a
>>>>     JConsole running side by side, both are monitoring the VisualVM's
>>>>     GC stats:
>>>>     
>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png 
>>>>
>>>>     (I'll upload the screenshot to somewhere else if anybody can't 
>>>> see it)
>>>>     The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>     In the screenshot, Visual GC reports that the old gen collection
>>>>     count is 20, while JConsole reports 10.
>>>>
>>>>     I see that there was this bug:
>>>>     6580448: CMS: Full GC collection count mismatch between
>>>>     GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>     I don't think the current implementation has a bug in the sense
>>>>     that the two counters don't report the same number.
>>>>
>>>>     This behavior seems reasonable, but the naming of the value in
>>>>     these tools are confusing: both tools say "collections", but
>>>>     apparently the number in Visual GC means "number of pauses" where
>>>>     as the number in JConsole means "number of collection cycles".
>>>>     It'd be great if the difference could be documented somewhere, if
>>>>     that's the intended behavior.
>>>>
>>>>     And then the buggy behavior. Code demo posted on gist:
>>>>     https://gist.github.com/1106263
>>>>     Starting from JDK6u23, when using CMS without
>>>>     ExplicitGCInvokesConcurrent, System.gc() (or
>>>>     Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>     make the JMM GC counter increment by 2 per invocation, while the
>>>>     jvmstat counter is only incremented by 1. I believe the latter is
>>>>     correct and the former needs some fixing.
>>>>
>>>>     =====================================================
>>>>
>>>>     My understanding of the behavior shown above:
>>>>
>>>>     1. The concurrent GC part:
>>>>
>>>>     There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>     initial mark phase, and one in the final remark phase.
>>>>     To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>     periodically to see if it shouldConcurrentCollect(), and trigger a
>>>>     cycle when the predicate returned true, or goes back to sleep if
>>>>     the predicate returned false. The whole concurrent GC cycle
>>>>     doesn't go through GenCollectedHeap::do_collection().
>>>>
>>>>     The jvmstat counter for old gen pauses is updated in
>>>>     CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>     covers both pause phases.
>>>>
>>>>     The JMM counter, however, is updated in the concurrent sweep
>>>>     phase, CMSCollector::sweep(bool asynch), if there was no
>>>>     concurrent mode failure; or it is updated in
>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case
>>>>     of a bailout due to concurrent mode failure (advertised as so in
>>>>     the code comments). So that's an increment by 1 per concurrent GC
>>>>     cycle, which does reflect the "number of collection cycles", fair
>>>>     enough.
>>>>
>>>>     So far so good.
>>>>
>>>>     2. The System.gc() part:
>>>>
>>>>     Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>     stop-the-world full GC, which consists of only one pause, so
>>>>     "number of pauses" would equal "number of collections" in this 
>>>> case.
>>>>     It will go into GenCollectedHeap::do_collection(); both the
>>>>     jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>
>>>>     TraceCollectorStats tcs(_gens[i]->counters());
>>>>     TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>
>>>>     But, drilling down into:
>>>>     _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>
>>>>     That'll eventually go into:
>>>>     CMSCollector::acquire_control_and_collect(bool full, bool
>>>>     clear_all_soft_refs)
>>>>
>>>>     System.gc() is user requested so that'll go further into
>>>>     mark-sweep-compact:
>>>>     CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>     And here, it increments the JMM GC counter again (remember it was
>>>>     in the concurrent GC path too, to handle bailouts), even though
>>>>     this is still in the same collection. This leads to the "buggy
>>>>     behavior" mentioned earlier.
>>>>
>>>>     The JMM GC counter wasn't added to CMS until this fix got in:
>>>>     6581734: CMS Old Gen's collection usage is zero after GC which is
>>>>     incorrect
>>>>
>>>>     The code added to CMSCollector::do_compaction_work() works fine in
>>>>     the concurrent GC path, but interacts badly with the original
>>>>     logic in GenCollectedHeap::do_collection().
>>>>
>>>>     =====================================================
>>>>
>>>>     I thought all concurrent mode failures/interrupts come from
>>>>     GenCollectedHeap::do_collection(). If that's the case, then it
>>>>     seems unnecessary to update the JMM GC counter in
>>>>     CMSCollector::do_compaction_work(), simply removing it should fix
>>>>     the problem.
>>>>
>>>>     With that, I'd purpose the following (XS) change: (diff against 
>>>> HS20)
>>>>
>>>>     diff -r f0f676c5a2c6
>>>>     
>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>
>>>>     ---
>>>>     
>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>
>>>>          Tue Mar 15 19:30:16 2011 -0700
>>>>     +++
>>>>     
>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp 
>>>>
>>>>          Thu Jul 28 00:02:41 2011 +0800
>>>>     @@ -2022,9 +2022,6 @@
>>>>                                                     
>>>> _intra_sweep_estimate.padded_average());
>>>>        }
>>>>
>>>>     -  {
>>>>     -    TraceCMSMemoryManagerStats();
>>>>     -  }
>>>>        GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>          ref_processor(), clear_all_soft_refs);
>>>>        #ifdef ASSERT
>>>>
>>>>     The same goes for the changes in:
>>>>     7036199: Adding a notification to the implementation of
>>>>     GarbageCollectorMXBeans
>>>>
>>>>     =====================================================
>>>>
>>>>     P.S. Is there an "official" name for the counters that I referred
>>>>     to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
>>>>     HSPERFDATA?
>>>
>>>




More information about the hotspot-gc-dev mailing list