Number of CMS collections and the JMM GC counters
Y. S. Ramakrishna
y.s.ramakrishna at oracle.com
Thu Jul 28 18:08:09 PDT 2011
I filed: 7072527 CMS: JMM GC counters overcount in some cases
On 07/28/11 17:29, Y. S. Ramakrishna wrote:
> Hi Kevin --
>
> thanks for jumping on this! More inline below ...
>
> On 07/28/11 09:33, Krystal Mok wrote:
>> Hi Kevin,
>>
>> Thank you for taking care of this, and it's good to see the problem is
>> verified.
>>
>> I think whether or not the suggested fix is sufficient depends on what
>> paths can reach CMSCollector::do_compaction_work(). If all paths that
>> can reach CMSCollector::do_compaction_work() come from
>> GenCollectedHeap::do_collection(), then the fix should be good to go.
>> Otherwise it'll need a better workaround.
>>
>> I believe all concurrent mode failures/interrupts (which includes the
>> System.gc() case) does come from GenCollectedHeap::do_collection(),
>> but I'm not exactly sure about this, could anybody please clarify on it?
>
> Yes, i believe this is indeed the case, and my browsing of the
> code using cscope seemed to confirm that belief.
>
> More below ...
>
>>
>> Regards,
>> Kris Mok
>>
>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.walls at oracle.com
>> <mailto:kevin.walls at oracle.com>> wrote:
>>
>> __
>> Hi --
>>
>> 6580448 was marked as a duplicate of 6581734, which fixed the fact
>> that CMS collections were just not counted at all - with CMS, only a
>> stop the world full gc would be counted in the stats.
>>
>> But looks like you're right... Here is a quick variation of the
>> testcase from 6581734 which shows the same thing, and this verifies
>> the same, and is solved by ExplicitGCInvokesConcurrent. If there is
>> no other feedback I can test if the removal of the
>> TraceCMSMemoryManagerStats() call in
>> CMSCollector::do_compaction_work is all we need...
>
>
> Kevin, yes, it would be great if you could verify this and push the fix.
> I am not sure if the push would need to wait for the signing of OCA
> from Kris, but best to check with Those Who Would Know Such Things.
>
> Since the original CR has been closed, i'll open one momentarily and
> can make you RE (if that's OK with you). I'll be happy to serve as
> reviewer of the change.
>
> As regards the jstat counter reporting two pauses per concurrent
> CMS cycle, I am of two minds on what the original intention
> was. I'd have originally regarded the double increment as a
> bug, but as you state it is really two pauses, even if part of
> a single cycle. And it makes sense to count them as two. I
> agree that this should be documented and left alone, given
> how long we have had this behaviour, and the alternative
> (of counting cycles, rather than pauses) may be no better
> (or arguably worse). There's actually an open CR for this
> which we can redirect into a CR to update the relevant documentation.
>
> -- ramki
>
>>
>> Regards
>> Kevin
>>
>>
>> /*
>> * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>> reserved.
>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>> *
>> * This code is free software; you can redistribute it and/or
>> modify it
>> * under the terms of the GNU General Public License version 2
>> only, as
>> * published by the Free Software Foundation.
>> *
>> * This code is distributed in the hope that it will be useful, but
>> WITHOUT
>> * ANY WARRANTY; without even the implied warranty of
>> MERCHANTABILITY or
>> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
>> License
>> * version 2 for more details (a copy is included in the LICENSE
>> file that
>> * accompanied this code).
>> *
>> * You should have received a copy of the GNU General Public License
>> version
>> * 2 along with this work; if not, write to the Free Software
>> Foundation,
>> * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>> *
>> * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>> 94065 USA
>> * or visit www.oracle.com <http://www.oracle.com> if you need
>> additional information or have any
>> * questions.
>> */
>>
>> /*
>> * @test TestFullGCount.java
>> * @bug
>> * @summary
>> * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>> *
>> */
>> import java.util.*;
>> import java.lang.management.*;
>>
>>
>>
>> public class TestFullGCCount {
>>
>> private String poolName = "CMS";
>> private String collectorName = "ConcurrentMarkSweep";
>>
>> public static void main(String [] args) {
>>
>> TestFullGCCount t = null;
>> if (args.length==2) {
>> t = new TestFullGCCount(args[0], args[1]);
>> } else {
>> System.out.println("Defaulting to monitor CMS pool and
>> collector.");
>> t = new TestFullGCCount();
>> }
>> t.run();
>>
>> }
>>
>> public TestFullGCCount(String pool, String collector) {
>> poolName = pool;
>> collectorName = collector;
>> }
>> public TestFullGCCount() {
>> }
>>
>> public void run() {
>>
>> int count = 0;
>> int iterations = 20;
>> long counts[] = new long[iterations];
>> boolean diffAlways2 = true; // assume we will fail
>>
>> for (int i=0; i<iterations; i++) {
>> System.gc();
>> counts[i] = checkStats();
>> if (i>0) {
>> if (counts[i] - counts[i-1] != 2) {
>> diffAlways2 = false;
>> }
>> }
>> }
>> if (diffAlways2) {
>> throw new RuntimeException("FAILED: difference in count is
>> always 2.");
>> }
>> System.out.println("Passed.");
>> }
>>
>> private long checkStats() {
>> long count = 0;
>> List<MemoryPoolMXBean> pools =
>> ManagementFactory.getMemoryPoolMXBeans();
>> List<GarbageCollectorMXBean> collectors =
>> ManagementFactory.getGarbageCollectorMXBeans();
>> for (int i=0; i<collectors.size(); i++) {
>> GarbageCollectorMXBean collector = collectors.get(i);
>> String name = collector.getName();
>> if (name.contains(collectorName)) {
>> System.out.println(name + ": collection count = "
>> + collector.getCollectionCount());
>> count = collector.getCollectionCount();
>> }
>> }
>> return count;
>>
>> }
>>
>> }
>>
>>
>> On 27/07/11 17:12, Krystal Mok wrote:
>>> Hi all,
>>>
>>> I've been looking at a strange inconsistency of full GC count
>>> recorded by jvmstat and JMM counters. I'd like to know which ones
>>> of the following behaviors are by design, which ones are bugs, and
>>> which ones are just my misunderstanding. I apologize for making a
>>> short story long...
>>>
>>> =====================================================
>>>
>>> The counters involved:
>>>
>>> * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>> track of the number of pauses occured as a result of a major
>>> collection. It is used by utilities such as jstat as the source of
>>> "FGC" (full collection count), and the old gen collection count in
>>> Visual GC. It's updated by an TraceCollectorStats object.
>>> * A JMM counter, GCMemoryManager::_num_collections, keeps track of
>>> the number of collections that have ended. This counter is used as
>>> HotSpot's implementation of the JMX
>>> GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>> either a TraceMemoryManagerStats object or a
>>> TraceCMSMemoryManagerStats object.
>>>
>>> To show the situation, I've made a screenshot of a VisualVM and a
>>> JConsole running side by side, both are monitoring the VisualVM's
>>> GC stats:
>>>
>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
>>>
>>> (I'll upload the screenshot to somewhere else if anybody can't
>>> see it)
>>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>> In the screenshot, Visual GC reports that the old gen collection
>>> count is 20, while JConsole reports 10.
>>>
>>> I see that there was this bug:
>>> 6580448: CMS: Full GC collection count mismatch between
>>> GarbageCollectorMXBean and jvmstat (VisualGC)
>>> I don't think the current implementation has a bug in the sense
>>> that the two counters don't report the same number.
>>>
>>> This behavior seems reasonable, but the naming of the value in
>>> these tools are confusing: both tools say "collections", but
>>> apparently the number in Visual GC means "number of pauses" where
>>> as the number in JConsole means "number of collection cycles".
>>> It'd be great if the difference could be documented somewhere, if
>>> that's the intended behavior.
>>>
>>> And then the buggy behavior. Code demo posted on gist:
>>> https://gist.github.com/1106263
>>> Starting from JDK6u23, when using CMS without
>>> ExplicitGCInvokesConcurrent, System.gc() (or
>>> Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>> make the JMM GC counter increment by 2 per invocation, while the
>>> jvmstat counter is only incremented by 1. I believe the latter is
>>> correct and the former needs some fixing.
>>>
>>> =====================================================
>>>
>>> My understanding of the behavior shown above:
>>>
>>> 1. The concurrent GC part:
>>>
>>> There are 2 pauses in a CMS concurrent GC cycle, one in the
>>> initial mark phase, and one in the final remark phase.
>>> To trigger a concurrent GC cycle, the CMS thread wakes up
>>> periodically to see if it shouldConcurrentCollect(), and trigger a
>>> cycle when the predicate returned true, or goes back to sleep if
>>> the predicate returned false. The whole concurrent GC cycle
>>> doesn't go through GenCollectedHeap::do_collection().
>>>
>>> The jvmstat counter for old gen pauses is updated in
>>> CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>> covers both pause phases.
>>>
>>> The JMM counter, however, is updated in the concurrent sweep
>>> phase, CMSCollector::sweep(bool asynch), if there was no
>>> concurrent mode failure; or it is updated in
>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs) in case
>>> of a bailout due to concurrent mode failure (advertised as so in
>>> the code comments). So that's an increment by 1 per concurrent GC
>>> cycle, which does reflect the "number of collection cycles", fair
>>> enough.
>>>
>>> So far so good.
>>>
>>> 2. The System.gc() part:
>>>
>>> Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>> stop-the-world full GC, which consists of only one pause, so
>>> "number of pauses" would equal "number of collections" in this case.
>>> It will go into GenCollectedHeap::do_collection(); both the
>>> jvmstat and the JMM GC counter gets incremented by 1 here,
>>>
>>> TraceCollectorStats tcs(_gens[i]->counters());
>>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>
>>> But, drilling down into:
>>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>
>>> That'll eventually go into:
>>> CMSCollector::acquire_control_and_collect(bool full, bool
>>> clear_all_soft_refs)
>>>
>>> System.gc() is user requested so that'll go further into
>>> mark-sweep-compact:
>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>> And here, it increments the JMM GC counter again (remember it was
>>> in the concurrent GC path too, to handle bailouts), even though
>>> this is still in the same collection. This leads to the "buggy
>>> behavior" mentioned earlier.
>>>
>>> The JMM GC counter wasn't added to CMS until this fix got in:
>>> 6581734: CMS Old Gen's collection usage is zero after GC which is
>>> incorrect
>>>
>>> The code added to CMSCollector::do_compaction_work() works fine in
>>> the concurrent GC path, but interacts badly with the original
>>> logic in GenCollectedHeap::do_collection().
>>>
>>> =====================================================
>>>
>>> I thought all concurrent mode failures/interrupts come from
>>> GenCollectedHeap::do_collection(). If that's the case, then it
>>> seems unnecessary to update the JMM GC counter in
>>> CMSCollector::do_compaction_work(), simply removing it should fix
>>> the problem.
>>>
>>> With that, I'd purpose the following (XS) change: (diff against
>>> HS20)
>>>
>>> diff -r f0f676c5a2c6
>>>
>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>
>>> ---
>>>
>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>
>>> Tue Mar 15 19:30:16 2011 -0700
>>> +++
>>>
>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>
>>> Thu Jul 28 00:02:41 2011 +0800
>>> @@ -2022,9 +2022,6 @@
>>>
>>> _intra_sweep_estimate.padded_average());
>>> }
>>>
>>> - {
>>> - TraceCMSMemoryManagerStats();
>>> - }
>>> GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>> ref_processor(), clear_all_soft_refs);
>>> #ifdef ASSERT
>>>
>>> The same goes for the changes in:
>>> 7036199: Adding a notification to the implementation of
>>> GarbageCollectorMXBeans
>>>
>>> =====================================================
>>>
>>> P.S. Is there an "official" name for the counters that I referred
>>> to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
>>> HSPERFDATA?
>>
>>
More information about the serviceability-dev
mailing list