Number of CMS collections and the JMM GC counters

Krystal Mok rednaxelafx at gmail.com
Fri Jul 29 02:29:09 UTC 2011


Hi Ramki and Kevin,

Kevin, yes, it would be great if you could verify this and push the fix.
> I am not sure if the push would need to wait for the signing of OCA
> from Kris, but best to check with Those Who Would Know Such Things.

It'll take some time before we can sign the OCA. I heard that the internal
process is moving but I'm not sure how long it's going to take. Sorry about
that. Will report our progress on it ASAP.

As regards the jstat counter reporting two pauses per concurrent
> CMS cycle, I am of two minds on what the original intention
> was. I'd have originally regarded the double increment as a
> bug, but as you state it is really two pauses, even if part of
> a single cycle. And it makes sense to count them as two. I
> agree that this should be documented and left alone, given
> how long we have had this behaviour, and the alternative
> (of counting cycles, rather than pauses) may be no better
> (or arguably worse). There's actually an open CR for this
> which we can redirect into a CR to update the relevant documentation.

Cool! I'd love to see improved documentation on this. Then it's much easier
to explain about it to other people.

Thank you all,
Kris Mok

On Fri, Jul 29, 2011 at 8:29 AM, Y. S. Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:

> Hi Kevin --
>
> thanks for jumping on this! More inline below ...
>
>
> On 07/28/11 09:33, Krystal Mok wrote:
>
>> Hi Kevin,
>>
>> Thank you for taking care of this, and it's good to see the problem is
>> verified.
>>
>> I think whether or not the suggested fix is sufficient depends on what
>> paths can reach CMSCollector::do_compaction_**work(). If all paths that
>> can reach CMSCollector::do_compaction_**work() come from
>> GenCollectedHeap::do_**collection(), then the fix should be good to go.
>> Otherwise it'll need a better workaround.
>>
>> I believe all concurrent mode failures/interrupts (which includes the
>> System.gc() case) does come from GenCollectedHeap::do_**collection(), but
>> I'm not exactly sure about this, could anybody please clarify on it?
>>
>
> Yes, i believe this is indeed the case, and my browsing of the
> code using cscope seemed to confirm that belief.
>
> More below ...
>
>
>> Regards,
>> Kris Mok
>>
>>
>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <kevin.walls at oracle.com<mailto:
>> kevin.walls at oracle.com**>> wrote:
>>
>>    __
>>    Hi --
>>
>>    6580448 was marked as a duplicate of 6581734, which fixed the fact
>>    that CMS collections were just not counted at all - with CMS, only a
>>    stop the world full gc would be counted in the stats.
>>
>>    But looks like you're right...  Here is a quick variation of the
>>    testcase from 6581734 which shows the same thing, and this verifies
>>    the same, and is solved by ExplicitGCInvokesConcurrent.  If there is
>>    no other feedback I can test if the removal of the
>>    TraceCMSMemoryManagerStats() call in
>>    CMSCollector::do_compaction_**work is all we need...
>>
>
>
> Kevin, yes, it would be great if you could verify this and push the fix.
> I am not sure if the push would need to wait for the signing of OCA
> from Kris, but best to check with Those Who Would Know Such Things.
>
> Since the original CR has been closed, i'll open one momentarily and
> can make you RE (if that's OK with you). I'll be happy to serve as
> reviewer of the change.
>
> As regards the jstat counter reporting two pauses per concurrent
> CMS cycle, I am of two minds on what the original intention
> was. I'd have originally regarded the double increment as a
> bug, but as you state it is really two pauses, even if part of
> a single cycle. And it makes sense to count them as two. I
> agree that this should be documented and left alone, given
> how long we have had this behaviour, and the alternative
> (of counting cycles, rather than pauses) may be no better
> (or arguably worse). There's actually an open CR for this
> which we can redirect into a CR to update the relevant documentation.
>
> -- ramki
>
>
>>    Regards
>>    Kevin
>>
>>
>>    /*
>>     * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>    reserved.
>>     * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>     *
>>     * This code is free software; you can redistribute it and/or modify it
>>     * under the terms of the GNU General Public License version 2 only, as
>>     * published by the Free Software Foundation.
>>     *
>>     * This code is distributed in the hope that it will be useful, but
>>    WITHOUT
>>     * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>> or
>>     * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>> License
>>     * version 2 for more details (a copy is included in the LICENSE
>>    file that
>>     * accompanied this code).
>>     *
>>     * You should have received a copy of the GNU General Public License
>>    version
>>     * 2 along with this work; if not, write to the Free Software
>>    Foundation,
>>     * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>     *
>>     * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>    94065 USA
>>     * or visit www.oracle.com <http://www.oracle.com> if you need
>>
>>    additional information or have any
>>     * questions.
>>     */
>>
>>    /*
>>     * @test TestFullGCount.java
>>     * @bug
>>     * @summary
>>     * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>     *
>>     */
>>    import java.util.*;
>>    import java.lang.management.*;
>>
>>
>>
>>    public class TestFullGCCount {
>>
>>    private String poolName = "CMS";
>>    private String collectorName = "ConcurrentMarkSweep";
>>
>>    public static void main(String [] args) {
>>
>>        TestFullGCCount t = null;
>>        if (args.length==2) {
>>            t = new TestFullGCCount(args[0], args[1]);
>>        } else {
>>            System.out.println("Defaulting to monitor CMS pool and
>>    collector.");
>>            t = new TestFullGCCount();
>>        }
>>        t.run();
>>
>>    }
>>
>>    public TestFullGCCount(String pool, String collector) {
>>        poolName = pool;
>>        collectorName = collector;
>>    }
>>    public TestFullGCCount() {
>>    }
>>
>>    public void run() {
>>
>>        int count = 0;
>>        int iterations = 20;
>>        long counts[] = new long[iterations];
>>        boolean diffAlways2 = true; // assume we will fail
>>
>>        for (int i=0; i<iterations; i++) {
>>            System.gc();
>>            counts[i] = checkStats();
>>            if (i>0) {
>>                if (counts[i] - counts[i-1] != 2) {
>>                    diffAlways2 = false;
>>                }
>>            }
>>        }
>>        if (diffAlways2) {
>>            throw new RuntimeException("FAILED: difference in count is
>>    always 2.");
>>        }
>>        System.out.println("Passed.");
>>    }
>>
>>    private long checkStats() {
>>        long count = 0;
>>        List<MemoryPoolMXBean> pools =
>>    ManagementFactory.**getMemoryPoolMXBeans();
>>        List<GarbageCollectorMXBean> collectors =
>>    ManagementFactory.**getGarbageCollectorMXBeans();
>>        for (int i=0; i<collectors.size(); i++) {
>>            GarbageCollectorMXBean collector = collectors.get(i);
>>            String name = collector.getName();
>>            if (name.contains(collectorName)) {
>>                System.out.println(name + ": collection count = "
>>                                   + collector.getCollectionCount()**);
>>                count = collector.getCollectionCount()**;
>>            }
>>        }
>>        return count;
>>
>>    }
>>
>>    }
>>
>>
>>    On 27/07/11 17:12, Krystal Mok wrote:
>>
>>>    Hi all,
>>>
>>>    I've been looking at a strange inconsistency of full GC count
>>>    recorded by jvmstat and JMM counters. I'd like to know which ones
>>>    of the following behaviors are by design, which ones are bugs, and
>>>    which ones are just my misunderstanding. I apologize for making a
>>>    short story long...
>>>
>>>    ==============================**=======================
>>>
>>>    The counters involved:
>>>
>>>    * A jvmstat counter named "sun.gc.collector.1.**invocations" keeps
>>>    track of the number of pauses occured as a result of a major
>>>    collection. It is used by utilities such as jstat as the source of
>>>    "FGC" (full collection count), and the old gen collection count in
>>>    Visual GC. It's updated by an TraceCollectorStats object.
>>>    * A JMM counter, GCMemoryManager::_num_**collections, keeps track of
>>>    the number of collections that have ended. This counter is used as
>>>    HotSpot's implementation of the JMX
>>>    GarbageCollectorMXBean.**getCollectionCount(). It's updated by
>>>    either a TraceMemoryManagerStats object or a
>>>    TraceCMSMemoryManagerStats object.
>>>
>>>    To show the situation, I've made a screenshot of a VisualVM and a
>>>    JConsole running side by side, both are monitoring the VisualVM's
>>>    GC stats:
>>>    http://dl.iteye.com/upload/**attachment/524811/913cb0e1-**
>>> 7add-3ac0-a718-24ca705cad22.**png<http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png>
>>>    (I'll upload the screenshot to somewhere else if anybody can't see it)
>>>    The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>    In the screenshot, Visual GC reports that the old gen collection
>>>    count is 20, while JConsole reports 10.
>>>
>>>    I see that there was this bug:
>>>    6580448: CMS: Full GC collection count mismatch between
>>>    GarbageCollectorMXBean and jvmstat (VisualGC)
>>>    I don't think the current implementation has a bug in the sense
>>>    that the two counters don't report the same number.
>>>
>>>    This behavior seems reasonable, but the naming of the value in
>>>    these tools are confusing: both tools say "collections", but
>>>    apparently the number in Visual GC means "number of pauses" where
>>>    as the number in JConsole means "number of collection cycles".
>>>    It'd be great if the difference could be documented somewhere, if
>>>    that's the intended behavior.
>>>
>>>    And then the buggy behavior. Code demo posted on gist:
>>>    https://gist.github.com/**1106263 <https://gist.github.com/1106263>
>>>    Starting from JDK6u23, when using CMS without
>>>    ExplicitGCInvokesConcurrent, System.gc() (or
>>>    Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>    make the JMM GC counter increment by 2 per invocation, while the
>>>    jvmstat counter is only incremented by 1. I believe the latter is
>>>    correct and the former needs some fixing.
>>>
>>>    ==============================**=======================
>>>
>>>    My understanding of the behavior shown above:
>>>
>>>    1. The concurrent GC part:
>>>
>>>    There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>    initial mark phase, and one in the final remark phase.
>>>    To trigger a concurrent GC cycle, the CMS thread wakes up
>>>    periodically to see if it shouldConcurrentCollect(), and trigger a
>>>    cycle when the predicate returned true, or goes back to sleep if
>>>    the predicate returned false. The whole concurrent GC cycle
>>>    doesn't go through GenCollectedHeap::do_**collection().
>>>
>>>    The jvmstat counter for old gen pauses is updated in
>>>    CMSCollector::do_CMS_**operation(CMS_op_type op), which exactly
>>>    covers both pause phases.
>>>
>>>    The JMM counter, however, is updated in the concurrent sweep
>>>    phase, CMSCollector::sweep(bool asynch), if there was no
>>>    concurrent mode failure; or it is updated in
>>>    CMSCollector::do_compaction_**work(bool clear_all_soft_refs) in case
>>>    of a bailout due to concurrent mode failure (advertised as so in
>>>    the code comments). So that's an increment by 1 per concurrent GC
>>>    cycle, which does reflect the "number of collection cycles", fair
>>>    enough.
>>>
>>>    So far so good.
>>>
>>>    2. The System.gc() part:
>>>
>>>    Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>    stop-the-world full GC, which consists of only one pause, so
>>>    "number of pauses" would equal "number of collections" in this case.
>>>    It will go into GenCollectedHeap::do_**collection(); both the
>>>    jvmstat and the JMM GC counter gets incremented by 1 here,
>>>
>>>    TraceCollectorStats tcs(_gens[i]->counters());
>>>    TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>
>>>    But, drilling down into:
>>>    _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>
>>>    That'll eventually go into:
>>>    CMSCollector::acquire_control_**and_collect(bool full, bool
>>>    clear_all_soft_refs)
>>>
>>>    System.gc() is user requested so that'll go further into
>>>    mark-sweep-compact:
>>>    CMSCollector::do_compaction_**work(bool clear_all_soft_refs)
>>>    And here, it increments the JMM GC counter again (remember it was
>>>    in the concurrent GC path too, to handle bailouts), even though
>>>    this is still in the same collection. This leads to the "buggy
>>>    behavior" mentioned earlier.
>>>
>>>    The JMM GC counter wasn't added to CMS until this fix got in:
>>>    6581734: CMS Old Gen's collection usage is zero after GC which is
>>>    incorrect
>>>
>>>    The code added to CMSCollector::do_compaction_**work() works fine in
>>>    the concurrent GC path, but interacts badly with the original
>>>    logic in GenCollectedHeap::do_**collection().
>>>
>>>    ==============================**=======================
>>>
>>>    I thought all concurrent mode failures/interrupts come from
>>>    GenCollectedHeap::do_**collection(). If that's the case, then it
>>>    seems unnecessary to update the JMM GC counter in
>>>    CMSCollector::do_compaction_**work(), simply removing it should fix
>>>    the problem.
>>>
>>>    With that, I'd purpose the following (XS) change: (diff against HS20)
>>>
>>>    diff -r f0f676c5a2c6
>>>    src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>>    ---
>>>    a/src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>>         Tue Mar 15 19:30:16 2011 -0700
>>>    +++
>>>    b/src/share/vm/gc_**implementation/**concurrentMarkSweep/**
>>> concurrentMarkSweepGeneration.**cpp
>>>         Thu Jul 28 00:02:41 2011 +0800
>>>    @@ -2022,9 +2022,6 @@
>>>
>>>  _intra_sweep_estimate.padded_**average());
>>>       }
>>>
>>>    -  {
>>>    -    TraceCMSMemoryManagerStats();
>>>    -  }
>>>       GenMarkSweep::invoke_at_**safepoint(_cmsGen->level(),
>>>         ref_processor(), clear_all_soft_refs);
>>>       #ifdef ASSERT
>>>
>>>    The same goes for the changes in:
>>>    7036199: Adding a notification to the implementation of
>>>    GarbageCollectorMXBeans
>>>
>>>    ==============================**=======================
>>>
>>>    P.S. Is there an "official" name for the counters that I referred
>>>    to as "jvmstat counters" above? Is it just jvmstat, or PerfData or
>>>    HSPERFDATA?
>>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110729/a8780e3e/attachment.htm>


More information about the hotspot-gc-dev mailing list