Review request: 7072527 CMS: JMM GC counters overcount in some cases
Kevin Walls
kevin.walls at oracle.com
Wed Aug 3 11:56:38 UTC 2011
Thanks Ramki, and thanks for chasing up and clarifying the OCA point.
Kevin
On 01/08/11 19:35, Ramki Ramakrishna wrote:
> Looks good to me; ship it!
>
> -- ramki
>
> On 7/29/2011 4:13 PM, Kevin Walls wrote:
>>
>> Actually I missed the other comment that we could go further and
>> remove the unneeded constructor.
>>
>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/
>>
>> Thanks
>> Kevin
>>
>> On 29/07/11 17:35, Kevin Walls wrote:
>>> Hi --
>>>
>>> I got this ready in a webrev:
>>>
>>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>>>
>>> The new test is passing, as are the old ones. 8-)
>>>
>>> Also, this change means that a CMS cycle ending in concurrent mode
>>> failure now counts as one collection. One could argue that either
>>> way (should it be 2 collections?) - but I'm thinking if we are
>>> counting "completed" collections then we are now counting correctly!
>>>
>>> Thanks
>>> Kevin
>>>
>>>
>>> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>>>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>>>
>>>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>>>> Hi Kevin --
>>>>>
>>>>> thanks for jumping on this! More inline below ...
>>>>>
>>>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>>>> Hi Kevin,
>>>>>>
>>>>>> Thank you for taking care of this, and it's good to see the
>>>>>> problem is verified.
>>>>>>
>>>>>> I think whether or not the suggested fix is sufficient depends on
>>>>>> what paths can reach CMSCollector::do_compaction_work(). If all
>>>>>> paths that can reach CMSCollector::do_compaction_work() come from
>>>>>> GenCollectedHeap::do_collection(), then the fix should be good to
>>>>>> go. Otherwise it'll need a better workaround.
>>>>>>
>>>>>> I believe all concurrent mode failures/interrupts (which includes
>>>>>> the System.gc() case) does come from
>>>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about
>>>>>> this, could anybody please clarify on it?
>>>>>
>>>>> Yes, i believe this is indeed the case, and my browsing of the
>>>>> code using cscope seemed to confirm that belief.
>>>>>
>>>>> More below ...
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Kris Mok
>>>>>>
>>>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls
>>>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>>>
>>>>>> __
>>>>>> Hi --
>>>>>>
>>>>>> 6580448 was marked as a duplicate of 6581734, which fixed the
>>>>>> fact
>>>>>> that CMS collections were just not counted at all - with CMS,
>>>>>> only a
>>>>>> stop the world full gc would be counted in the stats.
>>>>>>
>>>>>> But looks like you're right... Here is a quick variation of the
>>>>>> testcase from 6581734 which shows the same thing, and this
>>>>>> verifies
>>>>>> the same, and is solved by ExplicitGCInvokesConcurrent. If
>>>>>> there is
>>>>>> no other feedback I can test if the removal of the
>>>>>> TraceCMSMemoryManagerStats() call in
>>>>>> CMSCollector::do_compaction_work is all we need...
>>>>>
>>>>>
>>>>> Kevin, yes, it would be great if you could verify this and push
>>>>> the fix.
>>>>> I am not sure if the push would need to wait for the signing of OCA
>>>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>>>
>>>>> Since the original CR has been closed, i'll open one momentarily and
>>>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>>>> reviewer of the change.
>>>>>
>>>>> As regards the jstat counter reporting two pauses per concurrent
>>>>> CMS cycle, I am of two minds on what the original intention
>>>>> was. I'd have originally regarded the double increment as a
>>>>> bug, but as you state it is really two pauses, even if part of
>>>>> a single cycle. And it makes sense to count them as two. I
>>>>> agree that this should be documented and left alone, given
>>>>> how long we have had this behaviour, and the alternative
>>>>> (of counting cycles, rather than pauses) may be no better
>>>>> (or arguably worse). There's actually an open CR for this
>>>>> which we can redirect into a CR to update the relevant documentation.
>>>>>
>>>>> -- ramki
>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Kevin
>>>>>>
>>>>>>
>>>>>> /*
>>>>>> * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>>>> reserved.
>>>>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>>>> *
>>>>>> * This code is free software; you can redistribute it and/or
>>>>>> modify it
>>>>>> * under the terms of the GNU General Public License version
>>>>>> 2 only, as
>>>>>> * published by the Free Software Foundation.
>>>>>> *
>>>>>> * This code is distributed in the hope that it will be
>>>>>> useful, but
>>>>>> WITHOUT
>>>>>> * ANY WARRANTY; without even the implied warranty of
>>>>>> MERCHANTABILITY or
>>>>>> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
>>>>>> Public License
>>>>>> * version 2 for more details (a copy is included in the LICENSE
>>>>>> file that
>>>>>> * accompanied this code).
>>>>>> *
>>>>>> * You should have received a copy of the GNU General Public
>>>>>> License
>>>>>> version
>>>>>> * 2 along with this work; if not, write to the Free Software
>>>>>> Foundation,
>>>>>> * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>>>> *
>>>>>> * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>>>> 94065 USA
>>>>>> * or visit www.oracle.com <http://www.oracle.com> if you need
>>>>>> additional information or have any
>>>>>> * questions.
>>>>>> */
>>>>>>
>>>>>> /*
>>>>>> * @test TestFullGCount.java
>>>>>> * @bug
>>>>>> * @summary
>>>>>> * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>>>> *
>>>>>> */
>>>>>> import java.util.*;
>>>>>> import java.lang.management.*;
>>>>>>
>>>>>>
>>>>>>
>>>>>> public class TestFullGCCount {
>>>>>>
>>>>>> private String poolName = "CMS";
>>>>>> private String collectorName = "ConcurrentMarkSweep";
>>>>>>
>>>>>> public static void main(String [] args) {
>>>>>>
>>>>>> TestFullGCCount t = null;
>>>>>> if (args.length==2) {
>>>>>> t = new TestFullGCCount(args[0], args[1]);
>>>>>> } else {
>>>>>> System.out.println("Defaulting to monitor CMS pool and
>>>>>> collector.");
>>>>>> t = new TestFullGCCount();
>>>>>> }
>>>>>> t.run();
>>>>>>
>>>>>> }
>>>>>>
>>>>>> public TestFullGCCount(String pool, String collector) {
>>>>>> poolName = pool;
>>>>>> collectorName = collector;
>>>>>> }
>>>>>> public TestFullGCCount() {
>>>>>> }
>>>>>>
>>>>>> public void run() {
>>>>>>
>>>>>> int count = 0;
>>>>>> int iterations = 20;
>>>>>> long counts[] = new long[iterations];
>>>>>> boolean diffAlways2 = true; // assume we will fail
>>>>>>
>>>>>> for (int i=0; i<iterations; i++) {
>>>>>> System.gc();
>>>>>> counts[i] = checkStats();
>>>>>> if (i>0) {
>>>>>> if (counts[i] - counts[i-1] != 2) {
>>>>>> diffAlways2 = false;
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>> if (diffAlways2) {
>>>>>> throw new RuntimeException("FAILED: difference in
>>>>>> count is
>>>>>> always 2.");
>>>>>> }
>>>>>> System.out.println("Passed.");
>>>>>> }
>>>>>>
>>>>>> private long checkStats() {
>>>>>> long count = 0;
>>>>>> List<MemoryPoolMXBean> pools =
>>>>>> ManagementFactory.getMemoryPoolMXBeans();
>>>>>> List<GarbageCollectorMXBean> collectors =
>>>>>> ManagementFactory.getGarbageCollectorMXBeans();
>>>>>> for (int i=0; i<collectors.size(); i++) {
>>>>>> GarbageCollectorMXBean collector = collectors.get(i);
>>>>>> String name = collector.getName();
>>>>>> if (name.contains(collectorName)) {
>>>>>> System.out.println(name + ": collection count = "
>>>>>> +
>>>>>> collector.getCollectionCount());
>>>>>> count = collector.getCollectionCount();
>>>>>> }
>>>>>> }
>>>>>> return count;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On 27/07/11 17:12, Krystal Mok wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've been looking at a strange inconsistency of full GC count
>>>>>>> recorded by jvmstat and JMM counters. I'd like to know which
>>>>>>> ones
>>>>>>> of the following behaviors are by design, which ones are
>>>>>>> bugs, and
>>>>>>> which ones are just my misunderstanding. I apologize for
>>>>>>> making a
>>>>>>> short story long...
>>>>>>>
>>>>>>> =====================================================
>>>>>>>
>>>>>>> The counters involved:
>>>>>>>
>>>>>>> * A jvmstat counter named "sun.gc.collector.1.invocations"
>>>>>>> keeps
>>>>>>> track of the number of pauses occured as a result of a major
>>>>>>> collection. It is used by utilities such as jstat as the
>>>>>>> source of
>>>>>>> "FGC" (full collection count), and the old gen collection
>>>>>>> count in
>>>>>>> Visual GC. It's updated by an TraceCollectorStats object.
>>>>>>> * A JMM counter, GCMemoryManager::_num_collections, keeps
>>>>>>> track of
>>>>>>> the number of collections that have ended. This counter is
>>>>>>> used as
>>>>>>> HotSpot's implementation of the JMX
>>>>>>> GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>>>> either a TraceMemoryManagerStats object or a
>>>>>>> TraceCMSMemoryManagerStats object.
>>>>>>>
>>>>>>> To show the situation, I've made a screenshot of a VisualVM
>>>>>>> and a
>>>>>>> JConsole running side by side, both are monitoring the
>>>>>>> VisualVM's
>>>>>>> GC stats:
>>>>>>>
>>>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
>>>>>>>
>>>>>>> (I'll upload the screenshot to somewhere else if anybody
>>>>>>> can't see it)
>>>>>>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>>>> In the screenshot, Visual GC reports that the old gen
>>>>>>> collection
>>>>>>> count is 20, while JConsole reports 10.
>>>>>>>
>>>>>>> I see that there was this bug:
>>>>>>> 6580448: CMS: Full GC collection count mismatch between
>>>>>>> GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>>>> I don't think the current implementation has a bug in the sense
>>>>>>> that the two counters don't report the same number.
>>>>>>>
>>>>>>> This behavior seems reasonable, but the naming of the value in
>>>>>>> these tools are confusing: both tools say "collections", but
>>>>>>> apparently the number in Visual GC means "number of pauses"
>>>>>>> where
>>>>>>> as the number in JConsole means "number of collection cycles".
>>>>>>> It'd be great if the difference could be documented
>>>>>>> somewhere, if
>>>>>>> that's the intended behavior.
>>>>>>>
>>>>>>> And then the buggy behavior. Code demo posted on gist:
>>>>>>> https://gist.github.com/1106263
>>>>>>> Starting from JDK6u23, when using CMS without
>>>>>>> ExplicitGCInvokesConcurrent, System.gc() (or
>>>>>>> Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>>>> make the JMM GC counter increment by 2 per invocation, while
>>>>>>> the
>>>>>>> jvmstat counter is only incremented by 1. I believe the
>>>>>>> latter is
>>>>>>> correct and the former needs some fixing.
>>>>>>>
>>>>>>> =====================================================
>>>>>>>
>>>>>>> My understanding of the behavior shown above:
>>>>>>>
>>>>>>> 1. The concurrent GC part:
>>>>>>>
>>>>>>> There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>>>> initial mark phase, and one in the final remark phase.
>>>>>>> To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>>>> periodically to see if it shouldConcurrentCollect(), and
>>>>>>> trigger a
>>>>>>> cycle when the predicate returned true, or goes back to
>>>>>>> sleep if
>>>>>>> the predicate returned false. The whole concurrent GC cycle
>>>>>>> doesn't go through GenCollectedHeap::do_collection().
>>>>>>>
>>>>>>> The jvmstat counter for old gen pauses is updated in
>>>>>>> CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>>>> covers both pause phases.
>>>>>>>
>>>>>>> The JMM counter, however, is updated in the concurrent sweep
>>>>>>> phase, CMSCollector::sweep(bool asynch), if there was no
>>>>>>> concurrent mode failure; or it is updated in
>>>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>>> in case
>>>>>>> of a bailout due to concurrent mode failure (advertised as
>>>>>>> so in
>>>>>>> the code comments). So that's an increment by 1 per
>>>>>>> concurrent GC
>>>>>>> cycle, which does reflect the "number of collection cycles",
>>>>>>> fair
>>>>>>> enough.
>>>>>>>
>>>>>>> So far so good.
>>>>>>>
>>>>>>> 2. The System.gc() part:
>>>>>>>
>>>>>>> Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>>>> stop-the-world full GC, which consists of only one pause, so
>>>>>>> "number of pauses" would equal "number of collections" in
>>>>>>> this case.
>>>>>>> It will go into GenCollectedHeap::do_collection(); both the
>>>>>>> jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>>>
>>>>>>> TraceCollectorStats tcs(_gens[i]->counters());
>>>>>>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>>>
>>>>>>> But, drilling down into:
>>>>>>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>>>
>>>>>>> That'll eventually go into:
>>>>>>> CMSCollector::acquire_control_and_collect(bool full, bool
>>>>>>> clear_all_soft_refs)
>>>>>>>
>>>>>>> System.gc() is user requested so that'll go further into
>>>>>>> mark-sweep-compact:
>>>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>>> And here, it increments the JMM GC counter again (remember
>>>>>>> it was
>>>>>>> in the concurrent GC path too, to handle bailouts), even though
>>>>>>> this is still in the same collection. This leads to the "buggy
>>>>>>> behavior" mentioned earlier.
>>>>>>>
>>>>>>> The JMM GC counter wasn't added to CMS until this fix got in:
>>>>>>> 6581734: CMS Old Gen's collection usage is zero after GC
>>>>>>> which is
>>>>>>> incorrect
>>>>>>>
>>>>>>> The code added to CMSCollector::do_compaction_work() works
>>>>>>> fine in
>>>>>>> the concurrent GC path, but interacts badly with the original
>>>>>>> logic in GenCollectedHeap::do_collection().
>>>>>>>
>>>>>>> =====================================================
>>>>>>>
>>>>>>> I thought all concurrent mode failures/interrupts come from
>>>>>>> GenCollectedHeap::do_collection(). If that's the case, then it
>>>>>>> seems unnecessary to update the JMM GC counter in
>>>>>>> CMSCollector::do_compaction_work(), simply removing it
>>>>>>> should fix
>>>>>>> the problem.
>>>>>>>
>>>>>>> With that, I'd purpose the following (XS) change: (diff
>>>>>>> against HS20)
>>>>>>>
>>>>>>> diff -r f0f676c5a2c6
>>>>>>>
>>>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>>
>>>>>>> Tue Mar 15 19:30:16 2011 -0700
>>>>>>> +++
>>>>>>>
>>>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>>
>>>>>>> Thu Jul 28 00:02:41 2011 +0800
>>>>>>> @@ -2022,9 +2022,6 @@
>>>>>>>
>>>>>>> _intra_sweep_estimate.padded_average());
>>>>>>> }
>>>>>>>
>>>>>>> - {
>>>>>>> - TraceCMSMemoryManagerStats();
>>>>>>> - }
>>>>>>> GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>>>> ref_processor(), clear_all_soft_refs);
>>>>>>> #ifdef ASSERT
>>>>>>>
>>>>>>> The same goes for the changes in:
>>>>>>> 7036199: Adding a notification to the implementation of
>>>>>>> GarbageCollectorMXBeans
>>>>>>>
>>>>>>> =====================================================
>>>>>>>
>>>>>>> P.S. Is there an "official" name for the counters that I
>>>>>>> referred
>>>>>>> to as "jvmstat counters" above? Is it just jvmstat, or
>>>>>>> PerfData or
>>>>>>> HSPERFDATA?
>>>>>>
>>>>>>
>>>
>>
More information about the hotspot-gc-dev
mailing list