Review request: 7072527 CMS: JMM GC counters overcount in some cases
Ramki Ramakrishna
y.s.ramakrishna at oracle.com
Mon Aug 1 18:35:33 UTC 2011
Looks good to me; ship it!
-- ramki
On 7/29/2011 4:13 PM, Kevin Walls wrote:
>
> Actually I missed the other comment that we could go further and
> remove the unneeded constructor.
>
> http://cr.openjdk.java.net/~kevinw/7072527/webrev.01/
>
> Thanks
> Kevin
>
> On 29/07/11 17:35, Kevin Walls wrote:
>> Hi --
>>
>> I got this ready in a webrev:
>>
>> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>>
>> The new test is passing, as are the old ones. 8-)
>>
>> Also, this change means that a CMS cycle ending in concurrent mode
>> failure now counts as one collection. One could argue that either
>> way (should it be 2 collections?) - but I'm thinking if we are
>> counting "completed" collections then we are now counting correctly!
>>
>> Thanks
>> Kevin
>>
>>
>> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>>
>>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>>> Hi Kevin --
>>>>
>>>> thanks for jumping on this! More inline below ...
>>>>
>>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>>> Hi Kevin,
>>>>>
>>>>> Thank you for taking care of this, and it's good to see the
>>>>> problem is verified.
>>>>>
>>>>> I think whether or not the suggested fix is sufficient depends on
>>>>> what paths can reach CMSCollector::do_compaction_work(). If all
>>>>> paths that can reach CMSCollector::do_compaction_work() come from
>>>>> GenCollectedHeap::do_collection(), then the fix should be good to
>>>>> go. Otherwise it'll need a better workaround.
>>>>>
>>>>> I believe all concurrent mode failures/interrupts (which includes
>>>>> the System.gc() case) does come from
>>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about
>>>>> this, could anybody please clarify on it?
>>>>
>>>> Yes, i believe this is indeed the case, and my browsing of the
>>>> code using cscope seemed to confirm that belief.
>>>>
>>>> More below ...
>>>>
>>>>>
>>>>> Regards,
>>>>> Kris Mok
>>>>>
>>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls
>>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>>
>>>>> __
>>>>> Hi --
>>>>>
>>>>> 6580448 was marked as a duplicate of 6581734, which fixed the
>>>>> fact
>>>>> that CMS collections were just not counted at all - with CMS,
>>>>> only a
>>>>> stop the world full gc would be counted in the stats.
>>>>>
>>>>> But looks like you're right... Here is a quick variation of the
>>>>> testcase from 6581734 which shows the same thing, and this
>>>>> verifies
>>>>> the same, and is solved by ExplicitGCInvokesConcurrent. If
>>>>> there is
>>>>> no other feedback I can test if the removal of the
>>>>> TraceCMSMemoryManagerStats() call in
>>>>> CMSCollector::do_compaction_work is all we need...
>>>>
>>>>
>>>> Kevin, yes, it would be great if you could verify this and push the
>>>> fix.
>>>> I am not sure if the push would need to wait for the signing of OCA
>>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>>
>>>> Since the original CR has been closed, i'll open one momentarily and
>>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>>> reviewer of the change.
>>>>
>>>> As regards the jstat counter reporting two pauses per concurrent
>>>> CMS cycle, I am of two minds on what the original intention
>>>> was. I'd have originally regarded the double increment as a
>>>> bug, but as you state it is really two pauses, even if part of
>>>> a single cycle. And it makes sense to count them as two. I
>>>> agree that this should be documented and left alone, given
>>>> how long we have had this behaviour, and the alternative
>>>> (of counting cycles, rather than pauses) may be no better
>>>> (or arguably worse). There's actually an open CR for this
>>>> which we can redirect into a CR to update the relevant documentation.
>>>>
>>>> -- ramki
>>>>
>>>>>
>>>>> Regards
>>>>> Kevin
>>>>>
>>>>>
>>>>> /*
>>>>> * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>>> reserved.
>>>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>>> *
>>>>> * This code is free software; you can redistribute it and/or
>>>>> modify it
>>>>> * under the terms of the GNU General Public License version 2
>>>>> only, as
>>>>> * published by the Free Software Foundation.
>>>>> *
>>>>> * This code is distributed in the hope that it will be
>>>>> useful, but
>>>>> WITHOUT
>>>>> * ANY WARRANTY; without even the implied warranty of
>>>>> MERCHANTABILITY or
>>>>> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
>>>>> Public License
>>>>> * version 2 for more details (a copy is included in the LICENSE
>>>>> file that
>>>>> * accompanied this code).
>>>>> *
>>>>> * You should have received a copy of the GNU General Public
>>>>> License
>>>>> version
>>>>> * 2 along with this work; if not, write to the Free Software
>>>>> Foundation,
>>>>> * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>>> *
>>>>> * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>>> 94065 USA
>>>>> * or visit www.oracle.com <http://www.oracle.com> if you need
>>>>> additional information or have any
>>>>> * questions.
>>>>> */
>>>>>
>>>>> /*
>>>>> * @test TestFullGCount.java
>>>>> * @bug
>>>>> * @summary
>>>>> * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>>> *
>>>>> */
>>>>> import java.util.*;
>>>>> import java.lang.management.*;
>>>>>
>>>>>
>>>>>
>>>>> public class TestFullGCCount {
>>>>>
>>>>> private String poolName = "CMS";
>>>>> private String collectorName = "ConcurrentMarkSweep";
>>>>>
>>>>> public static void main(String [] args) {
>>>>>
>>>>> TestFullGCCount t = null;
>>>>> if (args.length==2) {
>>>>> t = new TestFullGCCount(args[0], args[1]);
>>>>> } else {
>>>>> System.out.println("Defaulting to monitor CMS pool and
>>>>> collector.");
>>>>> t = new TestFullGCCount();
>>>>> }
>>>>> t.run();
>>>>>
>>>>> }
>>>>>
>>>>> public TestFullGCCount(String pool, String collector) {
>>>>> poolName = pool;
>>>>> collectorName = collector;
>>>>> }
>>>>> public TestFullGCCount() {
>>>>> }
>>>>>
>>>>> public void run() {
>>>>>
>>>>> int count = 0;
>>>>> int iterations = 20;
>>>>> long counts[] = new long[iterations];
>>>>> boolean diffAlways2 = true; // assume we will fail
>>>>>
>>>>> for (int i=0; i<iterations; i++) {
>>>>> System.gc();
>>>>> counts[i] = checkStats();
>>>>> if (i>0) {
>>>>> if (counts[i] - counts[i-1] != 2) {
>>>>> diffAlways2 = false;
>>>>> }
>>>>> }
>>>>> }
>>>>> if (diffAlways2) {
>>>>> throw new RuntimeException("FAILED: difference in
>>>>> count is
>>>>> always 2.");
>>>>> }
>>>>> System.out.println("Passed.");
>>>>> }
>>>>>
>>>>> private long checkStats() {
>>>>> long count = 0;
>>>>> List<MemoryPoolMXBean> pools =
>>>>> ManagementFactory.getMemoryPoolMXBeans();
>>>>> List<GarbageCollectorMXBean> collectors =
>>>>> ManagementFactory.getGarbageCollectorMXBeans();
>>>>> for (int i=0; i<collectors.size(); i++) {
>>>>> GarbageCollectorMXBean collector = collectors.get(i);
>>>>> String name = collector.getName();
>>>>> if (name.contains(collectorName)) {
>>>>> System.out.println(name + ": collection count = "
>>>>> + collector.getCollectionCount());
>>>>> count = collector.getCollectionCount();
>>>>> }
>>>>> }
>>>>> return count;
>>>>>
>>>>> }
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> On 27/07/11 17:12, Krystal Mok wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've been looking at a strange inconsistency of full GC count
>>>>>> recorded by jvmstat and JMM counters. I'd like to know which
>>>>>> ones
>>>>>> of the following behaviors are by design, which ones are
>>>>>> bugs, and
>>>>>> which ones are just my misunderstanding. I apologize for
>>>>>> making a
>>>>>> short story long...
>>>>>>
>>>>>> =====================================================
>>>>>>
>>>>>> The counters involved:
>>>>>>
>>>>>> * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>>>> track of the number of pauses occured as a result of a major
>>>>>> collection. It is used by utilities such as jstat as the
>>>>>> source of
>>>>>> "FGC" (full collection count), and the old gen collection
>>>>>> count in
>>>>>> Visual GC. It's updated by an TraceCollectorStats object.
>>>>>> * A JMM counter, GCMemoryManager::_num_collections, keeps
>>>>>> track of
>>>>>> the number of collections that have ended. This counter is
>>>>>> used as
>>>>>> HotSpot's implementation of the JMX
>>>>>> GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>>> either a TraceMemoryManagerStats object or a
>>>>>> TraceCMSMemoryManagerStats object.
>>>>>>
>>>>>> To show the situation, I've made a screenshot of a VisualVM
>>>>>> and a
>>>>>> JConsole running side by side, both are monitoring the
>>>>>> VisualVM's
>>>>>> GC stats:
>>>>>>
>>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
>>>>>>
>>>>>> (I'll upload the screenshot to somewhere else if anybody
>>>>>> can't see it)
>>>>>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>>> In the screenshot, Visual GC reports that the old gen collection
>>>>>> count is 20, while JConsole reports 10.
>>>>>>
>>>>>> I see that there was this bug:
>>>>>> 6580448: CMS: Full GC collection count mismatch between
>>>>>> GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>>> I don't think the current implementation has a bug in the sense
>>>>>> that the two counters don't report the same number.
>>>>>>
>>>>>> This behavior seems reasonable, but the naming of the value in
>>>>>> these tools are confusing: both tools say "collections", but
>>>>>> apparently the number in Visual GC means "number of pauses"
>>>>>> where
>>>>>> as the number in JConsole means "number of collection cycles".
>>>>>> It'd be great if the difference could be documented
>>>>>> somewhere, if
>>>>>> that's the intended behavior.
>>>>>>
>>>>>> And then the buggy behavior. Code demo posted on gist:
>>>>>> https://gist.github.com/1106263
>>>>>> Starting from JDK6u23, when using CMS without
>>>>>> ExplicitGCInvokesConcurrent, System.gc() (or
>>>>>> Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>>> make the JMM GC counter increment by 2 per invocation, while the
>>>>>> jvmstat counter is only incremented by 1. I believe the
>>>>>> latter is
>>>>>> correct and the former needs some fixing.
>>>>>>
>>>>>> =====================================================
>>>>>>
>>>>>> My understanding of the behavior shown above:
>>>>>>
>>>>>> 1. The concurrent GC part:
>>>>>>
>>>>>> There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>>> initial mark phase, and one in the final remark phase.
>>>>>> To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>>> periodically to see if it shouldConcurrentCollect(), and
>>>>>> trigger a
>>>>>> cycle when the predicate returned true, or goes back to sleep if
>>>>>> the predicate returned false. The whole concurrent GC cycle
>>>>>> doesn't go through GenCollectedHeap::do_collection().
>>>>>>
>>>>>> The jvmstat counter for old gen pauses is updated in
>>>>>> CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>>> covers both pause phases.
>>>>>>
>>>>>> The JMM counter, however, is updated in the concurrent sweep
>>>>>> phase, CMSCollector::sweep(bool asynch), if there was no
>>>>>> concurrent mode failure; or it is updated in
>>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs) in
>>>>>> case
>>>>>> of a bailout due to concurrent mode failure (advertised as so in
>>>>>> the code comments). So that's an increment by 1 per
>>>>>> concurrent GC
>>>>>> cycle, which does reflect the "number of collection cycles",
>>>>>> fair
>>>>>> enough.
>>>>>>
>>>>>> So far so good.
>>>>>>
>>>>>> 2. The System.gc() part:
>>>>>>
>>>>>> Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>>> stop-the-world full GC, which consists of only one pause, so
>>>>>> "number of pauses" would equal "number of collections" in
>>>>>> this case.
>>>>>> It will go into GenCollectedHeap::do_collection(); both the
>>>>>> jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>>
>>>>>> TraceCollectorStats tcs(_gens[i]->counters());
>>>>>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>>
>>>>>> But, drilling down into:
>>>>>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>>
>>>>>> That'll eventually go into:
>>>>>> CMSCollector::acquire_control_and_collect(bool full, bool
>>>>>> clear_all_soft_refs)
>>>>>>
>>>>>> System.gc() is user requested so that'll go further into
>>>>>> mark-sweep-compact:
>>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>>> And here, it increments the JMM GC counter again (remember it
>>>>>> was
>>>>>> in the concurrent GC path too, to handle bailouts), even though
>>>>>> this is still in the same collection. This leads to the "buggy
>>>>>> behavior" mentioned earlier.
>>>>>>
>>>>>> The JMM GC counter wasn't added to CMS until this fix got in:
>>>>>> 6581734: CMS Old Gen's collection usage is zero after GC
>>>>>> which is
>>>>>> incorrect
>>>>>>
>>>>>> The code added to CMSCollector::do_compaction_work() works
>>>>>> fine in
>>>>>> the concurrent GC path, but interacts badly with the original
>>>>>> logic in GenCollectedHeap::do_collection().
>>>>>>
>>>>>> =====================================================
>>>>>>
>>>>>> I thought all concurrent mode failures/interrupts come from
>>>>>> GenCollectedHeap::do_collection(). If that's the case, then it
>>>>>> seems unnecessary to update the JMM GC counter in
>>>>>> CMSCollector::do_compaction_work(), simply removing it should
>>>>>> fix
>>>>>> the problem.
>>>>>>
>>>>>> With that, I'd purpose the following (XS) change: (diff
>>>>>> against HS20)
>>>>>>
>>>>>> diff -r f0f676c5a2c6
>>>>>>
>>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>
>>>>>> Tue Mar 15 19:30:16 2011 -0700
>>>>>> +++
>>>>>>
>>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>>
>>>>>> Thu Jul 28 00:02:41 2011 +0800
>>>>>> @@ -2022,9 +2022,6 @@
>>>>>>
>>>>>> _intra_sweep_estimate.padded_average());
>>>>>> }
>>>>>>
>>>>>> - {
>>>>>> - TraceCMSMemoryManagerStats();
>>>>>> - }
>>>>>> GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>>> ref_processor(), clear_all_soft_refs);
>>>>>> #ifdef ASSERT
>>>>>>
>>>>>> The same goes for the changes in:
>>>>>> 7036199: Adding a notification to the implementation of
>>>>>> GarbageCollectorMXBeans
>>>>>>
>>>>>> =====================================================
>>>>>>
>>>>>> P.S. Is there an "official" name for the counters that I
>>>>>> referred
>>>>>> to as "jvmstat counters" above? Is it just jvmstat, or
>>>>>> PerfData or
>>>>>> HSPERFDATA?
>>>>>
>>>>>
>>
>
More information about the hotspot-gc-dev
mailing list