Review request: 7072527 CMS: JMM GC counters overcount in some cases
Ramki Ramakrishna
y.s.ramakrishna at oracle.com
Fri Jul 29 16:42:37 UTC 2011
On 7/29/2011 9:35 AM, Kevin Walls wrote:
> Hi --
>
> I got this ready in a webrev:
>
> http://cr.openjdk.java.net/~kevinw/7072527/webrev.00/
>
> The new test is passing, as are the old ones. 8-)
>
> Also, this change means that a CMS cycle ending in concurrent mode
> failure now counts as one collection. One could argue that either way
> (should it be 2 collections?) - but I'm thinking if we are counting
> "completed" collections then we are now counting correctly!
I'd tend to agree with that interpretation.
I'll try and get to the webrev later today.
thanks!
-- ramki
>
> Thanks
> Kevin
>
>
> On 29/07/11 02:08, Y. S. Ramakrishna wrote:
>> I filed: 7072527 CMS: JMM GC counters overcount in some cases
>>
>> On 07/28/11 17:29, Y. S. Ramakrishna wrote:
>>> Hi Kevin --
>>>
>>> thanks for jumping on this! More inline below ...
>>>
>>> On 07/28/11 09:33, Krystal Mok wrote:
>>>> Hi Kevin,
>>>>
>>>> Thank you for taking care of this, and it's good to see the problem
>>>> is verified.
>>>>
>>>> I think whether or not the suggested fix is sufficient depends on
>>>> what paths can reach CMSCollector::do_compaction_work(). If all
>>>> paths that can reach CMSCollector::do_compaction_work() come from
>>>> GenCollectedHeap::do_collection(), then the fix should be good to
>>>> go. Otherwise it'll need a better workaround.
>>>>
>>>> I believe all concurrent mode failures/interrupts (which includes
>>>> the System.gc() case) does come from
>>>> GenCollectedHeap::do_collection(), but I'm not exactly sure about
>>>> this, could anybody please clarify on it?
>>>
>>> Yes, i believe this is indeed the case, and my browsing of the
>>> code using cscope seemed to confirm that belief.
>>>
>>> More below ...
>>>
>>>>
>>>> Regards,
>>>> Kris Mok
>>>>
>>>> On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls
>>>> <kevin.walls at oracle.com <mailto:kevin.walls at oracle.com>> wrote:
>>>>
>>>> __
>>>> Hi --
>>>>
>>>> 6580448 was marked as a duplicate of 6581734, which fixed the fact
>>>> that CMS collections were just not counted at all - with CMS,
>>>> only a
>>>> stop the world full gc would be counted in the stats.
>>>>
>>>> But looks like you're right... Here is a quick variation of the
>>>> testcase from 6581734 which shows the same thing, and this
>>>> verifies
>>>> the same, and is solved by ExplicitGCInvokesConcurrent. If
>>>> there is
>>>> no other feedback I can test if the removal of the
>>>> TraceCMSMemoryManagerStats() call in
>>>> CMSCollector::do_compaction_work is all we need...
>>>
>>>
>>> Kevin, yes, it would be great if you could verify this and push the
>>> fix.
>>> I am not sure if the push would need to wait for the signing of OCA
>>> from Kris, but best to check with Those Who Would Know Such Things.
>>>
>>> Since the original CR has been closed, i'll open one momentarily and
>>> can make you RE (if that's OK with you). I'll be happy to serve as
>>> reviewer of the change.
>>>
>>> As regards the jstat counter reporting two pauses per concurrent
>>> CMS cycle, I am of two minds on what the original intention
>>> was. I'd have originally regarded the double increment as a
>>> bug, but as you state it is really two pauses, even if part of
>>> a single cycle. And it makes sense to count them as two. I
>>> agree that this should be documented and left alone, given
>>> how long we have had this behaviour, and the alternative
>>> (of counting cycles, rather than pauses) may be no better
>>> (or arguably worse). There's actually an open CR for this
>>> which we can redirect into a CR to update the relevant documentation.
>>>
>>> -- ramki
>>>
>>>>
>>>> Regards
>>>> Kevin
>>>>
>>>>
>>>> /*
>>>> * Copyright (c) 2011, Oracle and/or its affiliates. All rights
>>>> reserved.
>>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>> *
>>>> * This code is free software; you can redistribute it and/or
>>>> modify it
>>>> * under the terms of the GNU General Public License version 2
>>>> only, as
>>>> * published by the Free Software Foundation.
>>>> *
>>>> * This code is distributed in the hope that it will be useful,
>>>> but
>>>> WITHOUT
>>>> * ANY WARRANTY; without even the implied warranty of
>>>> MERCHANTABILITY or
>>>> * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
>>>> Public License
>>>> * version 2 for more details (a copy is included in the LICENSE
>>>> file that
>>>> * accompanied this code).
>>>> *
>>>> * You should have received a copy of the GNU General Public
>>>> License
>>>> version
>>>> * 2 along with this work; if not, write to the Free Software
>>>> Foundation,
>>>> * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
>>>> *
>>>> * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA
>>>> 94065 USA
>>>> * or visit www.oracle.com <http://www.oracle.com> if you need
>>>> additional information or have any
>>>> * questions.
>>>> */
>>>>
>>>> /*
>>>> * @test TestFullGCount.java
>>>> * @bug
>>>> * @summary
>>>> * @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount
>>>> *
>>>> */
>>>> import java.util.*;
>>>> import java.lang.management.*;
>>>>
>>>>
>>>>
>>>> public class TestFullGCCount {
>>>>
>>>> private String poolName = "CMS";
>>>> private String collectorName = "ConcurrentMarkSweep";
>>>>
>>>> public static void main(String [] args) {
>>>>
>>>> TestFullGCCount t = null;
>>>> if (args.length==2) {
>>>> t = new TestFullGCCount(args[0], args[1]);
>>>> } else {
>>>> System.out.println("Defaulting to monitor CMS pool and
>>>> collector.");
>>>> t = new TestFullGCCount();
>>>> }
>>>> t.run();
>>>>
>>>> }
>>>>
>>>> public TestFullGCCount(String pool, String collector) {
>>>> poolName = pool;
>>>> collectorName = collector;
>>>> }
>>>> public TestFullGCCount() {
>>>> }
>>>>
>>>> public void run() {
>>>>
>>>> int count = 0;
>>>> int iterations = 20;
>>>> long counts[] = new long[iterations];
>>>> boolean diffAlways2 = true; // assume we will fail
>>>>
>>>> for (int i=0; i<iterations; i++) {
>>>> System.gc();
>>>> counts[i] = checkStats();
>>>> if (i>0) {
>>>> if (counts[i] - counts[i-1] != 2) {
>>>> diffAlways2 = false;
>>>> }
>>>> }
>>>> }
>>>> if (diffAlways2) {
>>>> throw new RuntimeException("FAILED: difference in count is
>>>> always 2.");
>>>> }
>>>> System.out.println("Passed.");
>>>> }
>>>>
>>>> private long checkStats() {
>>>> long count = 0;
>>>> List<MemoryPoolMXBean> pools =
>>>> ManagementFactory.getMemoryPoolMXBeans();
>>>> List<GarbageCollectorMXBean> collectors =
>>>> ManagementFactory.getGarbageCollectorMXBeans();
>>>> for (int i=0; i<collectors.size(); i++) {
>>>> GarbageCollectorMXBean collector = collectors.get(i);
>>>> String name = collector.getName();
>>>> if (name.contains(collectorName)) {
>>>> System.out.println(name + ": collection count = "
>>>> + collector.getCollectionCount());
>>>> count = collector.getCollectionCount();
>>>> }
>>>> }
>>>> return count;
>>>>
>>>> }
>>>>
>>>> }
>>>>
>>>>
>>>> On 27/07/11 17:12, Krystal Mok wrote:
>>>>> Hi all,
>>>>>
>>>>> I've been looking at a strange inconsistency of full GC count
>>>>> recorded by jvmstat and JMM counters. I'd like to know which ones
>>>>> of the following behaviors are by design, which ones are bugs,
>>>>> and
>>>>> which ones are just my misunderstanding. I apologize for making a
>>>>> short story long...
>>>>>
>>>>> =====================================================
>>>>>
>>>>> The counters involved:
>>>>>
>>>>> * A jvmstat counter named "sun.gc.collector.1.invocations" keeps
>>>>> track of the number of pauses occured as a result of a major
>>>>> collection. It is used by utilities such as jstat as the
>>>>> source of
>>>>> "FGC" (full collection count), and the old gen collection
>>>>> count in
>>>>> Visual GC. It's updated by an TraceCollectorStats object.
>>>>> * A JMM counter, GCMemoryManager::_num_collections, keeps
>>>>> track of
>>>>> the number of collections that have ended. This counter is
>>>>> used as
>>>>> HotSpot's implementation of the JMX
>>>>> GarbageCollectorMXBean.getCollectionCount(). It's updated by
>>>>> either a TraceMemoryManagerStats object or a
>>>>> TraceCMSMemoryManagerStats object.
>>>>>
>>>>> To show the situation, I've made a screenshot of a VisualVM and a
>>>>> JConsole running side by side, both are monitoring the VisualVM's
>>>>> GC stats:
>>>>>
>>>>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
>>>>>
>>>>> (I'll upload the screenshot to somewhere else if anybody can't
>>>>> see it)
>>>>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>>>>> In the screenshot, Visual GC reports that the old gen collection
>>>>> count is 20, while JConsole reports 10.
>>>>>
>>>>> I see that there was this bug:
>>>>> 6580448: CMS: Full GC collection count mismatch between
>>>>> GarbageCollectorMXBean and jvmstat (VisualGC)
>>>>> I don't think the current implementation has a bug in the sense
>>>>> that the two counters don't report the same number.
>>>>>
>>>>> This behavior seems reasonable, but the naming of the value in
>>>>> these tools are confusing: both tools say "collections", but
>>>>> apparently the number in Visual GC means "number of pauses" where
>>>>> as the number in JConsole means "number of collection cycles".
>>>>> It'd be great if the difference could be documented somewhere, if
>>>>> that's the intended behavior.
>>>>>
>>>>> And then the buggy behavior. Code demo posted on gist:
>>>>> https://gist.github.com/1106263
>>>>> Starting from JDK6u23, when using CMS without
>>>>> ExplicitGCInvokesConcurrent, System.gc() (or
>>>>> Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would
>>>>> make the JMM GC counter increment by 2 per invocation, while the
>>>>> jvmstat counter is only incremented by 1. I believe the latter is
>>>>> correct and the former needs some fixing.
>>>>>
>>>>> =====================================================
>>>>>
>>>>> My understanding of the behavior shown above:
>>>>>
>>>>> 1. The concurrent GC part:
>>>>>
>>>>> There are 2 pauses in a CMS concurrent GC cycle, one in the
>>>>> initial mark phase, and one in the final remark phase.
>>>>> To trigger a concurrent GC cycle, the CMS thread wakes up
>>>>> periodically to see if it shouldConcurrentCollect(), and
>>>>> trigger a
>>>>> cycle when the predicate returned true, or goes back to sleep if
>>>>> the predicate returned false. The whole concurrent GC cycle
>>>>> doesn't go through GenCollectedHeap::do_collection().
>>>>>
>>>>> The jvmstat counter for old gen pauses is updated in
>>>>> CMSCollector::do_CMS_operation(CMS_op_type op), which exactly
>>>>> covers both pause phases.
>>>>>
>>>>> The JMM counter, however, is updated in the concurrent sweep
>>>>> phase, CMSCollector::sweep(bool asynch), if there was no
>>>>> concurrent mode failure; or it is updated in
>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs) in
>>>>> case
>>>>> of a bailout due to concurrent mode failure (advertised as so in
>>>>> the code comments). So that's an increment by 1 per concurrent GC
>>>>> cycle, which does reflect the "number of collection cycles", fair
>>>>> enough.
>>>>>
>>>>> So far so good.
>>>>>
>>>>> 2. The System.gc() part:
>>>>>
>>>>> Without ExplicitGCInvokesConcurrent set, System.gc() does a
>>>>> stop-the-world full GC, which consists of only one pause, so
>>>>> "number of pauses" would equal "number of collections" in this
>>>>> case.
>>>>> It will go into GenCollectedHeap::do_collection(); both the
>>>>> jvmstat and the JMM GC counter gets incremented by 1 here,
>>>>>
>>>>> TraceCollectorStats tcs(_gens[i]->counters());
>>>>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>>>>
>>>>> But, drilling down into:
>>>>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>>>>
>>>>> That'll eventually go into:
>>>>> CMSCollector::acquire_control_and_collect(bool full, bool
>>>>> clear_all_soft_refs)
>>>>>
>>>>> System.gc() is user requested so that'll go further into
>>>>> mark-sweep-compact:
>>>>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>>>>> And here, it increments the JMM GC counter again (remember it was
>>>>> in the concurrent GC path too, to handle bailouts), even though
>>>>> this is still in the same collection. This leads to the "buggy
>>>>> behavior" mentioned earlier.
>>>>>
>>>>> The JMM GC counter wasn't added to CMS until this fix got in:
>>>>> 6581734: CMS Old Gen's collection usage is zero after GC which is
>>>>> incorrect
>>>>>
>>>>> The code added to CMSCollector::do_compaction_work() works
>>>>> fine in
>>>>> the concurrent GC path, but interacts badly with the original
>>>>> logic in GenCollectedHeap::do_collection().
>>>>>
>>>>> =====================================================
>>>>>
>>>>> I thought all concurrent mode failures/interrupts come from
>>>>> GenCollectedHeap::do_collection(). If that's the case, then it
>>>>> seems unnecessary to update the JMM GC counter in
>>>>> CMSCollector::do_compaction_work(), simply removing it should fix
>>>>> the problem.
>>>>>
>>>>> With that, I'd purpose the following (XS) change: (diff
>>>>> against HS20)
>>>>>
>>>>> diff -r f0f676c5a2c6
>>>>>
>>>>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>
>>>>> ---
>>>>>
>>>>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>
>>>>> Tue Mar 15 19:30:16 2011 -0700
>>>>> +++
>>>>>
>>>>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>>>>>
>>>>> Thu Jul 28 00:02:41 2011 +0800
>>>>> @@ -2022,9 +2022,6 @@
>>>>>
>>>>> _intra_sweep_estimate.padded_average());
>>>>> }
>>>>>
>>>>> - {
>>>>> - TraceCMSMemoryManagerStats();
>>>>> - }
>>>>> GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>>>>> ref_processor(), clear_all_soft_refs);
>>>>> #ifdef ASSERT
>>>>>
>>>>> The same goes for the changes in:
>>>>> 7036199: Adding a notification to the implementation of
>>>>> GarbageCollectorMXBeans
>>>>>
>>>>> =====================================================
>>>>>
>>>>> P.S. Is there an "official" name for the counters that I referred
>>>>> to as "jvmstat counters" above? Is it just jvmstat, or
>>>>> PerfData or
>>>>> HSPERFDATA?
>>>>
>>>>
>
More information about the hotspot-gc-dev
mailing list