Number of CMS collections and the JMM GC counters

Y. Srinivas Ramakrishna y.s.ramakrishna at oracle.com
Wed Jul 27 10:07:57 PDT 2011


Hi Kris --

I agree with your analysis and will review & shepherd your suggested fix.
I'll file a CR to cover this fix and send out a CR id shortly.

Thanks for the fix. If you have not already done so, please
complete the contributor agreement (OCA) so that your
contribution may be used; see:-
    http://openjdk.java.net/contribute/

-- ramki

On 7/27/2011 9:42 AM, Y. Srinivas Ramakrishna wrote:
> Cross-posting to serviceability-dev list. Please include both lists
> in your responses to this thread.
>
> -- ramki
>
> On 7/27/2011 9:12 AM, Krystal Mok wrote:
>> Hi all,
>>
>> I've been looking at a strange inconsistency of full GC count recorded by
>> jvmstat and JMM counters. I'd like to know which ones of the following
>> behaviors are by design, which ones are bugs, and which ones are just my
>> misunderstanding. I apologize for making a short story long...
>>
>> =====================================================
>>
>> The counters involved:
>>
>> * A jvmstat counter named "sun.gc.collector.1.invocations" keeps track of
>> the number of pauses occured as a result of a major collection. It is used
>> by utilities such as jstat as the source of "FGC" (full collection count),
>> and the old gen collection count in Visual GC. It's updated by an
>> TraceCollectorStats object.
>> * A JMM counter, GCMemoryManager::_num_collections, keeps track of the
>> number of collections that have ended. This counter is used as HotSpot's
>> implementation of the JMX GarbageCollectorMXBean.getCollectionCount(). It's
>> updated by either a TraceMemoryManagerStats object or a
>> TraceCMSMemoryManagerStats object.
>>
>> To show the situation, I've made a screenshot of a VisualVM and a JConsole
>> running side by side, both are monitoring the VisualVM's GC stats:
>> http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
>> (I'll upload the screenshot to somewhere else if anybody can't see it)
>> The VisualVM instance is running on JDK6u26, with ParNew+CMS.
>> In the screenshot, Visual GC reports that the old gen collection count is
>> 20, while JConsole reports 10.
>>
>> I see that there was this bug:
>> 6580448: CMS: Full GC collection count mismatch between
>> GarbageCollectorMXBean and jvmstat (VisualGC)
>> I don't think the current implementation has a bug in the sense that the two
>> counters don't report the same number.
>>
>> This behavior seems reasonable, but the naming of the value in these tools
>> are confusing: both tools say "collections", but apparently the number in
>> Visual GC means "number of pauses" where as the number in JConsole means
>> "number of collection cycles". It'd be great if the difference could be
>> documented somewhere, if that's the intended behavior.
>>
>> And then the buggy behavior. Code demo posted on gist:
>> https://gist.github.com/1106263
>> Starting from JDK6u23, when using CMS without ExplicitGCInvokesConcurrent,
>> System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX)
>> would make the JMM GC counter increment by 2 per invocation, while the
>> jvmstat counter is only incremented by 1. I believe the latter is correct
>> and the former needs some fixing.
>>
>> =====================================================
>>
>> My understanding of the behavior shown above:
>>
>> 1. The concurrent GC part:
>>
>> There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark
>> phase, and one in the final remark phase.
>> To trigger a concurrent GC cycle, the CMS thread wakes up periodically to
>> see if it shouldConcurrentCollect(), and trigger a cycle when the predicate
>> returned true, or goes back to sleep if the predicate returned false. The
>> whole concurrent GC cycle doesn't go through
>> GenCollectedHeap::do_collection().
>>
>> The jvmstat counter for old gen pauses is updated in
>> CMSCollector::do_CMS_operation(CMS_op_type op), which exactly covers both
>> pause phases.
>>
>> The JMM counter, however, is updated in the concurrent sweep phase,
>> CMSCollector::sweep(bool asynch), if there was no concurrent mode failure;
>> or it is updated in CMSCollector::do_compaction_work(bool
>> clear_all_soft_refs) in case of a bailout due to concurrent mode failure
>> (advertised as so in the code comments). So that's an increment by 1 per
>> concurrent GC cycle, which does reflect the "number of collection cycles",
>> fair enough.
>>
>> So far so good.
>>
>> 2. The System.gc() part:
>>
>> Without ExplicitGCInvokesConcurrent set, System.gc() does a stop-the-world
>> full GC, which consists of only one pause, so "number of pauses" would equal
>> "number of collections" in this case.
>> It will go into GenCollectedHeap::do_collection(); both the jvmstat and the
>> JMM GC counter gets incremented by 1 here,
>>
>> TraceCollectorStats tcs(_gens[i]->counters());
>> TraceMemoryManagerStats tmms(_gens[i]->kind());
>>
>> But, drilling down into:
>> _gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);
>>
>> That'll eventually go into:
>> CMSCollector::acquire_control_and_collect(bool full, bool
>> clear_all_soft_refs)
>>
>> System.gc() is user requested so that'll go further into mark-sweep-compact:
>> CMSCollector::do_compaction_work(bool clear_all_soft_refs)
>> And here, it increments the JMM GC counter again (remember it was in the
>> concurrent GC path too, to handle bailouts), even though this is still in
>> the same collection. This leads to the "buggy behavior" mentioned earlier.
>>
>> The JMM GC counter wasn't added to CMS until this fix got in:
>> 6581734: CMS Old Gen's collection usage is zero after GC which is incorrect
>>
>> The code added to CMSCollector::do_compaction_work() works fine in the
>> concurrent GC path, but interacts badly with the original logic in
>> GenCollectedHeap::do_collection().
>>
>> =====================================================
>>
>> I thought all concurrent mode failures/interrupts come from
>> GenCollectedHeap::do_collection(). If that's the case, then it seems
>> unnecessary to update the JMM GC counter in
>> CMSCollector::do_compaction_work(), simply removing it should fix the
>> problem.
>>
>> With that, I'd purpose the following (XS) change: (diff against HS20)
>>
>> diff -r f0f676c5a2c6
>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>> ---
>> a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>> Tue Mar 15 19:30:16 2011 -0700
>> +++
>> b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>> Thu Jul 28 00:02:41 2011 +0800
>> @@ -2022,9 +2022,6 @@
>>
>> _intra_sweep_estimate.padded_average());
>> }
>>
>> - {
>> - TraceCMSMemoryManagerStats();
>> - }
>> GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
>> ref_processor(), clear_all_soft_refs);
>> #ifdef ASSERT
>>
>> The same goes for the changes in:
>> 7036199: Adding a notification to the implementation of
>> GarbageCollectorMXBeans
>>
>> =====================================================
>>
>> P.S. Is there an "official" name for the counters that I referred to as
>> "jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?
>>
>



More information about the serviceability-dev mailing list