Number of CMS collections and the JMM GC counters

Krystal Mok rednaxelafx at gmail.com
Wed Jul 27 16:12:21 UTC 2011


Hi all,

I've been looking at a strange inconsistency of full GC count recorded by
jvmstat and JMM counters. I'd like to know which ones of the following
behaviors are by design, which ones are bugs, and which ones are just my
misunderstanding. I apologize for making a short story long...

=====================================================

The counters involved:

* A jvmstat counter named "sun.gc.collector.1.invocations" keeps track of
the number of pauses occured as a result of a major collection. It is used
by utilities such as jstat as the source of "FGC" (full collection count),
and the old gen collection count in Visual GC. It's updated by an
TraceCollectorStats object.
* A JMM counter, GCMemoryManager::_num_collections, keeps track of the
number of collections that have ended. This counter is used as HotSpot's
implementation of the JMX GarbageCollectorMXBean.getCollectionCount(). It's
updated by either a TraceMemoryManagerStats object or a
TraceCMSMemoryManagerStats object.

To show the situation, I've made a screenshot of a VisualVM and a JConsole
running side by side, both are monitoring the VisualVM's GC stats:
http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png
(I'll upload the screenshot to somewhere else if anybody can't see it)
The VisualVM instance is running on JDK6u26, with ParNew+CMS.
In the screenshot, Visual GC reports that the old gen collection count is
20, while JConsole reports 10.

I see that there was this bug:
6580448: CMS: Full GC collection count mismatch between
GarbageCollectorMXBean and jvmstat (VisualGC)
I don't think the current implementation has a bug in the sense that the two
counters don't report the same number.

This behavior seems reasonable, but the naming of the value in these tools
are confusing: both tools say "collections", but apparently the number in
Visual GC means "number of pauses" where as the number in JConsole means
"number of collection cycles". It'd be great if the difference could be
documented somewhere, if that's the intended behavior.

And then the buggy behavior. Code demo posted on gist:
https://gist.github.com/1106263
Starting from JDK6u23, when using CMS without ExplicitGCInvokesConcurrent,
System.gc() (or Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX)
would make the JMM GC counter increment by 2 per invocation, while the
jvmstat counter is only incremented by 1. I believe the latter is correct
and the former needs some fixing.

=====================================================

My understanding of the behavior shown above:

1. The concurrent GC part:

There are 2 pauses in a CMS concurrent GC cycle, one in the initial mark
phase, and one in the final remark phase.
To trigger a concurrent GC cycle, the CMS thread wakes up periodically to
see if it shouldConcurrentCollect(), and trigger a cycle when the predicate
returned true, or goes back to sleep if the predicate returned false. The
whole concurrent GC cycle doesn't go through
GenCollectedHeap::do_collection().

The jvmstat counter for old gen pauses is updated in
CMSCollector::do_CMS_operation(CMS_op_type op), which exactly covers both
pause phases.

The JMM counter, however, is updated in the concurrent sweep phase,
CMSCollector::sweep(bool asynch), if there was no concurrent mode failure;
or it is updated in CMSCollector::do_compaction_work(bool
clear_all_soft_refs) in case of a bailout due to concurrent mode failure
(advertised as so in the code comments). So that's an increment by 1 per
concurrent GC cycle, which does reflect the "number of collection cycles",
fair enough.

So far so good.

2. The System.gc() part:

Without ExplicitGCInvokesConcurrent set, System.gc() does a stop-the-world
full GC, which consists of only one pause, so "number of pauses" would equal
"number of collections" in this case.
It will go into GenCollectedHeap::do_collection(); both the jvmstat and the
JMM GC counter gets incremented by 1 here,

TraceCollectorStats tcs(_gens[i]->counters());
TraceMemoryManagerStats tmms(_gens[i]->kind());

But, drilling down into:
_gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);

That'll eventually go into:
CMSCollector::acquire_control_and_collect(bool full, bool
clear_all_soft_refs)

System.gc() is user requested so that'll go further into mark-sweep-compact:
CMSCollector::do_compaction_work(bool clear_all_soft_refs)
And here, it increments the JMM GC counter again (remember it was in the
concurrent GC path too, to handle bailouts), even though this is still in
the same collection. This leads to the "buggy behavior" mentioned earlier.

The JMM GC counter wasn't added to CMS until this fix got in:
6581734: CMS Old Gen's collection usage is zero after GC which is incorrect

The code added to CMSCollector::do_compaction_work() works fine in the
concurrent GC path, but interacts badly with the original logic in
GenCollectedHeap::do_collection().

=====================================================

I thought all concurrent mode failures/interrupts come from
GenCollectedHeap::do_collection(). If that's the case, then it seems
unnecessary to update the JMM GC counter in
CMSCollector::do_compaction_work(), simply removing it should fix the
problem.

With that, I'd purpose the following (XS) change: (diff against HS20)

diff -r f0f676c5a2c6
src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
---
a/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
     Tue Mar 15 19:30:16 2011 -0700
+++
b/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
     Thu Jul 28 00:02:41 2011 +0800
@@ -2022,9 +2022,6 @@

 _intra_sweep_estimate.padded_average());
   }

-  {
-    TraceCMSMemoryManagerStats();
-  }
   GenMarkSweep::invoke_at_safepoint(_cmsGen->level(),
     ref_processor(), clear_all_soft_refs);
   #ifdef ASSERT

The same goes for the changes in:
7036199: Adding a notification to the implementation of
GarbageCollectorMXBeans

=====================================================

P.S. Is there an "official" name for the counters that I referred to as
"jvmstat counters" above? Is it just jvmstat, or PerfData or HSPERFDATA?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110728/5e57298d/attachment.htm>


More information about the hotspot-gc-dev mailing list