Hi Ramki and Kevin,<div><br></div><div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
Kevin, yes, it would be great if you could verify this and push the fix.<br>I am not sure if the push would need to wait for the signing of OCA<br>from Kris, but best to check with Those Who Would Know Such Things.</blockquote>
<div>It'll take some time before we can sign the OCA. I heard that the internal process is moving but I'm not sure how long it's going to take. Sorry about that. Will report our progress on it ASAP.</div><div>
<br></div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
As regards the jstat counter reporting two pauses per concurrent<br>CMS cycle, I am of two minds on what the original intention<br>was. I'd have originally regarded the double increment as a<br>bug, but as you state it is really two pauses, even if part of<br>
a single cycle. And it makes sense to count them as two. I<br>agree that this should be documented and left alone, given<br>how long we have had this behaviour, and the alternative<br>(of counting cycles, rather than pauses) may be no better<br>
(or arguably worse). There's actually an open CR for this<br>which we can redirect into a CR to update the relevant documentation.</blockquote><div>Cool! I'd love to see improved documentation on this. Then it's much easier to explain about it to other people.</div>
<div><br></div><div>Thank you all,</div><div>Kris Mok</div><br><div class="gmail_quote">On Fri, Jul 29, 2011 at 8:29 AM, Y. S. Ramakrishna <span dir="ltr"><<a href="mailto:y.s.ramakrishna@oracle.com">y.s.ramakrishna@oracle.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Kevin --<br>
<br>
thanks for jumping on this! More inline below ...<div class="im"><br>
<br>
On 07/28/11 09:33, Krystal Mok wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Kevin,<br>
<br>
Thank you for taking care of this, and it's good to see the problem is verified.<br>
<br>
I think whether or not the suggested fix is sufficient depends on what paths can reach CMSCollector::do_compaction_<u></u>work(). If all paths that can reach CMSCollector::do_compaction_<u></u>work() come from GenCollectedHeap::do_<u></u>collection(), then the fix should be good to go. Otherwise it'll need a better workaround.<br>
<br>
I believe all concurrent mode failures/interrupts (which includes the System.gc() case) does come from GenCollectedHeap::do_<u></u>collection(), but I'm not exactly sure about this, could anybody please clarify on it?<br>
</blockquote>
<br></div>
Yes, i believe this is indeed the case, and my browsing of the<br>
code using cscope seemed to confirm that belief.<br>
<br>
More below ...<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Regards,<br>
Kris Mok<div class="im"><br>
<br>
On Thu, Jul 28, 2011 at 10:12 PM, Kevin Walls <<a href="mailto:kevin.walls@oracle.com" target="_blank">kevin.walls@oracle.com</a> <mailto:<a href="mailto:kevin.walls@oracle.com" target="_blank">kevin.walls@oracle.com</a><u></u>>> wrote:<br>
<br>
__<br>
Hi --<br>
<br>
6580448 was marked as a duplicate of 6581734, which fixed the fact<br>
that CMS collections were just not counted at all - with CMS, only a<br>
stop the world full gc would be counted in the stats.<br>
<br>
But looks like you're right... Here is a quick variation of the<br>
testcase from 6581734 which shows the same thing, and this verifies<br>
the same, and is solved by ExplicitGCInvokesConcurrent. If there is<br>
no other feedback I can test if the removal of the<br>
TraceCMSMemoryManagerStats() call in<br>
CMSCollector::do_compaction_<u></u>work is all we need...<br>
</div></blockquote>
<br>
<br>
Kevin, yes, it would be great if you could verify this and push the fix.<br>
I am not sure if the push would need to wait for the signing of OCA<br>
from Kris, but best to check with Those Who Would Know Such Things.<br>
<br>
Since the original CR has been closed, i'll open one momentarily and<br>
can make you RE (if that's OK with you). I'll be happy to serve as<br>
reviewer of the change.<br>
<br>
As regards the jstat counter reporting two pauses per concurrent<br>
CMS cycle, I am of two minds on what the original intention<br>
was. I'd have originally regarded the double increment as a<br>
bug, but as you state it is really two pauses, even if part of<br>
a single cycle. And it makes sense to count them as two. I<br>
agree that this should be documented and left alone, given<br>
how long we have had this behaviour, and the alternative<br>
(of counting cycles, rather than pauses) may be no better<br>
(or arguably worse). There's actually an open CR for this<br>
which we can redirect into a CR to update the relevant documentation.<br>
<br>
-- ramki<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
<br>
Regards<br>
Kevin<br>
<br>
<br>
/*<br>
* Copyright (c) 2011, Oracle and/or its affiliates. All rights<br>
reserved.<br>
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.<br>
*<br>
* This code is free software; you can redistribute it and/or modify it<br>
* under the terms of the GNU General Public License version 2 only, as<br>
* published by the Free Software Foundation.<br>
*<br>
* This code is distributed in the hope that it will be useful, but<br>
WITHOUT<br>
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or<br>
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License<br>
* version 2 for more details (a copy is included in the LICENSE<br>
file that<br>
* accompanied this code).<br>
*<br>
* You should have received a copy of the GNU General Public License<br>
version<br>
* 2 along with this work; if not, write to the Free Software<br>
Foundation,<br>
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.<br>
*<br>
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA<br>
94065 USA<br></div>
* or visit <a href="http://www.oracle.com" target="_blank">www.oracle.com</a> <<a href="http://www.oracle.com" target="_blank">http://www.oracle.com</a>> if you need<div><div></div><div class="h5"><br>
additional information or have any<br>
* questions.<br>
*/<br>
<br>
/*<br>
* @test TestFullGCount.java<br>
* @bug<br>
* @summary<br>
* @run main/othervm -XX:+UseConcMarkSweepGC TestFullGCCount<br>
*<br>
*/<br>
import java.util.*;<br>
import java.lang.management.*;<br>
<br>
<br>
<br>
public class TestFullGCCount {<br>
<br>
private String poolName = "CMS";<br>
private String collectorName = "ConcurrentMarkSweep";<br>
<br>
public static void main(String [] args) {<br>
<br>
TestFullGCCount t = null;<br>
if (args.length==2) {<br>
t = new TestFullGCCount(args[0], args[1]);<br>
} else {<br>
System.out.println("Defaulting to monitor CMS pool and<br>
collector.");<br>
t = new TestFullGCCount();<br>
}<br>
t.run();<br>
<br>
}<br>
<br>
public TestFullGCCount(String pool, String collector) {<br>
poolName = pool;<br>
collectorName = collector;<br>
}<br>
public TestFullGCCount() {<br>
}<br>
<br>
public void run() {<br>
<br>
int count = 0;<br>
int iterations = 20;<br>
long counts[] = new long[iterations];<br>
boolean diffAlways2 = true; // assume we will fail<br>
<br>
for (int i=0; i<iterations; i++) {<br>
System.gc();<br>
counts[i] = checkStats();<br>
if (i>0) {<br>
if (counts[i] - counts[i-1] != 2) {<br>
diffAlways2 = false;<br>
}<br>
}<br>
}<br>
if (diffAlways2) {<br>
throw new RuntimeException("FAILED: difference in count is<br>
always 2.");<br>
}<br>
System.out.println("Passed.");<br>
}<br>
<br>
private long checkStats() {<br>
long count = 0;<br>
List<MemoryPoolMXBean> pools =<br>
ManagementFactory.<u></u>getMemoryPoolMXBeans();<br>
List<GarbageCollectorMXBean> collectors =<br>
ManagementFactory.<u></u>getGarbageCollectorMXBeans();<br>
for (int i=0; i<collectors.size(); i++) {<br>
GarbageCollectorMXBean collector = collectors.get(i);<br>
String name = collector.getName();<br>
if (name.contains(collectorName)) {<br>
System.out.println(name + ": collection count = "<br>
+ collector.getCollectionCount()<u></u>);<br>
count = collector.getCollectionCount()<u></u>;<br>
}<br>
}<br>
return count;<br>
<br>
}<br>
<br>
}<br>
<br>
<br>
On 27/07/11 17:12, Krystal Mok wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi all,<br>
<br>
I've been looking at a strange inconsistency of full GC count<br>
recorded by jvmstat and JMM counters. I'd like to know which ones<br>
of the following behaviors are by design, which ones are bugs, and<br>
which ones are just my misunderstanding. I apologize for making a<br>
short story long...<br>
<br>
==============================<u></u>=======================<br>
<br>
The counters involved:<br>
<br>
* A jvmstat counter named "sun.gc.collector.1.<u></u>invocations" keeps<br>
track of the number of pauses occured as a result of a major<br>
collection. It is used by utilities such as jstat as the source of<br>
"FGC" (full collection count), and the old gen collection count in<br>
Visual GC. It's updated by an TraceCollectorStats object.<br>
* A JMM counter, GCMemoryManager::_num_<u></u>collections, keeps track of<br>
the number of collections that have ended. This counter is used as<br>
HotSpot's implementation of the JMX<br>
GarbageCollectorMXBean.<u></u>getCollectionCount(). It's updated by<br>
either a TraceMemoryManagerStats object or a<br>
TraceCMSMemoryManagerStats object.<br>
<br>
To show the situation, I've made a screenshot of a VisualVM and a<br>
JConsole running side by side, both are monitoring the VisualVM's<br>
GC stats:<br>
<a href="http://dl.iteye.com/upload/attachment/524811/913cb0e1-7add-3ac0-a718-24ca705cad22.png" target="_blank">http://dl.iteye.com/upload/<u></u>attachment/524811/913cb0e1-<u></u>7add-3ac0-a718-24ca705cad22.<u></u>png</a><br>
(I'll upload the screenshot to somewhere else if anybody can't see it)<br>
The VisualVM instance is running on JDK6u26, with ParNew+CMS.<br>
In the screenshot, Visual GC reports that the old gen collection<br>
count is 20, while JConsole reports 10.<br>
<br>
I see that there was this bug:<br>
6580448: CMS: Full GC collection count mismatch between<br>
GarbageCollectorMXBean and jvmstat (VisualGC)<br>
I don't think the current implementation has a bug in the sense<br>
that the two counters don't report the same number.<br>
<br>
This behavior seems reasonable, but the naming of the value in<br>
these tools are confusing: both tools say "collections", but<br>
apparently the number in Visual GC means "number of pauses" where<br>
as the number in JConsole means "number of collection cycles".<br>
It'd be great if the difference could be documented somewhere, if<br>
that's the intended behavior.<br>
<br>
And then the buggy behavior. Code demo posted on gist:<br>
<a href="https://gist.github.com/1106263" target="_blank">https://gist.github.com/<u></u>1106263</a><br>
Starting from JDK6u23, when using CMS without<br>
ExplicitGCInvokesConcurrent, System.gc() (or<br>
Runtime.getRuntime().gc(), or MemoryMXBean.gc() via JMX) would<br>
make the JMM GC counter increment by 2 per invocation, while the<br>
jvmstat counter is only incremented by 1. I believe the latter is<br>
correct and the former needs some fixing.<br>
<br>
==============================<u></u>=======================<br>
<br>
My understanding of the behavior shown above:<br>
<br>
1. The concurrent GC part:<br>
<br>
There are 2 pauses in a CMS concurrent GC cycle, one in the<br>
initial mark phase, and one in the final remark phase.<br>
To trigger a concurrent GC cycle, the CMS thread wakes up<br>
periodically to see if it shouldConcurrentCollect(), and trigger a<br>
cycle when the predicate returned true, or goes back to sleep if<br>
the predicate returned false. The whole concurrent GC cycle<br>
doesn't go through GenCollectedHeap::do_<u></u>collection().<br>
<br>
The jvmstat counter for old gen pauses is updated in<br>
CMSCollector::do_CMS_<u></u>operation(CMS_op_type op), which exactly<br>
covers both pause phases.<br>
<br>
The JMM counter, however, is updated in the concurrent sweep<br>
phase, CMSCollector::sweep(bool asynch), if there was no<br>
concurrent mode failure; or it is updated in<br>
CMSCollector::do_compaction_<u></u>work(bool clear_all_soft_refs) in case<br>
of a bailout due to concurrent mode failure (advertised as so in<br>
the code comments). So that's an increment by 1 per concurrent GC<br>
cycle, which does reflect the "number of collection cycles", fair<br>
enough.<br>
<br>
So far so good.<br>
<br>
2. The System.gc() part:<br>
<br>
Without ExplicitGCInvokesConcurrent set, System.gc() does a<br>
stop-the-world full GC, which consists of only one pause, so<br>
"number of pauses" would equal "number of collections" in this case.<br>
It will go into GenCollectedHeap::do_<u></u>collection(); both the<br>
jvmstat and the JMM GC counter gets incremented by 1 here,<br>
<br>
TraceCollectorStats tcs(_gens[i]->counters());<br>
TraceMemoryManagerStats tmms(_gens[i]->kind());<br>
<br>
But, drilling down into:<br>
_gens[i]->collect(full, do_clear_all_soft_refs, size, is_tlab);<br>
<br>
That'll eventually go into:<br>
CMSCollector::acquire_control_<u></u>and_collect(bool full, bool<br>
clear_all_soft_refs)<br>
<br>
System.gc() is user requested so that'll go further into<br>
mark-sweep-compact:<br>
CMSCollector::do_compaction_<u></u>work(bool clear_all_soft_refs)<br>
And here, it increments the JMM GC counter again (remember it was<br>
in the concurrent GC path too, to handle bailouts), even though<br>
this is still in the same collection. This leads to the "buggy<br>
behavior" mentioned earlier.<br>
<br>
The JMM GC counter wasn't added to CMS until this fix got in:<br>
6581734: CMS Old Gen's collection usage is zero after GC which is<br>
incorrect<br>
<br>
The code added to CMSCollector::do_compaction_<u></u>work() works fine in<br>
the concurrent GC path, but interacts badly with the original<br>
logic in GenCollectedHeap::do_<u></u>collection().<br>
<br>
==============================<u></u>=======================<br>
<br>
I thought all concurrent mode failures/interrupts come from<br>
GenCollectedHeap::do_<u></u>collection(). If that's the case, then it<br>
seems unnecessary to update the JMM GC counter in<br>
CMSCollector::do_compaction_<u></u>work(), simply removing it should fix<br>
the problem.<br>
<br>
With that, I'd purpose the following (XS) change: (diff against HS20)<br>
<br>
diff -r f0f676c5a2c6<br>
src/share/vm/gc_<u></u>implementation/<u></u>concurrentMarkSweep/<u></u>concurrentMarkSweepGeneration.<u></u>cpp<br>
---<br>
a/src/share/vm/gc_<u></u>implementation/<u></u>concurrentMarkSweep/<u></u>concurrentMarkSweepGeneration.<u></u>cpp<br>
Tue Mar 15 19:30:16 2011 -0700<br>
+++<br>
b/src/share/vm/gc_<u></u>implementation/<u></u>concurrentMarkSweep/<u></u>concurrentMarkSweepGeneration.<u></u>cpp<br>
Thu Jul 28 00:02:41 2011 +0800<br>
@@ -2022,9 +2022,6 @@<br>
_intra_sweep_estimate.padded_<u></u>average());<br>
}<br>
<br>
- {<br>
- TraceCMSMemoryManagerStats();<br>
- }<br>
GenMarkSweep::invoke_at_<u></u>safepoint(_cmsGen->level(),<br>
ref_processor(), clear_all_soft_refs);<br>
#ifdef ASSERT<br>
<br>
The same goes for the changes in:<br>
7036199: Adding a notification to the implementation of<br>
GarbageCollectorMXBeans<br>
<br>
==============================<u></u>=======================<br>
<br>
P.S. Is there an "official" name for the counters that I referred<br>
to as "jvmstat counters" above? Is it just jvmstat, or PerfData or<br>
HSPERFDATA?<br>
</blockquote>
<br>
<br>
</div></div></blockquote>
</blockquote></div><br></div>