From tanman12345 at yahoo.com  Tue Jan  4 08:02:42 2011
From: tanman12345 at yahoo.com (Erwin)
Date: Tue, 4 Jan 2011 08:02:42 -0800 (PST)
Subject: High Par New collection times 
Message-ID: <329892.52364.qm@web111103.mail.gq1.yahoo.com>

Hello,

>From time to time, we would get a high collection time for ParNew and it would indicate promotion failed. This would then show CPU starvation message on the SystemOut.log. We're using Solaris10 and WAS 7.0.0.9 NDE.

Below is our JVM arguments. Any help would be appreciated.
-server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC  -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled

{Heap before GC invocations=12800 (full 1163):
 par new generation   total 921600K, used 906365K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
  from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
  to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
 concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 249119K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
1598527.649: [GC 1598527.649: [ParNew (promotion failed): 906365K->921600K(921600K), 0.6504370 secs]1598528.300: [CMS: 2969343K->2818470K(5136384K), 30.8057620 secs] 3715951K->2818470K(6057984K), [CMS Perm : 249119K->248229K(524288K)], 31.4567688 secs] [Times: user=33.09 sys=0.26, real=31.46 secs] 
Heap after GC invocations=12801 (full 1164):
 par new generation   total 921600K, used 0K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
  from space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
  to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
 concurrent mark-sweep generation total 5136384K, used 2818470K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 248229K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
}

Other times are fine
{Heap before GC invocations=12798 (full 1163):
 par new generation   total 921600K, used 921600K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
  from space 102400K, 100% used [0xfffffffdf2000000, 0xfffffffdf8400000, 0xfffffffdf8400000)
  to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
 concurrent mark-sweep generation total 5136384K, used 2677208K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
1598495.024: [GC 1598495.024: [ParNew: 921600K->88036K(921600K), 0.2590699 secs] 3598808K->2871275K(6057984K), 0.2594312 secs] [Times: user=2.73 sys=0.10, real=0.26 secs] 
Heap after GC invocations=12799 (full 1163):
 par new generation   total 921600K, used 88036K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
  from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
  to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
 concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
}
Total time for which application threads were stopped: 0.2631313 seconds
Total time for which application threads were stopped: 0.0089191 seconds
Total time for which application threads were stopped: 0.0065184 seconds
Total time for which application threads were stopped: 0.0022136 seconds
Total time for which application threads were stopped: 0.0072144 seconds
Total time for which application threads were stopped: 0.0093864 seconds
Total time for which application threads were stopped: 0.0022413 seconds
Total time for which application threads were stopped: 0.0022476 seconds
{Heap before GC invocations=12799 (full 1163):
 par new generation   total 921600K, used 907236K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
  from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
  to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
 concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
1598514.042: [GC 1598514.043: [ParNew: 907236K->87165K(921600K), 0.1946563 secs] 3690475K->2896751K(6057984K), 0.1950345 secs] [Times: user=1.55 sys=0.07, real=0.20 secs] 
Heap after GC invocations=12800 (full 1163):
 par new generation   total 921600K, used 87165K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
  eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
  from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
  to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
 concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
 concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
}
Total time for which application threads were stopped: 0.2034239 seconds
Total time for which application threads were stopped: 0.0075922 seconds
Total time for which application threads were stopped: 0.0022152 seconds

Thanks,
Erwin


From y.s.ramakrishna at oracle.com  Tue Jan  4 09:27:19 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 04 Jan 2011 09:27:19 -0800
Subject: High Par New collection times
In-Reply-To: <329892.52364.qm@web111103.mail.gq1.yahoo.com>
References: <329892.52364.qm@web111103.mail.gq1.yahoo.com>
Message-ID: <4D235877.4050900@oracle.com>

Don't know which version of the JVM you are using, but the following may be relevant:-

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6631166
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6999988

The  latter is still under investigation.
-- ramki

On 01/04/11 08:02, Erwin wrote:
> Hello,
> 
>>From time to time, we would get a high collection time for ParNew and it would indicate promotion failed. This would then show CPU starvation message on the SystemOut.log. We're using Solaris10 and WAS 7.0.0.9 NDE.
> 
> Below is our JVM arguments. Any help would be appreciated.
> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC  -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
> 
> {Heap before GC invocations=12800 (full 1163):
>  par new generation   total 921600K, used 906365K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>   from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
>   to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>  concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 249119K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598527.649: [GC 1598527.649: [ParNew (promotion failed): 906365K->921600K(921600K), 0.6504370 secs]1598528.300: [CMS: 2969343K->2818470K(5136384K), 30.8057620 secs] 3715951K->2818470K(6057984K), [CMS Perm : 249119K->248229K(524288K)], 31.4567688 secs] [Times: user=33.09 sys=0.26, real=31.46 secs] 
> Heap after GC invocations=12801 (full 1164):
>  par new generation   total 921600K, used 0K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>   from space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>   to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>  concurrent mark-sweep generation total 5136384K, used 2818470K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 248229K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
> 
> Other times are fine
> {Heap before GC invocations=12798 (full 1163):
>  par new generation   total 921600K, used 921600K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>   from space 102400K, 100% used [0xfffffffdf2000000, 0xfffffffdf8400000, 0xfffffffdf8400000)
>   to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>  concurrent mark-sweep generation total 5136384K, used 2677208K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598495.024: [GC 1598495.024: [ParNew: 921600K->88036K(921600K), 0.2590699 secs] 3598808K->2871275K(6057984K), 0.2594312 secs] [Times: user=2.73 sys=0.10, real=0.26 secs] 
> Heap after GC invocations=12799 (full 1163):
>  par new generation   total 921600K, used 88036K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>   from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
>   to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>  concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
> Total time for which application threads were stopped: 0.2631313 seconds
> Total time for which application threads were stopped: 0.0089191 seconds
> Total time for which application threads were stopped: 0.0065184 seconds
> Total time for which application threads were stopped: 0.0022136 seconds
> Total time for which application threads were stopped: 0.0072144 seconds
> Total time for which application threads were stopped: 0.0093864 seconds
> Total time for which application threads were stopped: 0.0022413 seconds
> Total time for which application threads were stopped: 0.0022476 seconds
> {Heap before GC invocations=12799 (full 1163):
>  par new generation   total 921600K, used 907236K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>   from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
>   to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>  concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598514.042: [GC 1598514.043: [ParNew: 907236K->87165K(921600K), 0.1946563 secs] 3690475K->2896751K(6057984K), 0.1950345 secs] [Times: user=1.55 sys=0.07, real=0.20 secs] 
> Heap after GC invocations=12800 (full 1163):
>  par new generation   total 921600K, used 87165K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>   eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>   from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
>   to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>  concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>  concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
> Total time for which application threads were stopped: 0.2034239 seconds
> Total time for which application threads were stopped: 0.0075922 seconds
> Total time for which application threads were stopped: 0.0022152 seconds
> 
> Thanks,
> Erwin
> 
> 
>       
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From jon.masamitsu at oracle.com  Tue Jan  4 10:41:54 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Tue, 04 Jan 2011 10:41:54 -0800
Subject: High Par New collection times
In-Reply-To: <329892.52364.qm@web111103.mail.gq1.yahoo.com>
References: <329892.52364.qm@web111103.mail.gq1.yahoo.com>
Message-ID: <4D2369F2.30306@oracle.com>


Erwin,

Are you asking why this time is so long?

[Times: user=33.09 sys=0.26, real=31.46 secs]


Jon

On 01/04/11 08:02, Erwin wrote:
> Hello,
>
> > From time to time, we would get a high collection time for ParNew and it would indicate promotion failed. This would then show CPU starvation message on the SystemOut.log. We're using Solaris10 and WAS 7.0.0.9 NDE.
>
> Below is our JVM arguments. Any help would be appreciated.
> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC  -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
>
> {Heap before GC invocations=12800 (full 1163):
>   par new generation   total 921600K, used 906365K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>    from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
>    to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>   concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 249119K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598527.649: [GC 1598527.649: [ParNew (promotion failed): 906365K->921600K(921600K), 0.6504370 secs]1598528.300: [CMS: 2969343K->2818470K(5136384K), 30.8057620 secs] 3715951K->2818470K(6057984K), [CMS Perm : 249119K->248229K(524288K)], 31.4567688 secs] [Times: user=33.09 sys=0.26, real=31.46 secs]
> Heap after GC invocations=12801 (full 1164):
>   par new generation   total 921600K, used 0K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>    from space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>    to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>   concurrent mark-sweep generation total 5136384K, used 2818470K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 248229K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
>
> Other times are fine
> {Heap before GC invocations=12798 (full 1163):
>   par new generation   total 921600K, used 921600K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>    from space 102400K, 100% used [0xfffffffdf2000000, 0xfffffffdf8400000, 0xfffffffdf8400000)
>    to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>   concurrent mark-sweep generation total 5136384K, used 2677208K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598495.024: [GC 1598495.024: [ParNew: 921600K->88036K(921600K), 0.2590699 secs] 3598808K->2871275K(6057984K), 0.2594312 secs] [Times: user=2.73 sys=0.10, real=0.26 secs]
> Heap after GC invocations=12799 (full 1163):
>   par new generation   total 921600K, used 88036K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>    from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
>    to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>   concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 248923K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
> Total time for which application threads were stopped: 0.2631313 seconds
> Total time for which application threads were stopped: 0.0089191 seconds
> Total time for which application threads were stopped: 0.0065184 seconds
> Total time for which application threads were stopped: 0.0022136 seconds
> Total time for which application threads were stopped: 0.0072144 seconds
> Total time for which application threads were stopped: 0.0093864 seconds
> Total time for which application threads were stopped: 0.0022413 seconds
> Total time for which application threads were stopped: 0.0022476 seconds
> {Heap before GC invocations=12799 (full 1163):
>   par new generation   total 921600K, used 907236K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K, 100% used [0xfffffffdc0000000, 0xfffffffdf2000000, 0xfffffffdf2000000)
>    from space 102400K,  85% used [0xfffffffdf8400000, 0xfffffffdfd9f9168, 0xfffffffdfe800000)
>    to   space 102400K,   0% used [0xfffffffdf2000000, 0xfffffffdf2000000, 0xfffffffdf8400000)
>   concurrent mark-sweep generation total 5136384K, used 2783239K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> 1598514.042: [GC 1598514.043: [ParNew: 907236K->87165K(921600K), 0.1946563 secs] 3690475K->2896751K(6057984K), 0.1950345 secs] [Times: user=1.55 sys=0.07, real=0.20 secs]
> Heap after GC invocations=12800 (full 1163):
>   par new generation   total 921600K, used 87165K [0xfffffffdc0000000, 0xfffffffdfe800000, 0xfffffffdfe800000)
>    eden space 819200K,   0% used [0xfffffffdc0000000, 0xfffffffdc0000000, 0xfffffffdf2000000)
>    from space 102400K,  85% used [0xfffffffdf2000000, 0xfffffffdf751f498, 0xfffffffdf8400000)
>    to   space 102400K,   0% used [0xfffffffdf8400000, 0xfffffffdf8400000, 0xfffffffdfe800000)
>   concurrent mark-sweep generation total 5136384K, used 2809586K [0xfffffffdfe800000, 0xffffffff38000000, 0xffffffff38000000)
>   concurrent-mark-sweep perm gen total 524288K, used 249075K [0xffffffff38000000, 0xffffffff58000000, 0xffffffff63800000)
> }
> Total time for which application threads were stopped: 0.2034239 seconds
> Total time for which application threads were stopped: 0.0075922 seconds
> Total time for which application threads were stopped: 0.0022152 seconds
>
> Thanks,
> Erwin
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From ycraig at cysystems.com  Wed Jan  5 08:37:50 2011
From: ycraig at cysystems.com (craig yeldell)
Date: Wed, 5 Jan 2011 11:37:50 -0500
Subject: CPU differences and CMS GC performance
Message-ID: <F3315D73-7AE4-4157-86C7-B6075585D1AC@cysystems.com>

I have a production environment that consists of two different  
hardware configurations.  We have just replaced half our environment  
with new  servers that have more RAM and different CPU's.  We left the  
GC params the same,  the GC behavior has been quite surprising.  I  
expected to see faster collections on the New and CMS due to the raw  
cpu test results of the Nehalem, and this held true.  However I also  
found fewer New collections, and increased CMS collections.
In a nutshell, I assumed upgrading the CPU alone without touching the  
GC args would result in improved GC performance.


Java info:

Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

-server -Xmx2500m -Xms2500m -Dsun.rmi.dgc.client.gcInterval=3600000 - 
Dsun.rmi.dgc.server.gcInterval=3600000 -Xss256k -XX:+DisableExplicitGC  
-XX:PermSize=384m -XX:MaxPermSize=384m -XX:MaxNewSize=888m - 
XX:NewSize=888m -XX:ParallelGCThreads=6 -verbose:gc -Xloggc:vm01.gc - 
XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX: 
+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX: 
+CMSClassUnloadingEnabled -XX:SurvivorRatio=4 -XX:+UseTLAB -XX: 
+UseParNewGC -XX:+UseBiasedLocking -XX:TargetSurvivorRatio=75 -XX: 
+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80

NOTE: Each server has 4 jvm's running and all vm's in the environment  
are equally load balanced.

CPU info:

OLD
16GB

CPU frequency:    3.003 GHz (Xeon)
Number of CPUs: 4
Number of cores: 8
Number of threads: 16


NEW
32GB

CPU frequency:    2.533 GHz (Xeon E5540 Nehalem)
Number of CPUs: 2
Number of cores: 8
Number of threads: 16


Example GC snippet:

Old server:

2011-01-04T18:24:02.073-0500: 316390.125: [CMS-concurrent-reset:  
0.061/0.061 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2011-01-04T18:24:12.890-0500: 316400.941: [GC 316400.942: [ParNew:  
690252K->93718K(757760K), 0.1300730 secs] 1968696K->1372788K 
(2408448K), 0.1308690 secs] [Times: user=0.66 sys=0.00, real=0.13 secs]
2011-01-04T18:24:50.647-0500: 316438.698: [GC 316438.699: [ParNew:  
699926K->74785K(757760K), 0.1296920 secs] 1978996K->1355277K 
(2408448K), 0.1304530 secs] [Times: user=0.61 sys=0.00, real=0.13 secs]
2011-01-04T18:25:17.854-0500: 316465.905: [GC 316465.906: [ParNew:  
680928K->86498K(757760K), 0.1505500 secs] 1961420K->1369830K 
(2408448K), 0.1513450 secs] [Times: user=0.71 sys=0.00, real=0.15 secs]
2011-01-04T18:25:50.762-0500: 316498.814: [GC 316498.814: [ParNew:  
692706K->69120K(757760K), 0.1385690 secs] 1976038K->1352581K 
(2408448K), 0.1393620 secs] [Times: user=0.63 sys=0.00, real=0.14 secs]
2011-01-04T18:26:22.644-0500: 316530.695: [GC 316530.696: [ParNew:  
675328K->86476K(757760K), 0.1493580 secs] 1958789K->1372863K 
(2408448K), 0.1501220 secs] [Times: user=0.66 sys=0.00, real=0.15 secs]
2011-01-04T18:27:01.951-0500: 316570.003: [GC 316570.003: [ParNew:  
692684K->70259K(757760K), 0.1250090 secs] 1979071K->1357514K 
(2408448K), 0.1257610 secs] [Times: user=0.62 sys=0.00, real=0.13 secs]
2011-01-04T18:27:28.542-0500: 316596.594: [GC 316596.594: [ParNew:  
676467K->88180K(757760K), 0.1030450 secs] 1963722K->1375664K 
(2408448K), 0.1038090 secs] [Times: user=0.54 sys=0.00, real=0.10 secs]
2011-01-04T18:27:52.200-0500: 316620.251: [GC 316620.252: [ParNew:  
694388K->64015K(757760K), 0.1551250 secs] 1981872K->1351774K 
(2408448K), 0.1560860 secs] [Times: user=0.79 sys=0.00, real=0.15 secs]
2011-01-04T18:28:07.789-0500: 316635.841: [GC 316635.841: [ParNew:  
670223K->94609K(757760K), 0.1321720 secs] 1957982K->1385077K 
(2408448K), 0.1329350 secs] [Times: user=0.65 sys=0.01, real=0.13 secs]
2011-01-04T18:28:38.020-0500: 316666.071: [GC 316666.072: [ParNew:  
700817K->64339K(757760K), 0.1169430 secs] 1991285K->1358764K 
(2408448K), 0.1177070 secs] [Times: user=0.59 sys=0.00, real=0.12 secs]
2011-01-04T18:29:13.319-0500: 316701.370: [GC 316701.371: [ParNew:  
670547K->75587K(757760K), 0.1352400 secs] 1964972K->1370278K 
(2408448K), 0.1360140 secs] [Times: user=0.63 sys=0.00, real=0.13 secs]
2011-01-04T18:29:46.586-0500: 316734.637: [GC 316734.638: [ParNew:  
681795K->67573K(757760K), 0.1257050 secs] 1976486K->1364337K 
(2408448K), 0.1265250 secs] [Times: user=0.56 sys=0.01, real=0.12 secs]
2011-01-04T18:30:31.525-0500: 316779.577: [GC 316779.578: [ParNew:  
673781K->65516K(757760K), 0.1143930 secs] 1970545K->1363488K 
(2408448K), 0.1154980 secs] [Times: user=0.54 sys=0.00, real=0.12 secs]
2011-01-04T18:30:56.159-0500: 316804.211: [GC 316804.211: [ParNew:  
671724K->54331K(757760K), 0.1067750 secs] 1969696K->1353537K 
(2408448K), 0.1075690 secs] [Times: user=0.53 sys=0.00, real=0.11 secs]
2011-01-04T18:32:01.256-0500: 316869.308: [GC 316869.308: [ParNew:  
660539K->66703K(757760K), 0.1036880 secs] 1959745K->1367053K 
(2408448K), 0.1044590 secs] [Times: user=0.49 sys=0.00, real=0.11 secs]
2011-01-04T18:32:31.967-0500: 316900.019: [GC 316900.019: [ParNew:  
672911K->54742K(757760K), 0.1121020 secs] 1973261K->1355232K 
(2408448K), 0.1128580 secs] [Times: user=0.51 sys=0.00, real=0.11 secs]
2011-01-04T18:33:05.187-0500: 316933.239: [GC 316933.239: [ParNew:  
660950K->66592K(757760K), 0.0990360 secs] 1961440K->1368889K 
(2408448K), 0.0998210 secs] [Times: user=0.49 sys=0.00, real=0.10 secs]
2011-01-04T18:33:38.693-0500: 316966.745: [GC 316966.745: [ParNew:  
672800K->57860K(757760K), 0.1132210 secs] 1975097K->1361914K 
(2408448K), 0.1139990 secs] [Times: user=0.53 sys=0.01, real=0.12 secs]
2011-01-04T18:34:33.087-0500: 317021.139: [GC 317021.139: [ParNew:  
664068K->74055K(757760K), 0.1268070 secs] 1968122K->1379190K 
(2408448K), 0.1275670 secs] [Times: user=0.52 sys=0.00, real=0.12 secs]
2011-01-04T18:34:59.371-0500: 317047.423: [GC 317047.423: [ParNew:  
680263K->67454K(757760K), 0.1354760 secs] 1985398K->1374886K 
(2408448K), 0.1363060 secs] [Times: user=0.61 sys=0.01, real=0.14 secs]
2011-01-04T18:35:23.757-0500: 317071.809: [GC 317071.809: [ParNew:  
673662K->78394K(757760K), 0.1624010 secs] 1981094K->1385967K 
(2408448K), 0.1632150 secs] [Times: user=0.78 sys=0.00, real=0.17 secs]
2011-01-04T18:35:45.117-0500: 317093.169: [GC 317093.169: [ParNew:  
684602K->66073K(757760K), 0.1161380 secs] 1992175K->1373969K 
(2408448K), 0.1170380 secs] [Times: user=0.60 sys=0.00, real=0.12 secs]
2011-01-04T18:36:29.794-0500: 317137.845: [GC 317137.846: [ParNew:  
672281K->97964K(757760K), 0.1333770 secs] 1980177K->1406249K 
(2408448K), 0.1341740 secs] [Times: user=0.68 sys=0.00, real=0.13 secs]
2011-01-04T18:36:58.710-0500: 317166.762: [GC 317166.762: [ParNew:  
704172K->76194K(757760K), 0.1261530 secs] 2012457K->1385682K 
(2408448K), 0.1269720 secs] [Times: user=0.64 sys=0.00, real=0.12 secs]
2011-01-04T18:37:27.242-0500: 317195.294: [GC 317195.294: [ParNew:  
682402K->88786K(757760K), 0.1270880 secs] 1991890K->1399799K 
(2408448K), 0.1278570 secs] [Times: user=0.63 sys=0.00, real=0.13 secs]
2011-01-04T18:38:06.888-0500: 317234.940: [GC 317234.940: [ParNew:  
694994K->71919K(757760K), 0.1298460 secs] 2006007K->1385713K 
(2408448K), 0.1305940 secs] [Times: user=0.63 sys=0.00, real=0.13 secs]
2011-01-04T18:38:27.888-0500: 317255.940: [GC 317255.941: [ParNew:  
678127K->94010K(757760K), 0.1288970 secs] 1991921K->1410066K 
(2408448K), 0.1296270 secs] [Times: user=0.67 sys=0.00, real=0.13 secs]
2011-01-04T18:38:58.904-0500: 317286.956: [GC 317286.957: [ParNew:  
700218K->71826K(757760K), 0.1199630 secs] 2016274K->1389379K 
(2408448K), 0.1207380 secs] [Times: user=0.63 sys=0.01, real=0.12 secs]
2011-01-04T18:39:21.896-0500: 317309.947: [GC 317309.948: [ParNew:  
678034K->89229K(757760K), 0.1211860 secs] 1995587K->1407260K 
(2408448K), 0.1219360 secs] [Times: user=0.62 sys=0.00, real=0.12 secs]
2011-01-04T18:39:53.845-0500: 317341.897: [GC 317341.897: [ParNew:  
695437K->76437K(757760K), 0.1253990 secs] 2013468K->1396004K 
(2408448K), 0.1261630 secs] [Times: user=0.67 sys=0.00, real=0.13 secs]
2011-01-04T18:40:18.938-0500: 317366.990: [GC 317366.990: [ParNew:  
682645K->91160K(757760K), 0.1523130 secs] 2002212K->1410757K 
(2408448K), 0.1531110 secs] [Times: user=0.83 sys=0.01, real=0.15 secs]
2011-01-04T18:40:42.708-0500: 317390.760: [GC 317390.760: [ParNew:  
697002K->87022K(757760K), 0.1530570 secs] 2016600K->1408083K 
(2408448K), 0.1538470 secs] [Times: user=0.71 sys=0.01, real=0.16 secs]
2011-01-04T18:40:42.867-0500: 317390.919: [GC [1 CMS-initial-mark:  
1321060K(1650688K)] 1408857K(2408448K), 0.2067320 secs] [Times:  
user=0.21 sys=0.00, real=0.21 secs]
2011-01-04T18:40:43.075-0500: 317391.126: [CMS-concurrent-mark-start]
2011-01-04T18:40:47.573-0500: 317395.625: [CMS-concurrent-mark:  
4.486/4.499 secs] [Times: user=12.20 sys=0.13, real=4.50 secs]
2011-01-04T18:40:47.574-0500: 317395.625: [CMS-concurrent-preclean- 
start]
2011-01-04T18:40:47.636-0500: 317395.688: [CMS-concurrent-preclean:  
0.059/0.063 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2011-01-04T18:40:47.637-0500: 317395.688: [CMS-concurrent-abortable- 
preclean-start]
  CMS: abort preclean due to time 2011-01-04T18:40:52.701-0500:  
317400.753: [CMS-concurrent-abortable-preclean: 4.139/5.065 secs]  
[Times: user=5.46 sys=0.05, real=5.07 secs]
2011-01-04T18:40:52.707-0500: 317400.758: [GC[YG occupancy: 432613 K  
(757760 K)]317400.759: [Rescan (parallel) , 0.9054620 secs]317401.664:  
[weak refs processing, 0.0017920 secs]317401.666: [class unloading,  
0.1888550 secs]317401.855: [scrub symbol & string tables, 0.1271210  
secs] [1 CMS-remark: 1321060K(1650688K)] 1753674K(2408448K), 1.2454560  
secs] [Times: user=5.33 sys=0.02, real=1.25 secs]
2011-01-04T18:40:53.953-0500: 317402.005: [CMS-concurrent-sweep-start]
2011-01-04T18:40:57.534-0500: 317405.586: [CMS-concurrent-sweep:  
3.565/3.581 secs] [Times: user=4.90 sys=0.07, real=3.58 secs]
2011-01-04T18:40:57.535-0500: 317405.586: [CMS-concurrent-reset-start]
2011-01-04T18:40:57.595-0500: 317405.646: [CMS-concurrent-reset:  
0.060/0.060 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]

New Server

2011-01-04T18:33:40.473-0500: 223993.306: [CMS-concurrent-reset:  
0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2011-01-04T18:33:54.357-0500: 224007.190: [GC 224007.190: [ParNew:  
719918K->130085K(757760K), 0.0754960 secs] 2004520K->1415071K 
(2408448K), 0.0758300 secs] [Times: user=0.42 sys=0.00, real=0.07 secs]
2011-01-04T18:34:47.106-0500: 224059.939: [GC 224059.939: [ParNew:  
736293K->107200K(757760K), 0.0819180 secs] 2021279K->1393051K 
(2408448K), 0.0822590 secs] [Times: user=0.45 sys=0.00, real=0.08 secs]
2011-01-04T18:35:07.057-0500: 224079.891: [GC 224079.891: [ParNew:  
713408K->134972K(757760K), 0.0886030 secs] 1999259K->1421467K 
(2408448K), 0.0889400 secs] [Times: user=0.42 sys=0.00, real=0.09 secs]
2011-01-04T18:35:41.454-0500: 224114.287: [GC 224114.288: [ParNew:  
741180K->117493K(757760K), 0.0785680 secs] 2027675K->1404100K 
(2408448K), 0.0789010 secs] [Times: user=0.44 sys=0.00, real=0.08 secs]
2011-01-04T18:36:07.806-0500: 224140.640: [GC 224140.640: [ParNew:  
723701K->139373K(757760K), 0.0919290 secs] 2010308K->1426255K 
(2408448K), 0.0922630 secs] [Times: user=0.49 sys=0.00, real=0.10 secs]
2011-01-04T18:36:40.539-0500: 224173.372: [GC 224173.372: [ParNew:  
745581K->118113K(757760K), 0.0912580 secs] 2032463K->1405613K 
(2408448K), 0.0916060 secs] [Times: user=0.49 sys=0.00, real=0.09 secs]
2011-01-04T18:37:02.915-0500: 224195.748: [GC 224195.748: [ParNew:  
724321K->145077K(757760K), 0.0928130 secs] 2011821K->1434160K 
(2408448K), 0.0931450 secs] [Times: user=0.48 sys=0.00, real=0.09 secs]
2011-01-04T18:37:22.451-0500: 224215.284: [GC 224215.284: [ParNew:  
751285K->107168K(757760K), 0.0779940 secs] 2040368K->1399901K 
(2408448K), 0.0783240 secs] [Times: user=0.40 sys=0.00, real=0.08 secs]
2011-01-04T18:37:53.390-0500: 224246.223: [GC 224246.223: [ParNew:  
713376K->123307K(757760K), 0.0920660 secs] 2006109K->1418928K 
(2408448K), 0.0924470 secs] [Times: user=0.45 sys=0.00, real=0.10 secs]
2011-01-04T18:38:16.100-0500: 224268.934: [GC 224268.934: [ParNew:  
729515K->105143K(757760K), 0.0863940 secs] 2025136K->1402705K 
(2408448K), 0.0867110 secs] [Times: user=0.43 sys=0.00, real=0.09 secs]
2011-01-04T18:38:30.499-0500: 224283.332: [GC 224283.333: [ParNew:  
711351K->127726K(757760K), 0.0918920 secs] 2008913K->1426844K 
(2408448K), 0.0922350 secs] [Times: user=0.48 sys=0.00, real=0.09 secs]
2011-01-04T18:39:09.092-0500: 224321.925: [GC 224321.926: [ParNew:  
733934K->106828K(757760K), 0.0962380 secs] 2033052K->1408490K 
(2408448K), 0.0965730 secs] [Times: user=0.51 sys=0.00, real=0.10 secs]
2011-01-04T18:39:36.359-0500: 224349.192: [GC 224349.192: [ParNew:  
713036K->110615K(757760K), 0.0924370 secs] 2014698K->1413885K 
(2408448K), 0.0927620 secs] [Times: user=0.49 sys=0.00, real=0.09 secs]
2011-01-04T18:39:41.489-0500: 224354.322: [GC 224354.322: [ParNew:  
716823K->130526K(757760K), 0.1148370 secs] 2020093K->1435557K 
(2408448K), 0.1152070 secs] [Times: user=0.51 sys=0.00, real=0.11 secs]
2011-01-04T18:39:58.265-0500: 224371.099: [GC 224371.099: [ParNew:  
736734K->112410K(757760K), 0.1139600 secs] 2041765K->1421693K 
(2408448K), 0.1143070 secs] [Times: user=0.59 sys=0.00, real=0.11 secs]
2011-01-04T18:40:22.876-0500: 224395.709: [GC 224395.709: [ParNew:  
718618K->151551K(757760K), 0.1168680 secs] 2027901K->1464667K 
(2408448K), 0.1172010 secs] [Times: user=0.58 sys=0.00, real=0.12 secs]
2011-01-04T18:40:41.288-0500: 224414.121: [GC 224414.121: [ParNew:  
757759K->131092K(757760K), 0.1072970 secs] 2070875K->1445457K 
(2408448K), 0.1076500 secs] [Times: user=0.61 sys=0.00, real=0.11 secs]
2011-01-04T18:41:06.756-0500: 224439.589: [GC 224439.590: [ParNew:  
737300K->151552K(757760K), 0.1244220 secs] 2051665K->1476639K 
(2408448K), 0.1247530 secs] [Times: user=0.59 sys=0.00, real=0.13 secs]
2011-01-04T18:41:06.882-0500: 224439.715: [GC [1 CMS-initial-mark:  
1325087K(1650688K)] 1477191K(2408448K), 0.1086530 secs] [Times:  
user=0.10 sys=0.00, real=0.10 secs]
2011-01-04T18:41:06.991-0500: 224439.824: [CMS-concurrent-mark-start]
2011-01-04T18:41:08.521-0500: 224441.354: [CMS-concurrent-mark:  
1.529/1.529 secs] [Times: user=3.31 sys=0.02, real=1.54 secs]
2011-01-04T18:41:08.521-0500: 224441.354: [CMS-concurrent-preclean- 
start]
2011-01-04T18:41:08.535-0500: 224441.368: [CMS-concurrent-preclean:  
0.014/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-01-04T18:41:08.535-0500: 224441.368: [CMS-concurrent-abortable- 
preclean-start]
  CMS: abort preclean due to time 2011-01-04T18:41:13.646-0500:  
224446.479: [CMS-concurrent-abortable-preclean: 3.679/5.111 secs]  
[Times: user=3.99 sys=0.03, real=5.11 secs]
2011-01-04T18:41:13.648-0500: 224446.482: [GC[YG occupancy: 220397 K  
(757760 K)]224446.482: [Rescan (parallel) , 0.0440200 secs]224446.526:  
[weak refs processing, 0.0059890 secs]224446.532: [class unloading,  
0.1417810 secs]224446.674: [scrub symbol & string tables, 0.0489470  
secs] [1 CMS-remark: 1325087K(1650688K)] 1545485K(2408448K), 0.2519760  
secs] [Times: user=0.44 sys=0.01, real=0.25 secs]
2011-01-04T18:41:13.901-0500: 224446.734: [CMS-concurrent-sweep-start]
2011-01-04T18:41:14.791-0500: 224447.624: [CMS-concurrent-sweep:  
0.888/0.890 secs] [Times: user=0.91 sys=0.01, real=0.88 secs]
2011-01-04T18:41:14.791-0500: 224447.624: [CMS-concurrent-reset-start]
2011-01-04T18:41:14.795-0500: 224447.628: [CMS-concurrent-reset:  
0.004/0.004 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-01-04T18:41:40.336-0500: 224473.169: [GC 224473.170: [ParNew:  
757760K->151551K(757760K), 0.1246970 secs] 2061601K->1458549K 
(2408448K), 0.1250470 secs] [Times: user=0.67 sys=0.00, real=0.13 secs]
2011-01-04T18:42:15.493-0500: 224508.327: [GC 224508.327: [ParNew:  
757759K->151552K(757760K), 0.1264960 secs] 2064757K->1468487K 
(2408448K), 0.1269000 secs] [Times: user=0.58 sys=0.00, real=0.13 secs]
2011-01-04T18:42:30.747-0500: 224523.580: [GC 224523.580: [ParNew:  
757760K->150109K(757760K), 0.1121970 secs] 2074695K->1469299K 
(2408448K), 0.1126290 secs] [Times: user=0.56 sys=0.00, real=0.12 secs]
2011-01-04T18:43:00.779-0500: 224553.612: [GC 224553.612: [ParNew:  
756317K->151552K(757760K), 0.1331190 secs] 2075507K->1479534K 
(2408448K), 0.1334580 secs] [Times: user=0.61 sys=0.00, real=0.13 secs]
2011-01-04T18:43:00.914-0500: 224553.748: [GC [1 CMS-initial-mark:  
1327982K(1650688K)] 1479538K(2408448K), 0.1464550 secs] [Times:  
user=0.15 sys=0.00, real=0.15 secs]
2011-01-04T18:43:01.061-0500: 224553.894: [CMS-concurrent-mark-start]
2011-01-04T18:43:02.661-0500: 224555.494: [CMS-concurrent-mark:  
1.583/1.600 secs] [Times: user=3.70 sys=0.04, real=1.60 secs]
2011-01-04T18:43:02.661-0500: 224555.494: [CMS-concurrent-preclean- 
start]
2011-01-04T18:43:02.675-0500: 224555.509: [CMS-concurrent-preclean:  
0.014/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-01-04T18:43:02.676-0500: 224555.509: [CMS-concurrent-abortable- 
preclean-start]
  CMS: abort preclean due to time 2011-01-04T18:43:07.909-0500:  
224560.742: [CMS-concurrent-abortable-preclean: 3.899/5.234 secs]  
[Times: user=4.31 sys=0.08, real=5.24 secs]
2011-01-04T18:43:07.912-0500: 224560.745: [GC[YG occupancy: 289988 K  
(757760 K)]224560.745: [Rescan (parallel) , 0.1425160 secs]224560.888:  
[weak refs processing, 0.0000900 secs]224560.888: [class unloading,  
0.1369840 secs]224561.025: [scrub symbol & string tables, 0.0474080  
secs] [1 CMS-remark: 1327982K(1650688K)] 1617971K(2408448K), 0.3377450  
secs] [Times: user=0.75 sys=0.00, real=0.34 secs]
2011-01-04T18:43:08.250-0500: 224561.083: [CMS-concurrent-sweep-start]
2011-01-04T18:43:09.138-0500: 224561.971: [CMS-concurrent-sweep:  
0.888/0.888 secs] [Times: user=0.90 sys=0.01, real=0.89 secs]
2011-01-04T18:43:09.138-0500: 224561.972: [CMS-concurrent-reset-start]
2011-01-04T18:43:09.142-0500: 224561.976: [CMS-concurrent-reset:  
0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2011-01-04T18:43:21.068-0500: 224573.902: [GC 224573.902: [ParNew:  
757360K->151551K(757760K), 0.1132670 secs] 2060610K->1458489K 
(2408448K), 0.1135890 secs] [Times: user=0.59 sys=0.00, real=0.11 secs]
2011-01-04T18:43:46.517-0500: 224599.350: [GC 224599.350: [ParNew:  
757759K->151552K(757760K), 0.1540840 secs] 2064697K->1475010K 
(2408448K), 0.1544610 secs] [Times: user=0.72 sys=0.00, real=0.15 secs]
2011-01-04T18:43:46.673-0500: 224599.506: [GC [1 CMS-initial-mark:  
1323458K(1650688K)] 1480665K(2408448K), 0.1355380 secs] [Times:  
user=0.13 sys=0.00, real=0.14 secs]
2011-01-04T18:43:46.809-0500: 224599.642: [CMS-concurrent-mark-start]
2011-01-04T18:43:48.464-0500: 224601.297: [CMS-concurrent-mark:  
1.656/1.656 secs] [Times: user=3.44 sys=0.04, real=1.66 secs]
2011-01-04T18:43:48.464-0500: 224601.297: [CMS-concurrent-preclean- 
start]
2011-01-04T18:43:48.478-0500: 224601.311: [CMS-concurrent-preclean:  
0.013/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2011-01-04T18:43:48.478-0500: 224601.311: [CMS-concurrent-abortable- 
preclean-start]
  CMS: abort preclean due to time 2011-01-04T18:43:53.604-0500:  
224606.437: [CMS-concurrent-abortable-preclean: 4.402/5.126 secs]  
[Times: user=5.51 sys=0.13, real=5.13 secs]
2011-01-04T18:43:53.606-0500: 224606.439: [GC[YG occupancy: 354174 K  
(757760 K)]224606.439: [Rescan (parallel) , 0.0620250 secs]224606.501:  
[weak refs processing, 0.0001870 secs]224606.501: [class unloading,  
0.1428830 secs]224606.644: [scrub symbol & string tables, 0.0630710  
secs] [1 CMS-remark: 1323458K(1650688K)] 1677633K(2408448K), 0.2790260  
secs] [Times: user=0.60 sys=0.00, real=0.28 secs]
2011-01-04T18:43:53.886-0500: 224606.719: [CMS-concurrent-sweep-start]
2011-01-04T18:43:54.763-0500: 224607.596: [CMS-concurrent-sweep:  
0.877/0.877 secs] [Times: user=0.96 sys=0.01, real=0.88 secs]
2011-01-04T18:43:54.763-0500: 224607.596: [CMS-concurrent-reset-start]
2011-01-04T18:43:54.767-0500: 224607.600: [CMS-concurrent-reset:  
0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110105/fcc5fa9c/attachment-0001.html 

From Brian.Harris at morganstanley.com  Wed Jan  5 08:47:21 2011
From: Brian.Harris at morganstanley.com (Brian Harris)
Date: Wed, 5 Jan 2011 11:47:21 -0500
Subject: accessing memory management details
In-Reply-To: <AANLkTinDR2mzHOuRcjRYuWjFSwdMsHf=krXPx9q60i0L@mail.gmail.com>
References: <AANLkTinDR2mzHOuRcjRYuWjFSwdMsHf=krXPx9q60i0L@mail.gmail.com>
Message-ID: <AANLkTinH+7t6Qy3RBypbjyUT-iM3uZxG3tj-Kw7d010C@mail.gmail.com>

Hi again,

An idea has come to mind which may allow rough estimates for some of these
figures. Instrumentation could be used as a hook to register phantom
references of all newly allocated objects with a reference queue. Verbose GC
logs could also be captured and parsed, and the GC timestamps could be fuzzy
matched with the reference enqueue timestamps. Which GC run claimed which
object could(?) then be roughly inferred.

Obviously this is far from ideal on many fronts and I'm not sure it would
produce good results.

On Mon, Dec 20, 2010 at 12:02 PM, Brian Harris
<harribri at morganstanley.com>wrote:

> Hello,
>
> Is it possible for my app to learn where objects are allocated? Young or
> old generation, if young which survivor space? In a LAB? I'm interested in
> where an object was initially allocated, but also any movements (from
> where?, to where?, when?) that happen thereafter.
>
> These use cases illustrate what I had in mind:
>   * JUnit test asserting >90% of allocations of type com.mycompany.Entity
> are done in TLAB
>   * Benchmark tool showing the effect that various JVM tuning parameters
> have on memory management. Displayed as graphs, timelines, etc.
>
> I looked through the 1.6 JVMTI demos and didn't see these sort of details
> being exposed. Also nothing similar looking through BTrace examples. Where
> should I look next?.
>
> Happy holidays,
> Brian Harris
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110105/2a448606/attachment.html 

From jon.masamitsu at oracle.com  Thu Jan  6 07:53:52 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 06 Jan 2011 07:53:52 -0800
Subject: CPU differences and CMS GC performance
In-Reply-To: <F3315D73-7AE4-4157-86C7-B6075585D1AC@cysystems.com>
References: <F3315D73-7AE4-4157-86C7-B6075585D1AC@cysystems.com>
Message-ID: <4D25E590.7040202@oracle.com>

Craig,

Please send me a few of the ParNew collections for the old server after the
CMS sweep - after this point

2011-01-04T18:40:53.953-0500: 317402.005: [CMS-concurrent-sweep-start]
2011-01-04T18:40:57.534-0500: 317405.586: [CMS-concurrent-sweep: 
3.565/3.581 secs] [Times: user=4.90 sys=0.07, real=3.58 secs]
2011-01-04T18:40:57.535-0500: 317405.586: [CMS-concurrent-reset-start]
2011-01-04T18:40:57.595-0500: 317405.646: [CMS-concurrent-reset: 
0.060/0.060 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]

Jon

On 01/05/11 08:37, craig yeldell wrote:
> I have a production environment that consists of two different 
> hardware configurations.  We have just replaced half our environment 
> with new  servers that have more RAM and different CPU's.  We left the 
> GC params the same,  the GC behavior has been quite surprising.  I 
> expected to see faster collections on the New and CMS due to the raw 
> cpu test results of the Nehalem, and this held true.  However I also 
> found fewer New collections, and increased CMS collections.
> In a nutshell, I assumed upgrading the CPU alone without touching the 
> GC args would result in improved GC performance.
>
>
> Java info:
>
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
> -server -Xmx2500m -Xms2500m -Dsun.rmi.dgc.client.gcInterval=3600000 
> -Dsun.rmi.dgc.server.gcInterval=3600000 -Xss256k 
> -XX:+DisableExplicitGC -XX:PermSize=384m -XX:MaxPermSize=384m 
> -XX:MaxNewSize=888m -XX:NewSize=888m -XX:ParallelGCThreads=6 
> -verbose:gc -Xloggc:vm01.gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC 
> -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=4 -XX:+UseTLAB 
> -XX:+UseParNewGC -XX:+UseBiasedLocking -XX:TargetSurvivorRatio=75 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80
>
> NOTE: Each server has 4 jvm's running and all vm's in the environment 
> are equally load balanced.
>
> CPU info:
>
> OLD
> 16GB
> CPU frequency:    3.003 GHz (Xeon)
> Number of CPUs: 4
> Number of cores: 8
> Number of threads: 16
>
> NEW
> *32GB*
> CPU frequency:    2.533 GHz (Xeon E5540 Nehalem)
> Number of CPUs: 2
> Number of cores: 8
> Number of threads: 16
>
> Example GC snippet:
>
> Old server:
> 2011-01-04T18:24:02.073-0500: 316390.125: [CMS-concurrent-reset: 
> 0.061/0.061 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
> 2011-01-04T18:24:12.890-0500: 316400.941: [GC 316400.942: [ParNew: 
> 690252K->93718K(757760K), 0.1300730 secs] 
> 1968696K->1372788K(2408448K), 0.1308690 secs] [Times: user=0.66 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:24:50.647-0500: 316438.698: [GC 316438.699: [ParNew: 
> 699926K->74785K(757760K), 0.1296920 secs] 
> 1978996K->1355277K(2408448K), 0.1304530 secs] [Times: user=0.61 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:25:17.854-0500: 316465.905: [GC 316465.906: [ParNew: 
> 680928K->86498K(757760K), 0.1505500 secs] 
> 1961420K->1369830K(2408448K), 0.1513450 secs] [Times: user=0.71 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:25:50.762-0500: 316498.814: [GC 316498.814: [ParNew: 
> 692706K->69120K(757760K), 0.1385690 secs] 
> 1976038K->1352581K(2408448K), 0.1393620 secs] [Times: user=0.63 
> sys=0.00, real=0.14 secs]
> 2011-01-04T18:26:22.644-0500: 316530.695: [GC 316530.696: [ParNew: 
> 675328K->86476K(757760K), 0.1493580 secs] 
> 1958789K->1372863K(2408448K), 0.1501220 secs] [Times: user=0.66 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:27:01.951-0500: 316570.003: [GC 316570.003: [ParNew: 
> 692684K->70259K(757760K), 0.1250090 secs] 
> 1979071K->1357514K(2408448K), 0.1257610 secs] [Times: user=0.62 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:27:28.542-0500: 316596.594: [GC 316596.594: [ParNew: 
> 676467K->88180K(757760K), 0.1030450 secs] 
> 1963722K->1375664K(2408448K), 0.1038090 secs] [Times: user=0.54 
> sys=0.00, real=0.10 secs]
> 2011-01-04T18:27:52.200-0500: 316620.251: [GC 316620.252: [ParNew: 
> 694388K->64015K(757760K), 0.1551250 secs] 
> 1981872K->1351774K(2408448K), 0.1560860 secs] [Times: user=0.79 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:28:07.789-0500: 316635.841: [GC 316635.841: [ParNew: 
> 670223K->94609K(757760K), 0.1321720 secs] 
> 1957982K->1385077K(2408448K), 0.1329350 secs] [Times: user=0.65 
> sys=0.01, real=0.13 secs]
> 2011-01-04T18:28:38.020-0500: 316666.071: [GC 316666.072: [ParNew: 
> 700817K->64339K(757760K), 0.1169430 secs] 
> 1991285K->1358764K(2408448K), 0.1177070 secs] [Times: user=0.59 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:29:13.319-0500: 316701.370: [GC 316701.371: [ParNew: 
> 670547K->75587K(757760K), 0.1352400 secs] 
> 1964972K->1370278K(2408448K), 0.1360140 secs] [Times: user=0.63 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:29:46.586-0500: 316734.637: [GC 316734.638: [ParNew: 
> 681795K->67573K(757760K), 0.1257050 secs] 
> 1976486K->1364337K(2408448K), 0.1265250 secs] [Times: user=0.56 
> sys=0.01, real=0.12 secs]
> 2011-01-04T18:30:31.525-0500: 316779.577: [GC 316779.578: [ParNew: 
> 673781K->65516K(757760K), 0.1143930 secs] 
> 1970545K->1363488K(2408448K), 0.1154980 secs] [Times: user=0.54 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:30:56.159-0500: 316804.211: [GC 316804.211: [ParNew: 
> 671724K->54331K(757760K), 0.1067750 secs] 
> 1969696K->1353537K(2408448K), 0.1075690 secs] [Times: user=0.53 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:32:01.256-0500: 316869.308: [GC 316869.308: [ParNew: 
> 660539K->66703K(757760K), 0.1036880 secs] 
> 1959745K->1367053K(2408448K), 0.1044590 secs] [Times: user=0.49 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:32:31.967-0500: 316900.019: [GC 316900.019: [ParNew: 
> 672911K->54742K(757760K), 0.1121020 secs] 
> 1973261K->1355232K(2408448K), 0.1128580 secs] [Times: user=0.51 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:33:05.187-0500: 316933.239: [GC 316933.239: [ParNew: 
> 660950K->66592K(757760K), 0.0990360 secs] 
> 1961440K->1368889K(2408448K), 0.0998210 secs] [Times: user=0.49 
> sys=0.00, real=0.10 secs]
> 2011-01-04T18:33:38.693-0500: 316966.745: [GC 316966.745: [ParNew: 
> 672800K->57860K(757760K), 0.1132210 secs] 
> 1975097K->1361914K(2408448K), 0.1139990 secs] [Times: user=0.53 
> sys=0.01, real=0.12 secs]
> 2011-01-04T18:34:33.087-0500: 317021.139: [GC 317021.139: [ParNew: 
> 664068K->74055K(757760K), 0.1268070 secs] 
> 1968122K->1379190K(2408448K), 0.1275670 secs] [Times: user=0.52 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:34:59.371-0500: 317047.423: [GC 317047.423: [ParNew: 
> 680263K->67454K(757760K), 0.1354760 secs] 
> 1985398K->1374886K(2408448K), 0.1363060 secs] [Times: user=0.61 
> sys=0.01, real=0.14 secs]
> 2011-01-04T18:35:23.757-0500: 317071.809: [GC 317071.809: [ParNew: 
> 673662K->78394K(757760K), 0.1624010 secs] 
> 1981094K->1385967K(2408448K), 0.1632150 secs] [Times: user=0.78 
> sys=0.00, real=0.17 secs]
> 2011-01-04T18:35:45.117-0500: 317093.169: [GC 317093.169: [ParNew: 
> 684602K->66073K(757760K), 0.1161380 secs] 
> 1992175K->1373969K(2408448K), 0.1170380 secs] [Times: user=0.60 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:36:29.794-0500: 317137.845: [GC 317137.846: [ParNew: 
> 672281K->97964K(757760K), 0.1333770 secs] 
> 1980177K->1406249K(2408448K), 0.1341740 secs] [Times: user=0.68 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:36:58.710-0500: 317166.762: [GC 317166.762: [ParNew: 
> 704172K->76194K(757760K), 0.1261530 secs] 
> 2012457K->1385682K(2408448K), 0.1269720 secs] [Times: user=0.64 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:37:27.242-0500: 317195.294: [GC 317195.294: [ParNew: 
> 682402K->88786K(757760K), 0.1270880 secs] 
> 1991890K->1399799K(2408448K), 0.1278570 secs] [Times: user=0.63 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:38:06.888-0500: 317234.940: [GC 317234.940: [ParNew: 
> 694994K->71919K(757760K), 0.1298460 secs] 
> 2006007K->1385713K(2408448K), 0.1305940 secs] [Times: user=0.63 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:38:27.888-0500: 317255.940: [GC 317255.941: [ParNew: 
> 678127K->94010K(757760K), 0.1288970 secs] 
> 1991921K->1410066K(2408448K), 0.1296270 secs] [Times: user=0.67 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:38:58.904-0500: 317286.956: [GC 317286.957: [ParNew: 
> 700218K->71826K(757760K), 0.1199630 secs] 
> 2016274K->1389379K(2408448K), 0.1207380 secs] [Times: user=0.63 
> sys=0.01, real=0.12 secs]
> 2011-01-04T18:39:21.896-0500: 317309.947: [GC 317309.948: [ParNew: 
> 678034K->89229K(757760K), 0.1211860 secs] 
> 1995587K->1407260K(2408448K), 0.1219360 secs] [Times: user=0.62 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:39:53.845-0500: 317341.897: [GC 317341.897: [ParNew: 
> 695437K->76437K(757760K), 0.1253990 secs] 
> 2013468K->1396004K(2408448K), 0.1261630 secs] [Times: user=0.67 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:40:18.938-0500: 317366.990: [GC 317366.990: [ParNew: 
> 682645K->91160K(757760K), 0.1523130 secs] 
> 2002212K->1410757K(2408448K), 0.1531110 secs] [Times: user=0.83 
> sys=0.01, real=0.15 secs]
> 2011-01-04T18:40:42.708-0500: 317390.760: [GC 317390.760: [ParNew: 
> 697002K->87022K(757760K), 0.1530570 secs] 
> 2016600K->1408083K(2408448K), 0.1538470 secs] [Times: user=0.71 
> sys=0.01, real=0.16 secs]
> 2011-01-04T18:40:42.867-0500: 317390.919: [GC [1 CMS-initial-mark: 
> 1321060K(1650688K)] 1408857K(2408448K), 0.2067320 secs] [Times: 
> user=0.21 sys=0.00, real=0.21 secs]
> 2011-01-04T18:40:43.075-0500: 317391.126: [CMS-concurrent-mark-start]
> 2011-01-04T18:40:47.573-0500: 317395.625: [CMS-concurrent-mark: 
> 4.486/4.499 secs] [Times: user=12.20 sys=0.13, real=4.50 secs]
> 2011-01-04T18:40:47.574-0500: 317395.625: [CMS-concurrent-preclean-start]
> 2011-01-04T18:40:47.636-0500: 317395.688: [CMS-concurrent-preclean: 
> 0.059/0.063 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
> 2011-01-04T18:40:47.637-0500: 317395.688: 
> [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2011-01-04T18:40:52.701-0500: 
> 317400.753: [CMS-concurrent-abortable-preclean: 4.139/5.065 secs] 
> [Times: user=5.46 sys=0.05, real=5.07 secs]
> 2011-01-04T18:40:52.707-0500: 317400.758: [GC[YG occupancy: 432613 K 
> (757760 K)]317400.759: [Rescan (parallel) , 0.9054620 secs]317401.664: 
> [weak refs processing, 0.0017920 secs]317401.666: [class unloading, 
> 0.1888550 secs]317401.855: [scrub symbol & string tables, 0.1271210 
> secs] [1 CMS-remark: 1321060K(1650688K)] 1753674K(2408448K), 1.2454560 
> secs] [Times: user=5.33 sys=0.02, real=1.25 secs]
> 2011-01-04T18:40:53.953-0500: 317402.005: [CMS-concurrent-sweep-start]
> 2011-01-04T18:40:57.534-0500: 317405.586: [CMS-concurrent-sweep: 
> 3.565/3.581 secs] [Times: user=4.90 sys=0.07, real=3.58 secs]
> 2011-01-04T18:40:57.535-0500: 317405.586: [CMS-concurrent-reset-start]
> 2011-01-04T18:40:57.595-0500: 317405.646: [CMS-concurrent-reset: 
> 0.060/0.060 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
>
> New Server
> 2011-01-04T18:33:40.473-0500: 223993.306: [CMS-concurrent-reset: 
> 0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> 2011-01-04T18:33:54.357-0500: 224007.190: [GC 224007.190: [ParNew: 
> 719918K->130085K(757760K), 0.0754960 secs] 
> 2004520K->1415071K(2408448K), 0.0758300 secs] [Times: user=0.42 
> sys=0.00, real=0.07 secs]
> 2011-01-04T18:34:47.106-0500: 224059.939: [GC 224059.939: [ParNew: 
> 736293K->107200K(757760K), 0.0819180 secs] 
> 2021279K->1393051K(2408448K), 0.0822590 secs] [Times: user=0.45 
> sys=0.00, real=0.08 secs]
> 2011-01-04T18:35:07.057-0500: 224079.891: [GC 224079.891: [ParNew: 
> 713408K->134972K(757760K), 0.0886030 secs] 
> 1999259K->1421467K(2408448K), 0.0889400 secs] [Times: user=0.42 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:35:41.454-0500: 224114.287: [GC 224114.288: [ParNew: 
> 741180K->117493K(757760K), 0.0785680 secs] 
> 2027675K->1404100K(2408448K), 0.0789010 secs] [Times: user=0.44 
> sys=0.00, real=0.08 secs]
> 2011-01-04T18:36:07.806-0500: 224140.640: [GC 224140.640: [ParNew: 
> 723701K->139373K(757760K), 0.0919290 secs] 
> 2010308K->1426255K(2408448K), 0.0922630 secs] [Times: user=0.49 
> sys=0.00, real=0.10 secs]
> 2011-01-04T18:36:40.539-0500: 224173.372: [GC 224173.372: [ParNew: 
> 745581K->118113K(757760K), 0.0912580 secs] 
> 2032463K->1405613K(2408448K), 0.0916060 secs] [Times: user=0.49 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:37:02.915-0500: 224195.748: [GC 224195.748: [ParNew: 
> 724321K->145077K(757760K), 0.0928130 secs] 
> 2011821K->1434160K(2408448K), 0.0931450 secs] [Times: user=0.48 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:37:22.451-0500: 224215.284: [GC 224215.284: [ParNew: 
> 751285K->107168K(757760K), 0.0779940 secs] 
> 2040368K->1399901K(2408448K), 0.0783240 secs] [Times: user=0.40 
> sys=0.00, real=0.08 secs]
> 2011-01-04T18:37:53.390-0500: 224246.223: [GC 224246.223: [ParNew: 
> 713376K->123307K(757760K), 0.0920660 secs] 
> 2006109K->1418928K(2408448K), 0.0924470 secs] [Times: user=0.45 
> sys=0.00, real=0.10 secs]
> 2011-01-04T18:38:16.100-0500: 224268.934: [GC 224268.934: [ParNew: 
> 729515K->105143K(757760K), 0.0863940 secs] 
> 2025136K->1402705K(2408448K), 0.0867110 secs] [Times: user=0.43 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:38:30.499-0500: 224283.332: [GC 224283.333: [ParNew: 
> 711351K->127726K(757760K), 0.0918920 secs] 
> 2008913K->1426844K(2408448K), 0.0922350 secs] [Times: user=0.48 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:39:09.092-0500: 224321.925: [GC 224321.926: [ParNew: 
> 733934K->106828K(757760K), 0.0962380 secs] 
> 2033052K->1408490K(2408448K), 0.0965730 secs] [Times: user=0.51 
> sys=0.00, real=0.10 secs]
> 2011-01-04T18:39:36.359-0500: 224349.192: [GC 224349.192: [ParNew: 
> 713036K->110615K(757760K), 0.0924370 secs] 
> 2014698K->1413885K(2408448K), 0.0927620 secs] [Times: user=0.49 
> sys=0.00, real=0.09 secs]
> 2011-01-04T18:39:41.489-0500: 224354.322: [GC 224354.322: [ParNew: 
> 716823K->130526K(757760K), 0.1148370 secs] 
> 2020093K->1435557K(2408448K), 0.1152070 secs] [Times: user=0.51 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:39:58.265-0500: 224371.099: [GC 224371.099: [ParNew: 
> 736734K->112410K(757760K), 0.1139600 secs] 
> 2041765K->1421693K(2408448K), 0.1143070 secs] [Times: user=0.59 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:40:22.876-0500: 224395.709: [GC 224395.709: [ParNew: 
> 718618K->151551K(757760K), 0.1168680 secs] 
> 2027901K->1464667K(2408448K), 0.1172010 secs] [Times: user=0.58 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:40:41.288-0500: 224414.121: [GC 224414.121: [ParNew: 
> 757759K->131092K(757760K), 0.1072970 secs] 
> 2070875K->1445457K(2408448K), 0.1076500 secs] [Times: user=0.61 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:41:06.756-0500: 224439.589: [GC 224439.590: [ParNew: 
> 737300K->151552K(757760K), 0.1244220 secs] 
> 2051665K->1476639K(2408448K), 0.1247530 secs] [Times: user=0.59 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:41:06.882-0500: 224439.715: [GC [1 CMS-initial-mark: 
> 1325087K(1650688K)] 1477191K(2408448K), 0.1086530 secs] [Times: 
> user=0.10 sys=0.00, real=0.10 secs]
> 2011-01-04T18:41:06.991-0500: 224439.824: [CMS-concurrent-mark-start]
> 2011-01-04T18:41:08.521-0500: 224441.354: [CMS-concurrent-mark: 
> 1.529/1.529 secs] [Times: user=3.31 sys=0.02, real=1.54 secs]
> 2011-01-04T18:41:08.521-0500: 224441.354: [CMS-concurrent-preclean-start]
> 2011-01-04T18:41:08.535-0500: 224441.368: [CMS-concurrent-preclean: 
> 0.014/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> 2011-01-04T18:41:08.535-0500: 224441.368: 
> [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2011-01-04T18:41:13.646-0500: 
> 224446.479: [CMS-concurrent-abortable-preclean: 3.679/5.111 secs] 
> [Times: user=3.99 sys=0.03, real=5.11 secs]
> 2011-01-04T18:41:13.648-0500: 224446.482: [GC[YG occupancy: 220397 K 
> (757760 K)]224446.482: [Rescan (parallel) , 0.0440200 secs]224446.526: 
> [weak refs processing, 0.0059890 secs]224446.532: [class unloading, 
> 0.1417810 secs]224446.674: [scrub symbol & string tables, 0.0489470 
> secs] [1 CMS-remark: 1325087K(1650688K)] 1545485K(2408448K), 0.2519760 
> secs] [Times: user=0.44 sys=0.01, real=0.25 secs]
> 2011-01-04T18:41:13.901-0500: 224446.734: [CMS-concurrent-sweep-start]
> 2011-01-04T18:41:14.791-0500: 224447.624: [CMS-concurrent-sweep: 
> 0.888/0.890 secs] [Times: user=0.91 sys=0.01, real=0.88 secs]
> 2011-01-04T18:41:14.791-0500: 224447.624: [CMS-concurrent-reset-start]
> 2011-01-04T18:41:14.795-0500: 224447.628: [CMS-concurrent-reset: 
> 0.004/0.004 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> 2011-01-04T18:41:40.336-0500: 224473.169: [GC 224473.170: [ParNew: 
> 757760K->151551K(757760K), 0.1246970 secs] 
> 2061601K->1458549K(2408448K), 0.1250470 secs] [Times: user=0.67 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:42:15.493-0500: 224508.327: [GC 224508.327: [ParNew: 
> 757759K->151552K(757760K), 0.1264960 secs] 
> 2064757K->1468487K(2408448K), 0.1269000 secs] [Times: user=0.58 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:42:30.747-0500: 224523.580: [GC 224523.580: [ParNew: 
> 757760K->150109K(757760K), 0.1121970 secs] 
> 2074695K->1469299K(2408448K), 0.1126290 secs] [Times: user=0.56 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:43:00.779-0500: 224553.612: [GC 224553.612: [ParNew: 
> 756317K->151552K(757760K), 0.1331190 secs] 
> 2075507K->1479534K(2408448K), 0.1334580 secs] [Times: user=0.61 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:43:00.914-0500: 224553.748: [GC [1 CMS-initial-mark: 
> 1327982K(1650688K)] 1479538K(2408448K), 0.1464550 secs] [Times: 
> user=0.15 sys=0.00, real=0.15 secs]
> 2011-01-04T18:43:01.061-0500: 224553.894: [CMS-concurrent-mark-start]
> 2011-01-04T18:43:02.661-0500: 224555.494: [CMS-concurrent-mark: 
> 1.583/1.600 secs] [Times: user=3.70 sys=0.04, real=1.60 secs]
> 2011-01-04T18:43:02.661-0500: 224555.494: [CMS-concurrent-preclean-start]
> 2011-01-04T18:43:02.675-0500: 224555.509: [CMS-concurrent-preclean: 
> 0.014/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> 2011-01-04T18:43:02.676-0500: 224555.509: 
> [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2011-01-04T18:43:07.909-0500: 
> 224560.742: [CMS-concurrent-abortable-preclean: 3.899/5.234 secs] 
> [Times: user=4.31 sys=0.08, real=5.24 secs]
> 2011-01-04T18:43:07.912-0500: 224560.745: [GC[YG occupancy: 289988 K 
> (757760 K)]224560.745: [Rescan (parallel) , 0.1425160 secs]224560.888: 
> [weak refs processing, 0.0000900 secs]224560.888: [class unloading, 
> 0.1369840 secs]224561.025: [scrub symbol & string tables, 0.0474080 
> secs] [1 CMS-remark: 1327982K(1650688K)] 1617971K(2408448K), 0.3377450 
> secs] [Times: user=0.75 sys=0.00, real=0.34 secs]
> 2011-01-04T18:43:08.250-0500: 224561.083: [CMS-concurrent-sweep-start]
> 2011-01-04T18:43:09.138-0500: 224561.971: [CMS-concurrent-sweep: 
> 0.888/0.888 secs] [Times: user=0.90 sys=0.01, real=0.89 secs]
> 2011-01-04T18:43:09.138-0500: 224561.972: [CMS-concurrent-reset-start]
> 2011-01-04T18:43:09.142-0500: 224561.976: [CMS-concurrent-reset: 
> 0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> 2011-01-04T18:43:21.068-0500: 224573.902: [GC 224573.902: [ParNew: 
> 757360K->151551K(757760K), 0.1132670 secs] 
> 2060610K->1458489K(2408448K), 0.1135890 secs] [Times: user=0.59 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:43:46.517-0500: 224599.350: [GC 224599.350: [ParNew: 
> 757759K->151552K(757760K), 0.1540840 secs] 
> 2064697K->1475010K(2408448K), 0.1544610 secs] [Times: user=0.72 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:43:46.673-0500: 224599.506: [GC [1 CMS-initial-mark: 
> 1323458K(1650688K)] 1480665K(2408448K), 0.1355380 secs] [Times: 
> user=0.13 sys=0.00, real=0.14 secs]
> 2011-01-04T18:43:46.809-0500: 224599.642: [CMS-concurrent-mark-start]
> 2011-01-04T18:43:48.464-0500: 224601.297: [CMS-concurrent-mark: 
> 1.656/1.656 secs] [Times: user=3.44 sys=0.04, real=1.66 secs]
> 2011-01-04T18:43:48.464-0500: 224601.297: [CMS-concurrent-preclean-start]
> 2011-01-04T18:43:48.478-0500: 224601.311: [CMS-concurrent-preclean: 
> 0.013/0.014 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> 2011-01-04T18:43:48.478-0500: 224601.311: 
> [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2011-01-04T18:43:53.604-0500: 
> 224606.437: [CMS-concurrent-abortable-preclean: 4.402/5.126 secs] 
> [Times: user=5.51 sys=0.13, real=5.13 secs]
> 2011-01-04T18:43:53.606-0500: 224606.439: [GC[YG occupancy: 354174 K 
> (757760 K)]224606.439: [Rescan (parallel) , 0.0620250 secs]224606.501: 
> [weak refs processing, 0.0001870 secs]224606.501: [class unloading, 
> 0.1428830 secs]224606.644: [scrub symbol & string tables, 0.0630710 
> secs] [1 CMS-remark: 1323458K(1650688K)] 1677633K(2408448K), 0.2790260 
> secs] [Times: user=0.60 sys=0.00, real=0.28 secs]
> 2011-01-04T18:43:53.886-0500: 224606.719: [CMS-concurrent-sweep-start]
> 2011-01-04T18:43:54.763-0500: 224607.596: [CMS-concurrent-sweep: 
> 0.877/0.877 secs] [Times: user=0.96 sys=0.01, real=0.88 secs]
> 2011-01-04T18:43:54.763-0500: 224607.596: [CMS-concurrent-reset-start]
> 2011-01-04T18:43:54.767-0500: 224607.600: [CMS-concurrent-reset: 
> 0.004/0.004 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110106/0f8b86ad/attachment-0001.html 

From ycraig at cysystems.com  Thu Jan  6 11:45:05 2011
From: ycraig at cysystems.com (craig yeldell)
Date: Thu, 6 Jan 2011 14:45:05 -0500
Subject: CPU differences and CMS GC performance
In-Reply-To: <4D25E590.7040202@oracle.com>
References: <F3315D73-7AE4-4157-86C7-B6075585D1AC@cysystems.com>
	<4D25E590.7040202@oracle.com>
Message-ID: <6F344849-76A2-4405-AF85-401810BCE786@cysystems.com>

2011-01-04T18:40:57.595-0500: 317405.646: [CMS-concurrent-reset:  
0.060/0.060 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2011-01-04T18:41:13.786-0500: 317421.837: [GC 317421.838: [ParNew:  
693227K->86037K(757760K), 0.1202290 secs] 1969847K->1363829K 
(2408448K), 0.1209860 secs] [Times: user=0.60 sys=0.00, real=0.12 secs]
2011-01-04T18:41:55.496-0500: 317463.548: [GC 317463.548: [ParNew:  
692245K->72000K(757760K), 0.1097190 secs] 1970037K->1351019K 
(2408448K), 0.1104820 secs] [Times: user=0.55 sys=0.00, real=0.11 secs]
2011-01-04T18:42:16.564-0500: 317484.615: [GC 317484.616: [ParNew:  
678208K->77171K(757760K), 0.1477570 secs] 1957227K->1356301K 
(2408448K), 0.1485110 secs] [Times: user=0.65 sys=0.00, real=0.15 secs]
2011-01-04T18:42:36.084-0500: 317504.135: [GC 317504.136: [ParNew:  
683379K->95237K(757760K), 0.1488350 secs] 1962509K->1378558K 
(2408448K), 0.1496090 secs] [Times: user=0.75 sys=0.00, real=0.15 secs]
2011-01-04T18:43:05.470-0500: 317533.521: [GC 317533.522: [ParNew:  
701445K->78163K(757760K), 0.1416930 secs] 1984766K->1362521K 
(2408448K), 0.1424460 secs] [Times: user=0.71 sys=0.00, real=0.15 secs]
2011-01-04T18:43:45.488-0500: 317573.540: [GC 317573.540: [ParNew:  
684371K->77448K(757760K), 0.1396650 secs] 1968729K->1363169K 
(2408448K), 0.1404410 secs] [Times: user=0.64 sys=0.00, real=0.14 secs]
2011-01-04T18:44:23.197-0500: 317611.249: [GC 317611.249: [ParNew:  
683656K->101827K(757760K), 0.1286880 secs] 1969377K->1387892K 
(2408448K), 0.1295040 secs] [Times: user=0.64 sys=0.00, real=0.13 secs]
2011-01-04T18:45:03.946-0500: 317651.997: [GC 317651.998: [ParNew:  
708035K->82138K(757760K), 0.1496900 secs] 1994100K->1368999K 
(2408448K), 0.1504710 secs]


On Jan 6, 2011, at 10:53 AM, Jon Masamitsu wrote:

> 2011-01-04T18:40:57

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110106/68d039cf/attachment.html 

From jon.masamitsu at oracle.com  Thu Jan  6 13:09:27 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 06 Jan 2011 13:09:27 -0800
Subject: CPU differences and CMS GC performance
In-Reply-To: <6F344849-76A2-4405-AF85-401810BCE786@cysystems.com>
References: <F3315D73-7AE4-4157-86C7-B6075585D1AC@cysystems.com>
	<4D25E590.7040202@oracle.com>
	<6F344849-76A2-4405-AF85-401810BCE786@cysystems.com>
Message-ID: <4D262F87.2070504@oracle.com>

Craig,

I looked at the heap occupancy in the first ParNew collection after the 
CMS sweep
and it looks like there is about 1.36g of live data for the old server 
(for example
from the first entry below)

2011-01-04T18:41:13.786-0500: 317421.837: [GC 317421.838: [ParNew: 693227K->86037K(757760K), 0.1202290 secs]
1969847K->1363829K(2408448K), 0.1209860 secs] [Times: user=0.60
sys=0.00, real=0.12 secs]

In the new server the number looks more like 1.46g

2011-01-04T18:41:40.336-0500: 224473.169: [GC 224473.170: [ParNew: 
757760K->151551K(757760K), 0.1246970 secs] 2061601K->1458549K(2408448K), 
0.1250470 secs] [Times: user=0.67 sys=0.00, real=0.13 secs]

*Maybe your application has a higher throughput on the new server
and has more live data in flight.  If you think that might be true a
larger heap would help.

You should verify what I've seen from these snippets.  Or better yet if you
have a way of telling how much data the application is using at any 
particular
instant and that number is larger on the new server, then use
a larger heap.

Jon*

On 01/06/11 11:45, craig yeldell wrote:
> 2011-01-04T18:40:57.595-0500: 317405.646: [CMS-concurrent-reset: 
> 0.060/0.060 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
> 2011-01-04T18:41:13.786-0500: 317421.837: [GC 317421.838: [ParNew: 
> 693227K->86037K(757760K), 0.1202290 secs] 
> 1969847K->1363829K(2408448K), 0.1209860 secs] [Times: user=0.60 
> sys=0.00, real=0.12 secs]
> 2011-01-04T18:41:55.496-0500: 317463.548: [GC 317463.548: [ParNew: 
> 692245K->72000K(757760K), 0.1097190 secs] 
> 1970037K->1351019K(2408448K), 0.1104820 secs] [Times: user=0.55 
> sys=0.00, real=0.11 secs]
> 2011-01-04T18:42:16.564-0500: 317484.615: [GC 317484.616: [ParNew: 
> 678208K->77171K(757760K), 0.1477570 secs] 
> 1957227K->1356301K(2408448K), 0.1485110 secs] [Times: user=0.65 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:42:36.084-0500: 317504.135: [GC 317504.136: [ParNew: 
> 683379K->95237K(757760K), 0.1488350 secs] 
> 1962509K->1378558K(2408448K), 0.1496090 secs] [Times: user=0.75 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:43:05.470-0500: 317533.521: [GC 317533.522: [ParNew: 
> 701445K->78163K(757760K), 0.1416930 secs] 
> 1984766K->1362521K(2408448K), 0.1424460 secs] [Times: user=0.71 
> sys=0.00, real=0.15 secs]
> 2011-01-04T18:43:45.488-0500: 317573.540: [GC 317573.540: [ParNew: 
> 684371K->77448K(757760K), 0.1396650 secs] 
> 1968729K->1363169K(2408448K), 0.1404410 secs] [Times: user=0.64 
> sys=0.00, real=0.14 secs]
> 2011-01-04T18:44:23.197-0500: 317611.249: [GC 317611.249: [ParNew: 
> 683656K->101827K(757760K), 0.1286880 secs] 
> 1969377K->1387892K(2408448K), 0.1295040 secs] [Times: user=0.64 
> sys=0.00, real=0.13 secs]
> 2011-01-04T18:45:03.946-0500: 317651.997: [GC 317651.998: [ParNew: 
> 708035K->82138K(757760K), 0.1496900 secs] 
> 1994100K->1368999K(2408448K), 0.1504710 secs]
>
>
> On Jan 6, 2011, at 10:53 AM, Jon Masamitsu wrote:
>
>> 2011-01-04T18:40:57
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110106/89360d31/attachment.html 

From y.s.ramakrishna at oracle.com  Fri Jan  7 12:12:39 2011
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Fri, 07 Jan 2011 12:12:39 -0800
Subject: accessing memory management details
In-Reply-To: <AANLkTinH+7t6Qy3RBypbjyUT-iM3uZxG3tj-Kw7d010C@mail.gmail.com>
References: <AANLkTinDR2mzHOuRcjRYuWjFSwdMsHf=krXPx9q60i0L@mail.gmail.com>
	<AANLkTinH+7t6Qy3RBypbjyUT-iM3uZxG3tj-Kw7d010C@mail.gmail.com>
Message-ID: <4D2773B7.7010501@oracle.com>

These are most easily/conveniently done inside the JVM rather than
via the use of Reference queues etc. But even then the overheads
for tracking these things can add up, so you may be better off
instrumenting them directly into the JVM (or perhaps having Dtrace hooks
or like at strategic points which could then be leveraged suitably)
and which would hopefully not impose too much additonal overhead when
not enabled (and a facility to enable it on-the-fly/on-demand).

Features like this have been discussed a number of times in the
last few years, but have apparently not happened because of the
non-trivial overheads that one does not want in production JVM's.

-- ramki

On 1/5/2011 8:47 AM, Brian Harris wrote:
> Hi again,
>
> An idea has come to mind which may allow rough estimates for some of these
> figures. Instrumentation could be used as a hook to register phantom
> references of all newly allocated objects with a reference queue. Verbose GC
> logs could also be captured and parsed, and the GC timestamps could be fuzzy
> matched with the reference enqueue timestamps. Which GC run claimed which
> object could(?) then be roughly inferred.
>
> Obviously this is far from ideal on many fronts and I'm not sure it would
> produce good results.
>
> On Mon, Dec 20, 2010 at 12:02 PM, Brian Harris
> <harribri at morganstanley.com>wrote:
>
>> Hello,
>>
>> Is it possible for my app to learn where objects are allocated? Young or
>> old generation, if young which survivor space? In a LAB? I'm interested in
>> where an object was initially allocated, but also any movements (from
>> where?, to where?, when?) that happen thereafter.
>>
>> These use cases illustrate what I had in mind:
>>    * JUnit test asserting>90% of allocations of type com.mycompany.Entity
>> are done in TLAB
>>    * Benchmark tool showing the effect that various JVM tuning parameters
>> have on memory management. Displayed as graphs, timelines, etc.
>>
>> I looked through the 1.6 JVMTI demos and didn't see these sort of details
>> being exposed. Also nothing similar looking through BTrace examples. Where
>> should I look next?.
>>
>> Happy holidays,
>> Brian Harris
>>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


From kimchy at gmail.com  Mon Jan 10 22:30:51 2011
From: kimchy at gmail.com (Shay Banon)
Date: Tue, 11 Jan 2011 08:30:51 +0200
Subject: Long ParNew Collection Time
Message-ID: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>

Hi, 


Running a JVM (1.6u23) on ubuntu 10.0 (machine has 32gb) getting this GC log output:


3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 37.8958700 secs] 689035K->593566K(8375552K), 37.8959970 secs] [Times: user=0.00 sys=0.00, real=37.90 secs] 


As you can see, the it seems like time wise, its all real.


Flags used are:


-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g


Tried also with:


* -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
* -XX:-CMSConcurrentMTEnabled


Checked vmstat during the run, and don't see any swapping going on (si/so are 0, no swap).


Any idea on how to try and solve this?


cheers,
-shay.banon


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/f2341420/attachment.html 

From y.s.ramakrishna at oracle.com  Mon Jan 10 23:40:38 2011
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Mon, 10 Jan 2011 23:40:38 -0800
Subject: Long ParNew Collection Time
In-Reply-To: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
Message-ID: <4D2C0976.4030208@oracle.com>

Is the system TOD being changed? May be an overzealous NTP daemon?
Can you make sure NTP is disabled and see if the problem goes away?

-- ramki


On 1/10/2011 10:30 PM, Shay Banon wrote:
> Hi,
>
>
> Running a JVM (1.6u23) on ubuntu 10.0 (machine has 32gb) getting this GC log output:
>
>
> 3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 37.8958700 secs] 689035K->593566K(8375552K), 37.8959970 secs] [Times: user=0.00 sys=0.00, real=37.90 secs]
>
>
> As you can see, the it seems like time wise, its all real.
>
>
> Flags used are:
>
>
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g
>
>
> Tried also with:
>
>
> * -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> * -XX:-CMSConcurrentMTEnabled
>
>
> Checked vmstat during the run, and don't see any swapping going on (si/so are 0, no swap).
>
>
> Any idea on how to try and solve this?
>
>
> cheers,
> -shay.banon
>
>
>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


From kimchy at gmail.com  Tue Jan 11 05:05:02 2011
From: kimchy at gmail.com (Shay Banon)
Date: Tue, 11 Jan 2011 15:05:02 +0200
Subject: Long ParNew Collection Time
In-Reply-To: <4D2C0976.4030208@oracle.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
Message-ID: <D72F2059985F45D58F07112CBA139CCB@gmail.com>

I will give it a go. Are you suggesting that its just a time reporting problem, and there is no actual long GC? It does correlate with the system hanging and not doing anything during that time (not responding to any requests from clients).


-shay.banon


On Tuesday, January 11, 2011 at 9:40 AM, Y. Srinivas Ramakrishna wrote:

> Is the system TOD being changed? May be an overzealous NTP daemon?
> Can you make sure NTP is disabled and see if the problem goes away?
> 
> -- ramki
> 
> 
> On 1/10/2011 10:30 PM, Shay Banon wrote:
> 
> > Hi,
> > 
> > 
> > Running a JVM (1.6u23) on ubuntu 10.0 (machine has 32gb) getting this GC log output:
> > 
> > 
> > 3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 37.8958700 secs] 689035K->593566K(8375552K), 37.8959970 secs] [Times: user=0.00 sys=0.00, real=37.90 secs]
> > 
> > 
> > As you can see, the it seems like time wise, its all real.
> > 
> > 
> > Flags used are:
> > 
> > 
> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g
> > 
> > 
> > Tried also with:
> > 
> > 
> > * -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> > * -XX:-CMSConcurrentMTEnabled
> > 
> > 
> > Checked vmstat during the run, and don't see any swapping going on (si/so are 0, no swap).
> > 
> > 
> > Any idea on how to try and solve this?
> > 
> > 
> > cheers,
> > -shay.banon
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > hotspot-gc-use mailing list
> > hotspot-gc-use at openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> > 
> > 
> 
> 
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/40d50f4c/attachment.html 

From chkwok at digibites.nl  Tue Jan 11 05:21:56 2011
From: chkwok at digibites.nl (Chi Ho Kwok)
Date: Tue, 11 Jan 2011 14:21:56 +0100
Subject: Long ParNew Collection Time
In-Reply-To: <D72F2059985F45D58F07112CBA139CCB@gmail.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
Message-ID: <AANLkTi=Rn94OTXBcojDZK=f4a0oO+yds0qdX5UN+dNzA@mail.gmail.com>

On Tue, Jan 11, 2011 at 2:05 PM, Shay Banon <kimchy at gmail.com> wrote:

>  I will give it a go. Are you suggesting that its just a time reporting
> problem, and there is no actual long GC? It does correlate with the system
> hanging and not doing anything during that time (not responding to any
> requests from clients).
>
> -shay.banon
>

Did you set vm.swappiness = 0 in sysctl? Linux tends to swap out inactive
regions of RAM, which takes forever to swap in again if the garbage
collector touches that area after a while. free -m must show swap used at
near zero, otherwise you're the victim of pro active swapping.

Chi Ho Kwok
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/0de58e89/attachment.html 

From kimchy at gmail.com  Tue Jan 11 05:25:05 2011
From: kimchy at gmail.com (Shay Banon)
Date: Tue, 11 Jan 2011 15:25:05 +0200
Subject: Long ParNew Collection Time
In-Reply-To: <AANLkTi=Rn94OTXBcojDZK=f4a0oO+yds0qdX5UN+dNzA@mail.gmail.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
	<AANLkTi=Rn94OTXBcojDZK=f4a0oO+yds0qdX5UN+dNzA@mail.gmail.com>
Message-ID: <3845F9F32AC448E1A4CD44977A4A4992@gmail.com>

 I ran vmstat and did not see any swap going on (si/s0 are 0, swap is 0), are you suggesting that it might still be swapping?
On Tuesday, January 11, 2011 at 3:21 PM, Chi Ho Kwok wrote:

> On Tue, Jan 11, 2011 at 2:05 PM, Shay Banon <kimchy at gmail.com> wrote:
> 
> > I will give it a go. Are you suggesting that its just a time reporting problem, and there is no actual long GC? It does correlate with the system hanging and not doing anything during that time (not responding to any requests from clients).
> > 
> > 
> > -shay.banon
> > 
> > 
> > 
> > 
> 
> 
> Did you set vm.swappiness = 0 in sysctl? Linux tends to swap out inactive regions of RAM, which takes forever to swap in again if the garbage collector touches that area after a while. free -m must show swap used at near zero, otherwise you're the victim of pro active swapping.
> 
> 
> Chi Ho Kwok
> 
> 
> 
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/1d6142a0/attachment.html 

From chkwok at digibites.nl  Tue Jan 11 05:29:26 2011
From: chkwok at digibites.nl (Chi Ho Kwok)
Date: Tue, 11 Jan 2011 14:29:26 +0100
Subject: Long ParNew Collection Time
In-Reply-To: <3845F9F32AC448E1A4CD44977A4A4992@gmail.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
	<AANLkTi=Rn94OTXBcojDZK=f4a0oO+yds0qdX5UN+dNzA@mail.gmail.com>
	<3845F9F32AC448E1A4CD44977A4A4992@gmail.com>
Message-ID: <AANLkTikCvun9f+MwBEsffZDGOov6U-p4Szg558geL8Ao@mail.gmail.com>

You have vmstat installed? How about sar? sar -B [-f /var/log/sysstat/sa10]
and scan the majflt/s column, it should hover around zero, but before the
adjustment of vm.swappiness, it was in the hundreds on a "bad GC".

On Tue, Jan 11, 2011 at 2:25 PM, Shay Banon <kimchy at gmail.com> wrote:

>  I ran vmstat and did not see any swap going on (si/s0 are 0, swap is 0),
> are you suggesting that it might still be swapping?
>
> On Tuesday, January 11, 2011 at 3:21 PM, Chi Ho Kwok wrote:
>
> On Tue, Jan 11, 2011 at 2:05 PM, Shay Banon <kimchy at gmail.com> wrote:
>
>   I will give it a go. Are you suggesting that its just a time reporting
> problem, and there is no actual long GC? It does correlate with the system
> hanging and not doing anything during that time (not responding to any
> requests from clients).
>
> -shay.banon
>
>
> Did you set vm.swappiness = 0 in sysctl? Linux tends to swap out inactive
> regions of RAM, which takes forever to swap in again if the garbage
> collector touches that area after a while. free -m must show swap used at
> near zero, otherwise you're the victim of pro active swapping.
>
> Chi Ho Kwok
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/7fae04d1/attachment.html 

From y.s.ramakrishna at oracle.com  Tue Jan 11 08:48:59 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 11 Jan 2011 08:48:59 -0800
Subject: Long ParNew Collection Time
In-Reply-To: <D72F2059985F45D58F07112CBA139CCB@gmail.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
Message-ID: <4D2C89FB.1050902@oracle.com>


Hi Shay --

On 01/11/11 05:05, Shay Banon wrote:
> 
> I will give it a go. Are you suggesting that its just a time reporting 
> problem, and there is no actual long GC? It does correlate with the 
> system hanging and not doing anything during that time (not responding 
> to any requests from clients).

Yes, that was what I was suggesting, but from your description
it sounds like it's a "real" problem (pun intended). So we'll need to
see what's happening. Disabling NTP would be good anyway in order
to investigate further. If you already have the data, you might want
to lay the GC pause data, the client response time data and
the vmstat data in an (aligned) time series plot (or other) to see
if there's any correlation (rather than focusing on single events).

-- ramki

> 
> -shay.banon
> 
> On Tuesday, January 11, 2011 at 9:40 AM, Y. Srinivas Ramakrishna wrote:
> 
>> Is the system TOD being changed? May be an overzealous NTP daemon?
>> Can you make sure NTP is disabled and see if the problem goes away?
>>
>> -- ramki
>>
>>
>> On 1/10/2011 10:30 PM, Shay Banon wrote:
>>> Hi,
>>>
>>>
>>> Running a JVM (1.6u23) on ubuntu 10.0 (machine has 32gb) getting this 
>>> GC log output:
>>>
>>>
>>> 3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 37.8958700 
>>> secs] 689035K->593566K(8375552K), 37.8959970 secs] [Times: user=0.00 
>>> sys=0.00, real=37.90 secs]
>>>
>>>
>>> As you can see, the it seems like time wise, its all real.
>>>
>>>
>>> Flags used are:
>>>
>>>
>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g
>>>
>>>
>>> Tried also with:
>>>
>>>
>>> * -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
>>> -XX:CMSInitiatingOccupancyFraction=75
>>> * -XX:-CMSConcurrentMTEnabled
>>>
>>>
>>> Checked vmstat during the run, and don't see any swapping going on 
>>> (si/so are 0, no swap).
>>>
>>>
>>> Any idea on how to try and solve this?
>>>
>>>
>>> cheers,
>>> -shay.banon
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use at openjdk.java.net>
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> 

From y.s.ramakrishna at oracle.com  Tue Jan 11 09:22:22 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 11 Jan 2011 09:22:22 -0800
Subject: Long ParNew Collection Time
In-Reply-To: <4D2C89FB.1050902@oracle.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
	<4D2C89FB.1050902@oracle.com>
Message-ID: <4D2C91CE.3060809@oracle.com>

I'd do all of this after following Chi-Ho's suggestions
on swappiness settings, of course. Also, -XX:+AlwaysPreTouch
to touch all of the heap pages up front, just in case this is an artifact
of touching the occasional new page(s) for the first time in the
life of the process and hitting a Linux virtual memory pothole.
But this is all conjecture of course...

-- ramki

On 01/11/11 08:48, Y. S. Ramakrishna wrote:
> 
> Hi Shay --
> 
> On 01/11/11 05:05, Shay Banon wrote:
>>
>> I will give it a go. Are you suggesting that its just a time reporting 
>> problem, and there is no actual long GC? It does correlate with the 
>> system hanging and not doing anything during that time (not responding 
>> to any requests from clients).
> 
> Yes, that was what I was suggesting, but from your description
> it sounds like it's a "real" problem (pun intended). So we'll need to
> see what's happening. Disabling NTP would be good anyway in order
> to investigate further. If you already have the data, you might want
> to lay the GC pause data, the client response time data and
> the vmstat data in an (aligned) time series plot (or other) to see
> if there's any correlation (rather than focusing on single events).
> 
> -- ramki
> 
>>
>> -shay.banon
>>
>> On Tuesday, January 11, 2011 at 9:40 AM, Y. Srinivas Ramakrishna wrote:
>>
>>> Is the system TOD being changed? May be an overzealous NTP daemon?
>>> Can you make sure NTP is disabled and see if the problem goes away?
>>>
>>> -- ramki
>>>
>>>
>>> On 1/10/2011 10:30 PM, Shay Banon wrote:
>>>> Hi,
>>>>
>>>>
>>>> Running a JVM (1.6u23) on ubuntu 10.0 (machine has 32gb) getting 
>>>> this GC log output:
>>>>
>>>>
>>>> 3856.471: [GC 3856.471: [ParNew: 118016K->13056K(118016K), 
>>>> 37.8958700 secs] 689035K->593566K(8375552K), 37.8959970 secs] 
>>>> [Times: user=0.00 sys=0.00, real=37.90 secs]
>>>>
>>>>
>>>> As you can see, the it seems like time wise, its all real.
>>>>
>>>>
>>>> Flags used are:
>>>>
>>>>
>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xms8g -Xmx8g
>>>>
>>>>
>>>> Tried also with:
>>>>
>>>>
>>>> * -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>> * -XX:-CMSConcurrentMTEnabled
>>>>
>>>>
>>>> Checked vmstat during the run, and don't see any swapping going on 
>>>> (si/so are 0, no swap).
>>>>
>>>>
>>>> Any idea on how to try and solve this?
>>>>
>>>>
>>>> cheers,
>>>> -shay.banon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net 
>>>> <mailto:hotspot-gc-use at openjdk.java.net>
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
> 

From y.s.ramakrishna at oracle.com  Tue Jan 11 09:40:48 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 11 Jan 2011 09:40:48 -0800
Subject: Long ParNew Collection Time
In-Reply-To: <4D2C91CE.3060809@oracle.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>	<4D2C0976.4030208@oracle.com>	<D72F2059985F45D58F07112CBA139CCB@gmail.com>	<4D2C89FB.1050902@oracle.com>
	<4D2C91CE.3060809@oracle.com>
Message-ID: <4D2C9620.4070300@oracle.com>


On 01/11/11 09:22, Y. S. Ramakrishna wrote:
> I'd do all of this after following Chi-Ho's suggestions
> on swappiness settings, of course. Also, -XX:+AlwaysPreTouch
> to touch all of the heap pages up front, just in case this is an artifact
> of touching the occasional new page(s) for the first time in the
> life of the process and hitting a Linux virtual memory pothole.

Hmm, in that case, i'd normally expect the time to be attributed to system time
(as i'd expect any swapping to as well), so i'm not sure if this will help...

I'll shut up now :-)
-- ramki


> But this is all conjecture of course...

From kimchy at gmail.com  Tue Jan 11 12:18:41 2011
From: kimchy at gmail.com (Shay Banon)
Date: Tue, 11 Jan 2011 22:18:41 +0200
Subject: Long ParNew Collection Time
In-Reply-To: <4D2C9620.4070300@oracle.com>
References: <04C34FE5DDAA457EADA57035C183E9B8@gmail.com>
	<4D2C0976.4030208@oracle.com>
	<D72F2059985F45D58F07112CBA139CCB@gmail.com>
	<4D2C89FB.1050902@oracle.com> <4D2C91CE.3060809@oracle.com>
	<4D2C9620.4070300@oracle.com>
Message-ID: <B4D7787E2AAB4BF6BBC95BAE7593798E@gmail.com>

Ended up being a problem with ubuntu 10.04 (not sure what). Upgrading to Ubuntu 10.10 fixed it...
On Tuesday, January 11, 2011 at 7:40 PM, Y. S. Ramakrishna wrote:

> 
> 
> On 01/11/11 09:22, Y. S. Ramakrishna wrote:
> 
> > I'd do all of this after following Chi-Ho's suggestions
> > on swappiness settings, of course. Also, -XX:+AlwaysPreTouch
> > to touch all of the heap pages up front, just in case this is an artifact
> > of touching the occasional new page(s) for the first time in the
> > life of the process and hitting a Linux virtual memory pothole.
> > 
> > 
> 
> Hmm, in that case, i'd normally expect the time to be attributed to system time
> (as i'd expect any swapping to as well), so i'm not sure if this will help...
> 
> I'll shut up now :-)
> -- ramki
> 
> 
> 
> > But this is all conjecture of course...
> > 
> > 
> 
> 
> 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110111/75389a76/attachment.html 

From todd at cloudera.com  Fri Jan 21 11:38:52 2011
From: todd at cloudera.com (Todd Lipcon)
Date: Fri, 21 Jan 2011 11:38:52 -0800
Subject: G1GC Full GCs
In-Reply-To: <AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
Message-ID: <AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>

Hey folks,

Took some time over the last day or two to follow up on this on the latest
checkout of JDK7. I added some more instrumentation and my findings so far
are:

1) CMS is definitely hitting a fragmentation problem. Our workload is pretty
much guaranteed to fragment, and I don't think there's anything CMS can do
about it - see the following graphs:
http://people.apache.org/~todd/hbase-fragmentation/

<http://people.apache.org/~todd/hbase-fragmentation/>2) G1GC is hitting full
pauses because the "other" pause time predictions end up higher than the
minimum pause length. I'm seeing the following sequence:

- A single choose_cset operation for a non_young region takes a long time
(unclear yet why this is happening, see below)
- This inflates the predict_non_young_other_time_ms(1) result to a value
greater than my pause goal
- From then on, it doesn't collect any more non-young regions (ever!)
because any region will be considered expensive regardless of the estimated
rset or collection costs
- The heap slowly fills up with non-young regions until we reach a full GC

3) So the question is why the choose_cset is taking a long time. I added
getrusage() calls to wrap the choose_cset operation. Here's some output with
extra logging:

--> About to choose cset at   725.458
Adding 1 young regions to the CSet
  Added 0x0000000000000001 Young Regions to CS.
    (3596288 KB left in heap.)
    (picked region; 9.948053ms predicted; 21.164738ms remaining; 2448kb
marked; 2448kb maxlive; 59-59% liveness)
    (3593839 KB left in heap.)
predict_region_elapsed_time_ms: 10.493828ms total, 5.486072ms rs scan (14528
cnum), 4.828102 copy time (2333800 bytes), 0.179654 other time
    (picked region; 10.493828ms predicted; 11.216685ms remaining; 2279kb
marked; 2279kb maxlive; 55-55% liveness)
predict_region_elapsed_time_ms: 10.493828ms total, 5.486072ms rs scan (14528
cnum), 4.828102 copy time (2333800 bytes), 0.179654 other time
    (3591560 KB left in heap.)
predict_region_elapsed_time_ms: 10.346346ms total, 5.119780ms rs scan (13558
cnum), 5.046912 copy time (2439568 bytes), 0.179654 other time
predict_region_elapsed_time_ms: 10.407672ms total, 5.333135ms rs scan (14123
cnum), 4.894882 copy time (2366080 bytes), 0.179654 other time
    (no more marked regions; next region too expensive (adaptive; predicted
10.407672ms > remaining 0.722857ms))
Resource usage of choose_cset:majflt: 0 nswap: 0  nvcsw: 6  nivcsw: 0
--> About to prepare RS scan at   725.657

The resource usage line with nvcsw=6 indicates there were 6 voluntary
context switches while choosing cset. This choose_cset operation took
198.9ms all in choosing non-young.

So, why are there voluntary context switches while choosing cset? This isn't
swapping -- that should show under majflt, right? My only theories are:
- are any locks acquired in choose_cset?
- maybe the gc logging itself is blocking on IO to the log file? ie the
instrumentation itself is interfering with the algorithm?


Regardless, I think a single length choose_non_young_cset operation
shouldn't be allowed to push the prediction above the time boundary and
trigger this issue. Perhaps a simple workaround is that, whenever a
collection chooses no non_young regions, it should contribute a value of 0
to the average?

I'll give this heuristic a try on my build and see if it solves the issue.

-Todd

On Tue, Jul 27, 2010 at 3:08 PM, Todd Lipcon <todd at cloudera.com> wrote:

> Hi all,
>
> Back from my vacation and took some time yesterday and today to build a
> fresh JDK 7 with some additional debug printouts from Peter's patch.
>
> What I found was a bit different - the rset scanning estimates are low, but
> I consistently am seeing "Other time" estimates in the >40ms range. Given my
> pause time goal of 20ms, these estimates are I think excluding most of the
> regions from collectability. I haven't been able to dig around yet to figure
> out where the long estimate for "other time" is coming from - in the
> collections logged it sometimes shows fairly high "Other" but the "Choose
> CSet" component is very short. I'll try to add some more debug info to the
> verbose logging and rerun some tests over the next couple of days.
>
> At the moment I'm giving the JRockit VM a try to see how its deterministic
> GC stacks up against G1 and CMS.
>
> Thanks
> -Todd
>
>
> On Tue, Jul 13, 2010 at 5:15 PM, Peter Schuller <
> peter.schuller at infidyne.com> wrote:
>
>> Ramki/Tony,
>>
>> > Any chance of setting +PrintHeapAtGC and -XX:+PrintHeapAtGCExtended and
>> > sending us the log, or part of it (say between two Full GCs)? Be
>> prepared:
>> > this will generate piles of output. But it will give us per-region
>> > information that might shed more light on the cause of the issue....
>> thanks,
>>
>> So what I have in terms of data is (see footnotes for urls references in
>> []):
>>
>> (a) A patch[1] that prints some additional information about estimated
>> costs of region eviction, and disables the GC efficiency check that
>> normally terminates selection of regions. (Note: This is a throw-away
>> patch for debugging; it's not intended as a suggested change for
>> inclusion.)
>>
>> (b) A log[2] showing the output of a test run I did just now, with
>> both your flags above and my patch enabled (but without disabling the
>> efficiency check). It shows fallback to full GC when the actual live
>> set size is 252 MB, and the maximum heap size is 2 GB (in other words,
>> ~ 12% liveness). An easy way to find the point of full gc is to search
>> for the string 'full 1'.
>>
>> (c) A file[3] with the effective VM options during the test.
>>
>> (d) Instructions for how to run the test to reproduce it (I'll get to
>> that at the end; it's simplified relative to previously).
>>
>> (e) Nature of the test.
>>
>> Discussion:
>>
>> WIth respect to region information: I originally tried it in response
>> to your recommendation earlier, but I found I did not see the
>> information I was after. Perhaps I was just misreading it, but I
>> mostly just saw either 0% or 100% fullness, and never the actual
>> liveness estimate as produced by the mark phase. In the log I am
>> referring to in this E-Mail, you can see that the last printout of
>> region information just before the live GC fits this pattern; I just
>> don't see anything that looks like legitimate liveness information
>> being printed. (I don't have time to dig back into it right now to
>> double-check what it's printing.)
>>
>> If you scroll up from the point of the full gc until you find a bunch
>> of output starting with "predict_region_elapsed_time_ms" you see some
>> output resulting from the patch, with pretty extreme values such as:
>>
>> predict_region_elapsed_time_ms: 34.378642ms total, 34.021154ms rs scan
>> (46542 cnum), 0.040069 copy time (20704 bytes), 0.317419 other time
>> predict_region_elapsed_time_ms: 45.020866ms total, 44.653222ms rs scan
>> (61087 cnum), 0.050225 copy time (25952 bytes), 0.317419 other time
>> predict_region_elapsed_time_ms: 16.250033ms total, 15.887065ms rs scan
>> (21734 cnum), 0.045550 copy time (23536 bytes), 0.317419 other time
>> predict_region_elapsed_time_ms: 226.942877ms total, 226.559163ms rs
>> scan (309940 cnum), 0.066296 copy time (34256 bytes), 0.317419 other
>> time
>> predict_region_elapsed_time_ms: 542.344828ms total, 541.954750ms rs
>> scan (741411 cnum), 0.072659 copy time (37544 bytes), 0.317419 other
>> time
>> predict_region_elapsed_time_ms: 668.595597ms total, 668.220877ms rs
>> scan (914147 cnum), 0.057301 copy time (29608 bytes), 0.317419 other
>> time
>>
>> So in the most extreme case in the excerpt above, that's > half a
>> second of estimate rset scanning time for a single region with 914147
>> cards to be scanned. While not all are that extreme, lots and lots of
>> regions are very expensive and almost only due to rset scanning costs.
>>
>> If you scroll down a bit to the first (and ONLY) partial that happened
>> after the statistics accumulating from the marking phase, we see more
>> output resulting form the patch. At the end, we see:
>>
>>    (picked region; 0.345890ms predicted; 1.317244ms remaining; 15kb
>> marked; 15kb maxlive; 1-1% liveness)
>>    (393380 KB left in heap.)
>>    (picked region; 0.345963ms predicted; 0.971354ms remaining; 15kb
>> marked; 15kb maxlive; 1-1% liveness)
>>    (393365 KB left in heap.)
>>    (picked region; 0.346036ms predicted; 0.625391ms remaining; 15kb
>> marked; 15kb maxlive; 1-1% liveness)
>>    (393349 KB left in heap.)
>>    (no more marked regions; next region too expensive (adaptive;
>> predicted 0.346036ms > remaining 0.279355ms))
>>
>> So in other words, it picked a bunch of regions in order of "lowest
>> hanging fruit". The *least* low hanging fruit picked still had
>> liveness at 1%; in other words, there's plenty of further regions that
>> ideally should be collected because they contain almost no garbage
>> (ignoring the cost of collecting them).
>>
>> In this case, it stopped picking regions because the next region to be
>> picked, though cheap, was the straw that broke the camel's neck and we
>> simply exceeded the alloted time for this particular GC.
>>
>> However, after this partial completes, it reverts back to doing just
>> young gc:s. In other words, even though there's *plenty* of regions
>> with very low liveness, further partials aren't happening.
>>
>> By applying this part of the patch:
>>
>>  -           (adaptive_young_list_length() &&
>>  +           (adaptive_young_list_length() && false && // scodetodo
>>
>> I artificially force g1 to not fall back to doing young gc:s for
>> efficiency reasons. When I run with that change, I don't experience
>> the slow perpetual growth until fallback to full GC. If I remember
>> correctly though, the rset scanning cost is in fact high, but I don't
>> have details saved and I'm afraid I don't have time to re-run those
>> tests right now and compare numbers.
>>
>> Reproducing it:
>>
>> I made some changes and the test case should now hopefully be easy to
>> run assuming you have maven installed. The github project is at:
>>
>>   http://github.com/scode/httpgctest
>>
>> There is a README, but the shortest possible instructions to
>> re-produce the test that I did:
>>
>>  git clone git://github.com/scode/httpgctest.git
>>  cd httpgctest.git
>>  git checkout 20100714_1       # grab from appropriate tag, in case I
>> change master
>>  mvn package
>>  HTTPGCTEST_LOGGC=gc.log ./run.sh
>>
>> That should start the http server; then run concurrently:
>>
>>  while [ 1 ] ; do curl 'http://localhost:9191/dropdata?ratio=0.10' ;
>> curl 'http://localhost:9191/gendata?amount=25000' ; sleep 0.1  ; done
>>
>> And then just wait and observe.
>>
>> Nature of the test:
>>
>> So the test if run as above will essentially reach a steady state of
>> equilibrium with about 25000 pieces of data in a clojure immutable
>> map. The result is that a significant amount of new data is being
>> allocated, but very little writing to old regions is happening. The
>> garbage generated is very well spread out over the entire heap because
>> it goes through all objects and drops 10% (the ratio=0.10) for each
>> iteration, after which it adds 25000 new items.
>>
>> In other words; not a lot of old gen writing, but lots of writes to
>> the young gen referencing objects in the old gen.
>>
>> [1] http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch
>> [2]
>> http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/gc-fullfallback.log
>> [3]
>> http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/vmoptions.txt
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Todd Lipcon
Software Engineer, Cloudera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110121/40e6c946/attachment.html 

From todd at cloudera.com  Fri Jan 21 16:51:42 2011
From: todd at cloudera.com (Todd Lipcon)
Date: Fri, 21 Jan 2011 16:51:42 -0800
Subject: G1GC Full GCs
In-Reply-To: <AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
	<AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
Message-ID: <AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>

A bit more data. I did the following patch:

@@ -1560,6 +1575,19 @@

       _non_young_other_cost_per_region_ms_seq->add(non_young_other_time_ms
/
                                          (double)
_recorded_non_young_regions);
+    } else {
+      // no non-young gen collections - if our prediction is high enough,
we would
+      // never collect non-young again, so push it back towards zero so we
give it
+      // another try.
+      double predicted_other_time = predict_non_young_other_time_ms(1);
+      if (predicted_other_time > MaxGCPauseMillis/2.0) {
+        if (G1PolicyVerbose > 0) {
+          gclog_or_tty->print_cr(
+            "Predicted non-young other time %.1f is too large compared to
max pause time. Weighting down.",
+            predicted_other_time);
+        }
+        _non_young_other_cost_per_region_ms_seq->add(0.0);
+      }
     }

and this mostly solved the problem described above. Now I get a full GC
every 45-50 minutes which is way improved from what it was before.

I still seem to be putting off GC of non-young regions too much though. I
did some analysis of the G1 log and made these graphs:

http://people.apache.org/~todd/hbase-fragmentation/g1-graphing.png

The top graph is a heat map of the number of young (pink color) and
non-young (blue) in each collection.
The middle graph is the post-collection heap usage over time in MB
The bottom graph is a heat map and smoothed line graph of the number of
millis spent per collection. The target in this case is 50ms.

 A few interesting things:
 - not sure what causes the sort of periodic striated pattern in the number
of young generation regions chosen
 - most of the time no old gen regions are selected for collection at all!
Here's a graph of just old regions:
http://people.apache.org/~todd/hbase-fragmentation/old-regions.png
 - When old regions are actually selected for collection the heap usage does
drop, though elapsed time does spike over the guarantee.

So seems like something about the heuristics aren't quite right. Thoughts?

-Todd

On Fri, Jan 21, 2011 at 11:38 AM, Todd Lipcon <todd at cloudera.com> wrote:

> Hey folks,
>
> Took some time over the last day or two to follow up on this on the latest
> checkout of JDK7. I added some more instrumentation and my findings so far
> are:
>
> 1) CMS is definitely hitting a fragmentation problem. Our workload is
> pretty much guaranteed to fragment, and I don't think there's anything CMS
> can do about it - see the following graphs:
> http://people.apache.org/~todd/hbase-fragmentation/
>
> <http://people.apache.org/~todd/hbase-fragmentation/>2) G1GC is hitting
> full pauses because the "other" pause time predictions end up higher than
> the minimum pause length. I'm seeing the following sequence:
>
> - A single choose_cset operation for a non_young region takes a long time
> (unclear yet why this is happening, see below)
> - This inflates the predict_non_young_other_time_ms(1) result to a value
> greater than my pause goal
> - From then on, it doesn't collect any more non-young regions (ever!)
> because any region will be considered expensive regardless of the estimated
> rset or collection costs
> - The heap slowly fills up with non-young regions until we reach a full GC
>
> 3) So the question is why the choose_cset is taking a long time. I added
> getrusage() calls to wrap the choose_cset operation. Here's some output with
> extra logging:
>
> --> About to choose cset at   725.458
> Adding 1 young regions to the CSet
>   Added 0x0000000000000001 Young Regions to CS.
>     (3596288 KB left in heap.)
>     (picked region; 9.948053ms predicted; 21.164738ms remaining; 2448kb
> marked; 2448kb maxlive; 59-59% liveness)
>     (3593839 KB left in heap.)
> predict_region_elapsed_time_ms: 10.493828ms total, 5.486072ms rs scan
> (14528 cnum), 4.828102 copy time (2333800 bytes), 0.179654 other time
>     (picked region; 10.493828ms predicted; 11.216685ms remaining; 2279kb
> marked; 2279kb maxlive; 55-55% liveness)
> predict_region_elapsed_time_ms: 10.493828ms total, 5.486072ms rs scan
> (14528 cnum), 4.828102 copy time (2333800 bytes), 0.179654 other time
>     (3591560 KB left in heap.)
> predict_region_elapsed_time_ms: 10.346346ms total, 5.119780ms rs scan
> (13558 cnum), 5.046912 copy time (2439568 bytes), 0.179654 other time
> predict_region_elapsed_time_ms: 10.407672ms total, 5.333135ms rs scan
> (14123 cnum), 4.894882 copy time (2366080 bytes), 0.179654 other time
>     (no more marked regions; next region too expensive (adaptive; predicted
> 10.407672ms > remaining 0.722857ms))
> Resource usage of choose_cset:majflt: 0 nswap: 0  nvcsw: 6  nivcsw: 0
> --> About to prepare RS scan at   725.657
>
> The resource usage line with nvcsw=6 indicates there were 6 voluntary
> context switches while choosing cset. This choose_cset operation took
> 198.9ms all in choosing non-young.
>
> So, why are there voluntary context switches while choosing cset? This
> isn't swapping -- that should show under majflt, right? My only theories
> are:
> - are any locks acquired in choose_cset?
> - maybe the gc logging itself is blocking on IO to the log file? ie the
> instrumentation itself is interfering with the algorithm?
>
>
> Regardless, I think a single length choose_non_young_cset operation
> shouldn't be allowed to push the prediction above the time boundary and
> trigger this issue. Perhaps a simple workaround is that, whenever a
> collection chooses no non_young regions, it should contribute a value of 0
> to the average?
>
> I'll give this heuristic a try on my build and see if it solves the issue.
>
> -Todd
>
> On Tue, Jul 27, 2010 at 3:08 PM, Todd Lipcon <todd at cloudera.com> wrote:
>
>> Hi all,
>>
>> Back from my vacation and took some time yesterday and today to build a
>> fresh JDK 7 with some additional debug printouts from Peter's patch.
>>
>> What I found was a bit different - the rset scanning estimates are low,
>> but I consistently am seeing "Other time" estimates in the >40ms range.
>> Given my pause time goal of 20ms, these estimates are I think excluding most
>> of the regions from collectability. I haven't been able to dig around yet to
>> figure out where the long estimate for "other time" is coming from - in the
>> collections logged it sometimes shows fairly high "Other" but the "Choose
>> CSet" component is very short. I'll try to add some more debug info to the
>> verbose logging and rerun some tests over the next couple of days.
>>
>> At the moment I'm giving the JRockit VM a try to see how its deterministic
>> GC stacks up against G1 and CMS.
>>
>> Thanks
>> -Todd
>>
>>
>> On Tue, Jul 13, 2010 at 5:15 PM, Peter Schuller <
>> peter.schuller at infidyne.com> wrote:
>>
>>> Ramki/Tony,
>>>
>>> > Any chance of setting +PrintHeapAtGC and -XX:+PrintHeapAtGCExtended and
>>> > sending us the log, or part of it (say between two Full GCs)? Be
>>> prepared:
>>> > this will generate piles of output. But it will give us per-region
>>> > information that might shed more light on the cause of the issue....
>>> thanks,
>>>
>>> So what I have in terms of data is (see footnotes for urls references in
>>> []):
>>>
>>> (a) A patch[1] that prints some additional information about estimated
>>> costs of region eviction, and disables the GC efficiency check that
>>> normally terminates selection of regions. (Note: This is a throw-away
>>> patch for debugging; it's not intended as a suggested change for
>>> inclusion.)
>>>
>>> (b) A log[2] showing the output of a test run I did just now, with
>>> both your flags above and my patch enabled (but without disabling the
>>> efficiency check). It shows fallback to full GC when the actual live
>>> set size is 252 MB, and the maximum heap size is 2 GB (in other words,
>>> ~ 12% liveness). An easy way to find the point of full gc is to search
>>> for the string 'full 1'.
>>>
>>> (c) A file[3] with the effective VM options during the test.
>>>
>>> (d) Instructions for how to run the test to reproduce it (I'll get to
>>> that at the end; it's simplified relative to previously).
>>>
>>> (e) Nature of the test.
>>>
>>> Discussion:
>>>
>>> WIth respect to region information: I originally tried it in response
>>> to your recommendation earlier, but I found I did not see the
>>> information I was after. Perhaps I was just misreading it, but I
>>> mostly just saw either 0% or 100% fullness, and never the actual
>>> liveness estimate as produced by the mark phase. In the log I am
>>> referring to in this E-Mail, you can see that the last printout of
>>> region information just before the live GC fits this pattern; I just
>>> don't see anything that looks like legitimate liveness information
>>> being printed. (I don't have time to dig back into it right now to
>>> double-check what it's printing.)
>>>
>>> If you scroll up from the point of the full gc until you find a bunch
>>> of output starting with "predict_region_elapsed_time_ms" you see some
>>> output resulting from the patch, with pretty extreme values such as:
>>>
>>> predict_region_elapsed_time_ms: 34.378642ms total, 34.021154ms rs scan
>>> (46542 cnum), 0.040069 copy time (20704 bytes), 0.317419 other time
>>> predict_region_elapsed_time_ms: 45.020866ms total, 44.653222ms rs scan
>>> (61087 cnum), 0.050225 copy time (25952 bytes), 0.317419 other time
>>> predict_region_elapsed_time_ms: 16.250033ms total, 15.887065ms rs scan
>>> (21734 cnum), 0.045550 copy time (23536 bytes), 0.317419 other time
>>> predict_region_elapsed_time_ms: 226.942877ms total, 226.559163ms rs
>>> scan (309940 cnum), 0.066296 copy time (34256 bytes), 0.317419 other
>>> time
>>> predict_region_elapsed_time_ms: 542.344828ms total, 541.954750ms rs
>>> scan (741411 cnum), 0.072659 copy time (37544 bytes), 0.317419 other
>>> time
>>> predict_region_elapsed_time_ms: 668.595597ms total, 668.220877ms rs
>>> scan (914147 cnum), 0.057301 copy time (29608 bytes), 0.317419 other
>>> time
>>>
>>> So in the most extreme case in the excerpt above, that's > half a
>>> second of estimate rset scanning time for a single region with 914147
>>> cards to be scanned. While not all are that extreme, lots and lots of
>>> regions are very expensive and almost only due to rset scanning costs.
>>>
>>> If you scroll down a bit to the first (and ONLY) partial that happened
>>> after the statistics accumulating from the marking phase, we see more
>>> output resulting form the patch. At the end, we see:
>>>
>>>    (picked region; 0.345890ms predicted; 1.317244ms remaining; 15kb
>>> marked; 15kb maxlive; 1-1% liveness)
>>>    (393380 KB left in heap.)
>>>    (picked region; 0.345963ms predicted; 0.971354ms remaining; 15kb
>>> marked; 15kb maxlive; 1-1% liveness)
>>>    (393365 KB left in heap.)
>>>    (picked region; 0.346036ms predicted; 0.625391ms remaining; 15kb
>>> marked; 15kb maxlive; 1-1% liveness)
>>>    (393349 KB left in heap.)
>>>    (no more marked regions; next region too expensive (adaptive;
>>> predicted 0.346036ms > remaining 0.279355ms))
>>>
>>> So in other words, it picked a bunch of regions in order of "lowest
>>> hanging fruit". The *least* low hanging fruit picked still had
>>> liveness at 1%; in other words, there's plenty of further regions that
>>> ideally should be collected because they contain almost no garbage
>>> (ignoring the cost of collecting them).
>>>
>>> In this case, it stopped picking regions because the next region to be
>>> picked, though cheap, was the straw that broke the camel's neck and we
>>> simply exceeded the alloted time for this particular GC.
>>>
>>> However, after this partial completes, it reverts back to doing just
>>> young gc:s. In other words, even though there's *plenty* of regions
>>> with very low liveness, further partials aren't happening.
>>>
>>> By applying this part of the patch:
>>>
>>>  -           (adaptive_young_list_length() &&
>>>  +           (adaptive_young_list_length() && false && // scodetodo
>>>
>>> I artificially force g1 to not fall back to doing young gc:s for
>>> efficiency reasons. When I run with that change, I don't experience
>>> the slow perpetual growth until fallback to full GC. If I remember
>>> correctly though, the rset scanning cost is in fact high, but I don't
>>> have details saved and I'm afraid I don't have time to re-run those
>>> tests right now and compare numbers.
>>>
>>> Reproducing it:
>>>
>>> I made some changes and the test case should now hopefully be easy to
>>> run assuming you have maven installed. The github project is at:
>>>
>>>   http://github.com/scode/httpgctest
>>>
>>> There is a README, but the shortest possible instructions to
>>> re-produce the test that I did:
>>>
>>>  git clone git://github.com/scode/httpgctest.git
>>>  cd httpgctest.git
>>>  git checkout 20100714_1       # grab from appropriate tag, in case I
>>> change master
>>>  mvn package
>>>  HTTPGCTEST_LOGGC=gc.log ./run.sh
>>>
>>> That should start the http server; then run concurrently:
>>>
>>>  while [ 1 ] ; do curl 'http://localhost:9191/dropdata?ratio=0.10' ;
>>> curl 'http://localhost:9191/gendata?amount=25000' ; sleep 0.1  ; done
>>>
>>> And then just wait and observe.
>>>
>>> Nature of the test:
>>>
>>> So the test if run as above will essentially reach a steady state of
>>> equilibrium with about 25000 pieces of data in a clojure immutable
>>> map. The result is that a significant amount of new data is being
>>> allocated, but very little writing to old regions is happening. The
>>> garbage generated is very well spread out over the entire heap because
>>> it goes through all objects and drops 10% (the ratio=0.10) for each
>>> iteration, after which it adds 25000 new items.
>>>
>>> In other words; not a lot of old gen writing, but lots of writes to
>>> the young gen referencing objects in the old gen.
>>>
>>> [1] http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch
>>> [2]
>>> http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/gc-fullfallback.log
>>> [3]
>>> http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/vmoptions.txt
>>>
>>> --
>>> / Peter Schuller
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Todd Lipcon
Software Engineer, Cloudera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110121/60510b12/attachment-0001.html 

From peter.schuller at infidyne.com  Sun Jan 23 00:21:13 2011
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Sun, 23 Jan 2011 09:21:13 +0100
Subject: G1GC Full GCs
In-Reply-To: <AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
	<AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
	<AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
Message-ID: <AANLkTimSx=ASNhzat864PkU3xqaL-sg_94EyZLCRQhGZ@mail.gmail.com>

> ?- most of the time no old gen regions are selected for collection at all!
> Here's a graph of just old regions:
> http://people.apache.org/~todd/hbase-fragmentation/old-regions.png

This is consistent with my anecdotal observations as well and I
believe it is expected. What I have observed happening is that
non-young (partial) collections always happen after the marking phases
some number of times, followed by young collections only until another
marking phase is triggered and completed.

I think this makes sense because region selection is based on cost
heuristics largely based on liveness data from marking. So you have
your marking phase followed by a period of decreasing availability of
non-young regions that are eligible for collection given the GC
efficiency goals (and the pause time goals), until there are 0 such.
Young collections then continue until unrelated criteria trigger a new
marking phase, giving non-young regions a chance again to get above
the eligibility watermark.

-- 
/ Peter Schuller

From peter.schuller at infidyne.com  Sun Jan 23 00:42:04 2011
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Sun, 23 Jan 2011 09:42:04 +0100
Subject: G1GC Full GCs
In-Reply-To: <AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
	<AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
	<AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
Message-ID: <AANLkTimur1JbWnhkete4GGjLfm+b8PuBxjpPWbHb1upZ@mail.gmail.com>

> I still seem to be putting off GC of non-young regions too much though. I

Part of my experiments I have been harping on was the below change to
cut GC efficiency out of the decision to perform non-young
collections. I'm not suggesting it actually be disabled, but perhaps
it can be adjusted to fit your workload? If there is nothing outright
wrong in terms of predictions and the problem is due to cost estimates
being too high, that may be a way to avoid full GC:s at the expense of
more expensive GC activity. This smells like something that should be
a tweakable VM option. Just like GCTimeRatio affects heap expansion
decisions, something to affect this (probably just a ratio applied to
the test below?).

Another thing: This is to a large part my human confirmation biased
brain speaking, but I would be really interested to find out if if the
slow build-up you seem to be experiencing is indeed due to rs scan
costs de to sparse table overflow (I've been harping about roughly the
same thing several times so maybe people are tired of it; most
recently in the thread "g1: dealing with high rates of inter-region
pointer writes").

Is your test easily runnable so that one can reproduce? Preferably
without lots of hbase/hadoop knowledge. I.e., is it something that can
be run in a self-contained fashion fairly easily?

Here's the patch indicating where to adjust the efficiency thresholding:

--- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Fri
Dec 17 23:32:58 2010 -0800
+++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Sun
Jan 23 09:21:54 2011 +0100
@@ -1463,7 +1463,7 @@
     if ( !_last_young_gc_full ) {
       if ( _should_revert_to_full_young_gcs ||
            _known_garbage_ratio < 0.05 ||
-           (adaptive_young_list_length() &&
+           (adaptive_young_list_length() && //false && // scodetodo
            (get_gc_eff_factor() * cur_efficiency < predict_young_gc_eff())) ) {
         set_full_young_gcs(true);
       }


-- 
/ Peter Schuller

From todd at cloudera.com  Sun Jan 23 22:16:43 2011
From: todd at cloudera.com (Todd Lipcon)
Date: Sun, 23 Jan 2011 22:16:43 -0800
Subject: G1GC Full GCs
In-Reply-To: <AANLkTimur1JbWnhkete4GGjLfm+b8PuBxjpPWbHb1upZ@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
	<AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
	<AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
	<AANLkTimur1JbWnhkete4GGjLfm+b8PuBxjpPWbHb1upZ@mail.gmail.com>
Message-ID: <AANLkTi=kdC=CJ5gfdwEnFMtgSnUKFgdHYV9NkS=REUjQ@mail.gmail.com>

Unfortunately my test is not easy to reproduce in its current form. But as I
look more and more into it, it looks like we're running into the same issue.

I added some code at the end of the mark phase that, after it sorts the
regions by efficiency, will print an object histogram for any regions that
are >98% garbage but very inefficient (<100KB/ms predicted collection rate)

Here's an example of an "uncollectable" region that is all garbage but for
one object:

Region 0x00002aaab0203e18 (  M1) [0x00002aaaf3800000, 0x00002aaaf3c00000]
Used: 4096K, garbage: 4095K. Eff: 6.448103 K/ms
  Very low-occupancy low-efficiency region. Histogram:

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1            280
 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
Total             1            280

At 6K/ms it's predicting take 600+ms to collect this region, so it will
never happen.

I can't think of any way that there would be a high mutation rate of
references to this Entry object..

So, my shot-in-the-dark theory is similar to what Peter was thinking. When a
region through its lifetime has a large number of other regions reference
it, even briefly, its sparse table will overflow. Then, later in life when
it's down to even just one object with a very small number of inbound
references, it still has all of those coarse entries -- they don't get
scrubbed because those regions are suffering the same issue.

Thoughts?

-Todd

On Sun, Jan 23, 2011 at 12:42 AM, Peter Schuller <
peter.schuller at infidyne.com> wrote:

> > I still seem to be putting off GC of non-young regions too much though. I
>
> Part of my experiments I have been harping on was the below change to
> cut GC efficiency out of the decision to perform non-young
> collections. I'm not suggesting it actually be disabled, but perhaps
> it can be adjusted to fit your workload? If there is nothing outright
> wrong in terms of predictions and the problem is due to cost estimates
> being too high, that may be a way to avoid full GC:s at the expense of
> more expensive GC activity. This smells like something that should be
> a tweakable VM option. Just like GCTimeRatio affects heap expansion
> decisions, something to affect this (probably just a ratio applied to
> the test below?).
>
> Another thing: This is to a large part my human confirmation biased
> brain speaking, but I would be really interested to find out if if the
> slow build-up you seem to be experiencing is indeed due to rs scan
> costs de to sparse table overflow (I've been harping about roughly the
> same thing several times so maybe people are tired of it; most
> recently in the thread "g1: dealing with high rates of inter-region
> pointer writes").
>
> Is your test easily runnable so that one can reproduce? Preferably
> without lots of hbase/hadoop knowledge. I.e., is it something that can
> be run in a self-contained fashion fairly easily?
>
> Here's the patch indicating where to adjust the efficiency thresholding:
>
> --- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Fri
> Dec 17 23:32:58 2010 -0800
> +++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Sun
> Jan 23 09:21:54 2011 +0100
> @@ -1463,7 +1463,7 @@
>     if ( !_last_young_gc_full ) {
>       if ( _should_revert_to_full_young_gcs ||
>            _known_garbage_ratio < 0.05 ||
> -           (adaptive_young_list_length() &&
> +           (adaptive_young_list_length() && //false && // scodetodo
>             (get_gc_eff_factor() * cur_efficiency <
> predict_young_gc_eff())) ) {
>         set_full_young_gcs(true);
>       }
>
>
> --
> / Peter Schuller
>


-- 
Todd Lipcon
Software Engineer, Cloudera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110123/74351a9e/attachment.html 

From todd at cloudera.com  Mon Jan 24 00:24:55 2011
From: todd at cloudera.com (Todd Lipcon)
Date: Mon, 24 Jan 2011 00:24:55 -0800
Subject: G1GC Full GCs
In-Reply-To: <AANLkTi=kdC=CJ5gfdwEnFMtgSnUKFgdHYV9NkS=REUjQ@mail.gmail.com>
References: <AANLkTiniKm1ghON3AIl5JPNzOVqyasTxxQaRsGCPBnba@mail.gmail.com>
	<AANLkTinjYAdubXNSV2tdic1Xn-jlTFs_iswOBpU3xNGQ@mail.gmail.com>
	<4C3CB3E9.4040305@oracle.com>
	<AANLkTim63XWKdILkuMcALP5rOPL_ga_rbCfYN3AAYalT@mail.gmail.com>
	<AANLkTimKNhq7vvx0-S1yiJj6GA_O1ySin2Oys-LeYA8H@mail.gmail.com>
	<AANLkTikM5uwyBuYAfASbdbf9v_VibOWnwq6EUs19NkSM@mail.gmail.com>
	<AANLkTi=0vSQ8E0EALDrs3j=mWNaAF52i_mY0Qvu=D6o9@mail.gmail.com>
	<AANLkTimur1JbWnhkete4GGjLfm+b8PuBxjpPWbHb1upZ@mail.gmail.com>
	<AANLkTi=kdC=CJ5gfdwEnFMtgSnUKFgdHYV9NkS=REUjQ@mail.gmail.com>
Message-ID: <AANLkTi=iXgZV6+cS+6yUZ8YXChBksV98Of1pDn2T=CjJ@mail.gmail.com>

Added one more bit of debug output here.. whenever it comes across a region
with only one live object, it also dumps the stats on the regions in the
coarse rset. For example:


Region 0x00002aaab0c48388 (  M1) [0x00002aac7f000000, 0x00002aac7f400000]
Used: 4096K, garbage: 4095K. Eff: 30.061317 K/ms
  Very low-occupancy low-efficiency region.
  RSet: coarse: 90112  fine: 0  sparse: 2717

 num     #instances         #bytes  class name
----------------------------------------------
   1:             1             56
 java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
Total             1             56
Coarse region references:
--------------------------
Region 0x00002aaab04b5288 (  M1) [0x00002aab5b000000, 0x00002aab5b400000]
Used: 4096K, garbage: 4094K. Eff: 243.385409 K/ms
Region 0x00002aaab04c2708 (  M1) [0x00002aab5d000000, 0x00002aab5d400000]
Used: 4096K, garbage: 1975K. Eff: 366.659049 K/ms
Region 0x00002aaab054de48 (  M1) [0x00002aab72000000, 0x00002aab72400000]
Used: 4096K, garbage: 4095K. Eff: 40.958295 K/ms
Region 0x00002aaab0622648 (  M1) [0x00002aab92000000, 0x00002aab92400000]
Used: 4096K, garbage: 4042K. Eff: 40.304866 K/ms
Region 0x00002aaab0c30fa8 (  M1) [0x00002aac7b800000, 0x00002aac7bc00000]
Used: 4096K, garbage: 4094K. Eff: 53.233756 K/ms
Region 0x00002aaab0c32a38 (  M1) [0x00002aac7bc00000, 0x00002aac7c000000]
Used: 4096K, garbage: 4094K. Eff: 143.159938 K/ms
Region 0x00002aaab0c4b8a8 (  M1) [0x00002aac7f800000, 0x00002aac7fc00000]
Used: 4096K, garbage: 4095K. Eff: 53.680457 K/ms
Region 0x00002aaab0c50858 (  M1) [0x00002aac80400000, 0x00002aac80800000]
Used: 4096K, garbage: 4095K. Eff: 20.865626 K/ms
Region 0x00002aaab0c522e8 (  M1) [0x00002aac80800000, 0x00002aac80c00000]
Used: 4096K, garbage: 4094K. Eff: 36.474733 K/ms
Region 0x00002aaab0c6b158 (  M1) [0x00002aac84400000, 0x00002aac84800000]
Used: 4096K, garbage: 4095K. Eff: 19.686717 K/ms
Region 0x00002aaab0d74b58 (  M1) [0x00002aacac400000, 0x00002aacac800000]
Used: 4096K, garbage: 4095K. Eff: 36.379891 K/ms
--------------------------

So here we have a region that has 11 coarse rset members, all of which are
pretty low efficiency. So, they're not going to get collected, and neither
will this region.

Basically we always devolve into a case where there are a bunch of these
inefficient regions all referring to each other in the coarse rsets.

Would it be possible to add some phase which "breaks down" the coarse
members back into sparse/fine? Note how in this case all of the fine
references are gone. I imagine most of the coarse references are "ghosts" as
well - once upon a time there was an object in those regions that referred
to this region, but they're long since dead.

On Sun, Jan 23, 2011 at 10:16 PM, Todd Lipcon <todd at cloudera.com> wrote:

> Unfortunately my test is not easy to reproduce in its current form. But as
> I look more and more into it, it looks like we're running into the same
> issue.
>
> I added some code at the end of the mark phase that, after it sorts the
> regions by efficiency, will print an object histogram for any regions that
> are >98% garbage but very inefficient (<100KB/ms predicted collection rate)
>
> Here's an example of an "uncollectable" region that is all garbage but for
> one object:
>
> Region 0x00002aaab0203e18 (  M1) [0x00002aaaf3800000, 0x00002aaaf3c00000]
> Used: 4096K, garbage: 4095K. Eff: 6.448103 K/ms
>   Very low-occupancy low-efficiency region. Histogram:
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:             1            280
>  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
> Total             1            280
>
> At 6K/ms it's predicting take 600+ms to collect this region, so it will
> never happen.
>
> I can't think of any way that there would be a high mutation rate of
> references to this Entry object..
>
> So, my shot-in-the-dark theory is similar to what Peter was thinking. When
> a region through its lifetime has a large number of other regions reference
> it, even briefly, its sparse table will overflow. Then, later in life when
> it's down to even just one object with a very small number of inbound
> references, it still has all of those coarse entries -- they don't get
> scrubbed because those regions are suffering the same issue.
>
> Thoughts?
>
> -Todd
>
> On Sun, Jan 23, 2011 at 12:42 AM, Peter Schuller <
> peter.schuller at infidyne.com> wrote:
>
>> > I still seem to be putting off GC of non-young regions too much though.
>> I
>>
>> Part of my experiments I have been harping on was the below change to
>> cut GC efficiency out of the decision to perform non-young
>> collections. I'm not suggesting it actually be disabled, but perhaps
>> it can be adjusted to fit your workload? If there is nothing outright
>> wrong in terms of predictions and the problem is due to cost estimates
>> being too high, that may be a way to avoid full GC:s at the expense of
>> more expensive GC activity. This smells like something that should be
>> a tweakable VM option. Just like GCTimeRatio affects heap expansion
>> decisions, something to affect this (probably just a ratio applied to
>> the test below?).
>>
>> Another thing: This is to a large part my human confirmation biased
>> brain speaking, but I would be really interested to find out if if the
>> slow build-up you seem to be experiencing is indeed due to rs scan
>> costs de to sparse table overflow (I've been harping about roughly the
>> same thing several times so maybe people are tired of it; most
>> recently in the thread "g1: dealing with high rates of inter-region
>> pointer writes").
>>
>> Is your test easily runnable so that one can reproduce? Preferably
>> without lots of hbase/hadoop knowledge. I.e., is it something that can
>> be run in a self-contained fashion fairly easily?
>>
>> Here's the patch indicating where to adjust the efficiency thresholding:
>>
>> --- a/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Fri
>> Dec 17 23:32:58 2010 -0800
>> +++ b/src/share/vm/gc_implementation/g1/g1CollectorPolicy.cpp   Sun
>> Jan 23 09:21:54 2011 +0100
>> @@ -1463,7 +1463,7 @@
>>     if ( !_last_young_gc_full ) {
>>       if ( _should_revert_to_full_young_gcs ||
>>            _known_garbage_ratio < 0.05 ||
>> -           (adaptive_young_list_length() &&
>> +           (adaptive_young_list_length() && //false && // scodetodo
>>             (get_gc_eff_factor() * cur_efficiency <
>> predict_young_gc_eff())) ) {
>>         set_full_young_gcs(true);
>>       }
>>
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Todd Lipcon
Software Engineer, Cloudera
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110124/1e893c18/attachment.html