From anmulholland at expedia.com Fri Aug 9 15:48:37 2013 From: anmulholland at expedia.com (Andrew Mulholland) Date: Fri, 9 Aug 2013 23:48:37 +0100 Subject: ParNew 4x slower under 1.7 vs 1.6? Message-ID: Hi I've seen and read this thread from last year: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-November/001437. html and have tried some of the options suggested in it, but to no avail. We've been testing one of our web apps with 1.7.0_25 vs 1.6.0_43, and have been seeing some very strange numbers for ParNew GC under 1.7. We are fairly happy with the GC performance under 1.6, but wish to move to 1.7 following the EoL of 1.6. I've attached (gzipped) GC logs from both 1.6 and 1.7 for analysis. The logs are from the JVM startup, during which a script warms some caches with a low level of load ? and then the app is opened to a production equivalent load for ~10 minutes. The only difference between the runs, is the Java version. The results are repeatable across multiple runs. Additionally the performance remains similar if the JVM runs for a very long time period (we've left a test load running for days under 1.7). Under 1.6, we see: - average ParNew pause times of ~100ms - with standard deviation of 0.12 - max of ~800ms - cumulative total of ~11 seconds ParNew GC during the test run Under 1.7 we see: - average ParNew pause times of ~440ms - with standard deviation of 0.26 - max of ~1600s - cumulative total of ~40 seconds PewNew GC during the test run Our application runs in a tomcat container and is run with JVM configured as follows: JAVA_OPTS="$JAVA_OPTS -Xms8192m -Xmx8192m" JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=256m -XX:NewRatio=2 -XX:SurvivorRatio=16 -Xss512k" JAVA_OPTS="$JAVA_OPTS -XX:+PrintCommandLineFlags -XX:PrintFLSStatistics=1 -XX:+PrintGCApplicationStoppedTime" JAVA_OPTS="$JAVA_OPTS -Xloggc:${CATALINA_BASE}/logs/gc.log" JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" JAVA_OPTS="$JAVA_OPTS -XX:+PrintTenuringDistribution" JAVA_OPTS="$JAVA_OPTS -XX:+UseLinuxPosixThreadCPUClocks" JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC" JAVA_OPTS="$JAVA_OPTS -XX:+CMSConcurrentMTEnabled -XX:+CMSScavengeBeforeRemark -XX:+CMSClassUnloadingEnabled" JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=70" JAVA_OPTS="$JAVA_OPTS -XX:+UseCompressedOops" We run on Linux (Centos 6.3), on x86_64. We have tried both on Physical hardware and Vms, and see the same degradation on both. I have also experimented with other settings such as the ones mentioned in the other thread above, and whilst I can improve performance under 1.7 by about 20%, it's still over 3x slower than under 1.6, which doesn't seem right. I have also tried removing most of the GC settings, and leaving just -XX:+UseConcMarkSweepGC -XX:+UseParNewGC, and see similar or worse results. Adjusting heap size, or NewRatio also seems to make little effect on the performance too? Typical Collection under 1.6: 568.940: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 583884197 Max Chunk Size: 583884197 Number of Blocks: 1 Av. Block Size: 583884197 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1721764 Max Chunk Size: 1681920 Number of Blocks: 35 Av. Block Size: 49193 Tree Height: 13 568.940: [ParNew Desired survivor size 79527936 bytes, new threshold 15 (max 15) - age 1: 15260904 bytes, 15260904 total - age 2: 7430544 bytes, 22691448 total - age 3: 7280688 bytes, 29972136 total - age 4: 7580200 bytes, 37552336 total - age 5: 3680 bytes, 37556016 total - age 6: 7513296 bytes, 45069312 total - age 7: 5490216 bytes, 50559528 total - age 8: 10016024 bytes, 60575552 total - age 9: 7684424 bytes, 68259976 total - age 10: 3650576 bytes, 71910552 total - age 11: 4543672 bytes, 76454224 total : 2625089K->106222K(2640832K), 0.0986890 secs] 3651001K->1144739K(8233280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 582295618 Max Chunk Size: 582295618 Number of Blocks: 1 Av. Block Size: 582295618 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1721764 Max Chunk Size: 1681920 Number of Blocks: 35 Av. Block Size: 49193 Tree Height: 13 , 0.0989930 secs] [Times: user=0.46 sys=0.02, real=0.10 secs] Typical Collection under 1.7: 492.187: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 493494915 Max Chunk Size: 493494915 Number of Blocks: 1 Av. Block Size: 493494915 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 2207447 Max Chunk Size: 2178048 Number of Blocks: 45 Av. Block Size: 49054 Tree Height: 11 492.187: [ParNew Desired survivor size 79527936 bytes, new threshold 15 (max 15) - age 1: 73826328 bytes, 73826328 total : 2599315K->87426K(2640832K), 0.4153760 secs] 4331146K->1833474K(8233280K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 491592318 Max Chunk Size: 491592318 Number of Blocks: 1 Av. Block Size: 491592318 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 2207447 Max Chunk Size: 2178048 Number of Blocks: 45 Av. Block Size: 49054 Tree Height: 11 , 0.4156130 secs] [Times: user=0.65 sys=0.02, real=0.42 secs] Am I missing something obvious? Thanks Andrew Andrew Mulholland Operations Architect Expedia Affiliate Network p: +44 (0)20 7019 2927 | m: +44 (0)77 1585 4475 e: anmulholland at expedia.com -------------- next part -------------- A non-text attachment was scrubbed... Name: gc.log-1.7.0_25.gz Type: application/x-gzip Size: 24896 bytes Desc: gc.log-1.7.0_25.gz Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130809/7381fe52/gc.log-1.7.0_25-0001.gz -------------- next part -------------- A non-text attachment was scrubbed... Name: gc.log-1.6.0_43.gz Type: application/x-gzip Size: 35238 bytes Desc: gc.log-1.6.0_43.gz Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130809/7381fe52/gc.log-1.6.0_43-0001.gz From bernd.eckenfels at googlemail.com Fri Aug 9 18:40:08 2013 From: bernd.eckenfels at googlemail.com (Bernd Eckenfels) Date: Sat, 10 Aug 2013 03:40:08 +0200 Subject: ParNew 4x slower under 1.7 vs 1.6? In-Reply-To: References: Message-ID: Am 10.08.2013, 00:48 Uhr, schrieb Andrew Mulholland : > Under 1.6, we see: > - average ParNew pause times of ~100ms > - with standard deviation of 0.12 > - max of ~800ms > - cumulative total of ~11 seconds ParNew GC during the test run > > Under 1.7 we see: > - average ParNew pause times of ~440ms > - with standard deviation of 0.26 > - max of ~1600s > - cumulative total of ~40 seconds PewNew GC during the test run Sounds pretty mich like it is using less threads for the collection. Did you tried to run it with -XX:+PrintFlagsFinal and have a look at ParallelGCThreads? (CommandLineFlags shows only the ones which get calculated based on some compat). Gruss Bernd From anmulholland at expedia.com Sat Aug 10 03:16:39 2013 From: anmulholland at expedia.com (Andrew Mulholland) Date: Sat, 10 Aug 2013 11:16:39 +0100 Subject: ParNew 4x slower under 1.7 vs 1.6? In-Reply-To: Message-ID: On 8/10/13 2:40 AM, "Bernd Eckenfels" wrote: >Am 10.08.2013, 00:48 Uhr, schrieb Andrew Mulholland >: >> Under 1.6, we see: >> - average ParNew pause times of ~100ms >> - with standard deviation of 0.12 >> - max of ~800ms >> - cumulative total of ~11 seconds ParNew GC during the test run >> >> Under 1.7 we see: >> - average ParNew pause times of ~440ms >> - with standard deviation of 0.26 >> - max of ~1600s >> - cumulative total of ~40 seconds PewNew GC during the test run > >Sounds pretty mich like it is using less threads for the collection. Did >you tried to run it with -XX:+PrintFlagsFinal and have a look at >ParallelGCThreads? (CommandLineFlags shows only the ones which get >calculated based on some compat). Thanks for the reply.. In a diff between print flags for the different versions, I didn't initially pick up anything here - as I saw that: 1.6: uintx ParallelGCThreads := 8 {product} 1.7: uintx ParallelGCThreads = 8 {product} Is there any the significance in the := 8 on 1.6 vs the = 8 on 1.7? I've tried overriding the ParallelGCThreads, to 6, 8 etc, but seen no difference in performance on 1.7 The server I'm testing on has 8 cores. Thanks Andrew From jon.masamitsu at oracle.com Sun Aug 11 21:50:35 2013 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Sun, 11 Aug 2013 21:50:35 -0700 Subject: ParNew 4x slower under 1.7 vs 1.6? In-Reply-To: References: Message-ID: <5208699B.5030305@oracle.com> Andrew, I saw your response to Bernd where you verified the number of GC threads that you are using but the amount of parallelism your getting still doesn't seem to be right. See below. On 8/9/2013 3:48 PM, Andrew Mulholland wrote: > Hi > > I've seen and read this thread from last year: > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-November/001437. > html and have tried some of the options suggested in it, but to no avail. > > We've been testing one of our web apps with 1.7.0_25 vs 1.6.0_43, and have > been seeing some very strange numbers for ParNew GC under 1.7. > > We are fairly happy with the GC performance under 1.6, but wish to move to > 1.7 following the EoL of 1.6. > > I've attached (gzipped) GC logs from both 1.6 and 1.7 for analysis. > > The logs are from the JVM startup, during which a script warms some caches > with a low level of load ? and then the app is opened to a production > equivalent load for ~10 minutes. > > The only difference between the runs, is the Java version. The results are > repeatable across multiple runs. > > Additionally the performance remains similar if the JVM runs for a very > long time period (we've left a test load running for days under 1.7). > > > Under 1.6, we see: > - average ParNew pause times of ~100ms > - with standard deviation of 0.12 > - max of ~800ms > - cumulative total of ~11 seconds ParNew GC during the test run > > Under 1.7 we see: > - average ParNew pause times of ~440ms > - with standard deviation of 0.26 > - max of ~1600s > - cumulative total of ~40 seconds PewNew GC during the test run > > > > > > Our application runs in a tomcat container and is run with JVM configured > as follows: > > JAVA_OPTS="$JAVA_OPTS -Xms8192m -Xmx8192m" > JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=256m -XX:NewRatio=2 > -XX:SurvivorRatio=16 -Xss512k" > JAVA_OPTS="$JAVA_OPTS -XX:+PrintCommandLineFlags -XX:PrintFLSStatistics=1 > -XX:+PrintGCApplicationStoppedTime" > JAVA_OPTS="$JAVA_OPTS -Xloggc:${CATALINA_BASE}/logs/gc.log" > JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails" > JAVA_OPTS="$JAVA_OPTS -XX:+PrintTenuringDistribution" > JAVA_OPTS="$JAVA_OPTS -XX:+UseLinuxPosixThreadCPUClocks" > JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > -XX:+DisableExplicitGC" > JAVA_OPTS="$JAVA_OPTS -XX:+CMSConcurrentMTEnabled > -XX:+CMSScavengeBeforeRemark -XX:+CMSClassUnloadingEnabled" > JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=70" > JAVA_OPTS="$JAVA_OPTS -XX:+UseCompressedOops" > > We run on Linux (Centos 6.3), on x86_64. > We have tried both on Physical hardware and Vms, and see the same > degradation on both. > > > > I have also experimented with other settings such as the ones mentioned in > the other thread above, and whilst I can improve performance under 1.7 by > about 20%, it's still over 3x slower than under 1.6, which doesn't seem > right. > I have also tried removing most of the GC settings, and leaving just > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC, and see similar or worse results. > > Adjusting heap size, or NewRatio also seems to make little effect on the > performance tooS( > > > > Typical Collection under 1.6: > 568.940: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 583884197 > Max Chunk Size: 583884197 > Number of Blocks: 1 > Av. Block Size: 583884197 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1721764 > Max Chunk Size: 1681920 > Number of Blocks: 35 > Av. Block Size: 49193 > Tree Height: 13 > > 568.940: [ParNew > Desired survivor size 79527936 bytes, new threshold 15 (max 15) > - age 1: 15260904 bytes, 15260904 total > - age 2: 7430544 bytes, 22691448 total > - age 3: 7280688 bytes, 29972136 total > - age 4: 7580200 bytes, 37552336 total > - age 5: 3680 bytes, 37556016 total > - age 6: 7513296 bytes, 45069312 total > - age 7: 5490216 bytes, 50559528 total > - age 8: 10016024 bytes, 60575552 total > - age 9: 7684424 bytes, 68259976 total > - age 10: 3650576 bytes, 71910552 total > - age 11: 4543672 bytes, 76454224 total > : 2625089K->106222K(2640832K), 0.0986890 secs] > 3651001K->1144739K(8233280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 582295618 > Max Chunk Size: 582295618 > Number of Blocks: 1 > Av. Block Size: 582295618 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1721764 > Max Chunk Size: 1681920 > Number of Blocks: 35 > Av. Block Size: 49193 > Tree Height: 13 > , 0.0989930 secs] [Times: user=0.46 sys=0.02, real=0.10 secs] > About a 4 times speed up here (user / real ~ 4.6) > > > > Typical Collection under 1.7: > > 492.187: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 493494915 > Max Chunk Size: 493494915 > Number of Blocks: 1 > Av. Block Size: 493494915 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 2207447 > Max Chunk Size: 2178048 > Number of Blocks: 45 > Av. Block Size: 49054 > Tree Height: 11 > 492.187: [ParNew > Desired survivor size 79527936 bytes, new threshold 15 (max 15) > - age 1: 73826328 bytes, 73826328 total > : 2599315K->87426K(2640832K), 0.4153760 secs] > 4331146K->1833474K(8233280K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 491592318 > Max Chunk Size: 491592318 > Number of Blocks: 1 > Av. Block Size: 491592318 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 2207447 > Max Chunk Size: 2178048 > Number of Blocks: 45 > Av. Block Size: 49054 > Tree Height: 11 > , 0.4156130 secs] [Times: user=0.65 sys=0.02, real=0.42 secs] Only about a 1.5 times speed up here (user / real ~1.5). Can you check some other entries and verify that the amount of parallelism your seeing on 1.7 is only 1.5. Jon > > > > > > > > Am I missing something obvious? > > > > > > Thanks > > > Andrew > > > Andrew Mulholland > Operations Architect > Expedia Affiliate Network > p: +44 (0)20 7019 2927 | m: +44 (0)77 1585 4475 > e: anmulholland at expedia.com > > > This body part will be downloaded on demand. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130811/03dfca8a/attachment.html From anmulholland at expedia.com Mon Aug 12 02:57:04 2013 From: anmulholland at expedia.com (Andrew Mulholland) Date: Mon, 12 Aug 2013 10:57:04 +0100 Subject: ParNew 4x slower under 1.7 vs 1.6? In-Reply-To: <5208699B.5030305@oracle.com> Message-ID: Hi Jon On 8/12/13 5:50 AM, "Jon Masamitsu" wrote: > >I saw your response to Bernd where you verified >the number of GC threads that you are using but the >amount of parallelism your getting still doesn't seem to >be right. See below. Thanks - data below as requested: > >, 0.0989930 secs] [Times: user=0.46 sys=0.02, real=0.10 secs] > > > >About a 4 times speed up here (user / real ~ 4.6) > For JVM 1.6.0_43: 90% of ParNew Gcs had parallelism of at least: 5.17 75% of ParNew Gcs had parallelism of at least: 5.83 50% of ParNew Gcs had parallelism of at least: 6.71 25% of ParNew Gcs had parallelism of at least 7.66 10% of ParNew Gcs had parallelism of at least 8.5 Average parallelism for ParNew Gs: 6.75 (stdev 1.21) Max Parallelism for ParNew GCs: 9 > >, 0.4156130 secs] [Times: user=0.65 sys=0.02, real=0.42 secs] > > >Only about a 1.5 times speed up here (user / real ~1.5). For JVM 1.7.0_25: 90% of ParNew Gcs had parallelism of at least 1.53 75% of ParNew Gcs had parallelism of at least 1.63 50% of ParNew Gcs had parallelism of at least 2 25% of ParNew Gcs had parallelism of at least: 2.72 10% of ParNew Gcs had parallelism of at least: 5.32 Average parallelism for ParNew GCs: 2.58 (stdev 1.51) Max Parallelism for ParNew GCs: 8.6 > >Can you check some other entries and verify that the amount of >parallelism your seeing on 1.7 is only 1.5. > >From the above it seems that we do get higher than 1.5.. With jvm 1.7 - but it is consistently significantly lower than on jvm 1.6 Thanks Andrew From pbouchard8 at gmail.com Thu Aug 15 10:59:04 2013 From: pbouchard8 at gmail.com (Phil Bouchard) Date: Thu, 15 Aug 2013 13:59:04 -0400 Subject: Deterministic memory manager of constant complexity Message-ID: <520D16E8.9060003@gmail.com> Hello, I am a software engineer who wrote the following: http://finitetheory.com/author.html I would like to propose Java a new deterministic memory manager of constant complexity that will make applications run smoothly and never to lag again. Its documentation is available at: https://svn.boost.org/svn/boost/sandbox/block_ptr/libs/smart_ptr/doc/index.html As a practical example, the applications on the Android will run much better. Sincerely yours, Phil Bouchard