From prasanna.gopal at blackrock.com  Thu Oct  6 10:48:09 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Thu, 6 Oct 2016 10:48:09 +0000
Subject: G1 GC - [Ref Enq] taking lot of time
Message-ID:

Hi All

We are experimenting with G1 GC for one of our applications. Please find our application settings below.

GC Settings

-XX:MaxPermSize=512m
-XX:+UseG1GC
-XX:G1ReservePercent=40
-XX:ConcGCThreads=14
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

JVM: jdk_7u40_x64

While scanning for events that caused application threads to be stopped, we found the following instance in our GC logs.

GC Logs
=======

{Heap before GC invocations=57832 (full 1):
 garbage-first heap  total 5242880K, used 3111240K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 672 young (1376256K), 1 survivors (2048K)
 compacting perm gen  total 98304K, used 96772K [0x00000007e0000000, 0x00000007e6000000, 0x0000000800000000)
   the space 98304K, 98% used [0x00000007e0000000, 0x00000007e5e81338, 0x00000007e5e81400, 0x00000007e6000000)
No shared spaces configured.
2016-10-05T12:13:19.835-0400: 80080.725: [GC pause (young)
Desired survivor size 88080384 bytes, new threshold 15 (max 15)
- age   1:     136824 bytes,     136824 total
- age   2:      11120 bytes,     147944 total
- age   3:      11408 bytes,     159352 total
- age   4:       9248 bytes,     168600 total
- age   5:       8632 bytes,     177232 total
- age   6:       8224 bytes,     185456 total
- age   7:       8784 bytes,     194240 total
- age   8:      87856 bytes,     282096 total
- age   9:      25080 bytes,     307176 total
- age  10:       8272 bytes,     315448 total
- age  11:       7984 bytes,     323432 total
- age  12:      14120 bytes,     337552 total
- age  13:       9824 bytes,     347376 total
- age  14:      11616 bytes,     358992 total
- age  15:       8032 bytes,     367024 total
, 51.2453720 secs]
   [Parallel Time: 11.3 ms, GC Workers: 16]
      [GC Worker Start (ms): Min: 80080725.7, Avg: 80080725.8, Max: 80080725.9, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 4.4, Avg: 5.2, Max: 10.7, Diff: 6.3, Sum: 83.5]
      [Update RS (ms): Min: 0.0, Avg: 0.8, Max: 1.3, Diff: 1.3, Sum: 13.4]
         [Processed Buffers: Min: 0, Avg: 3.0, Max: 7, Diff: 7, Sum: 48]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.2, Sum: 3.7]
      [Object Copy (ms): Min: 0.1, Avg: 0.4, Max: 0.6, Diff: 0.5, Sum: 6.0]
      [Termination (ms): Min: 0.0, Avg: 4.3, Max: 4.6, Diff: 4.6, Sum: 68.6]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.0]
      [GC Worker Total (ms): Min: 10.9, Avg: 11.0, Max: 11.2, Diff: 0.3, Sum: 176.3]
      [GC Worker End (ms): Min: 80080736.7, Avg: 80080736.8, Max: 80080736.8, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 51233.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 17.2 ms]
      [Ref Enq: 50800.9 ms]
      [Free CSet: 381.2 ms]
   [Eden: 1342.0M(1342.0M)->0.0B(254.0M) Survivors: 2048.0K->2048.0K Heap: 3038.3M(5120.0M)->1696.2M(5120.0M)]
Heap after GC invocations=57833 (full 1):
 garbage-first heap  total 5242880K, used 1736897K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 1 young (2048K), 1 survivors (2048K)
 compacting perm gen  total 98304K, used 96772K [0x00000007e0000000, 0x00000007e6000000, 0x0000000800000000)
   the space 98304K, 98% used [0x00000007e0000000, 0x00000007e5e81338, 0x00000007e5e81400, 0x00000007e6000000)
No shared spaces configured.
}
 [Times: user=0.00 sys=17.51, real=51.28 secs]
2016-10-05T12:14:11.208-0400: 80132.098: Total time for which application threads were stopped: 51.3791600 seconds

It looks like the Reference Enqueue (Ref Enq) phase took nearly 51 seconds to complete. Could you please help me understand why it might take so much time? Do I need to add any diagnostic flags to get more information? Apologies if a similar question has already been answered on this mailing list. Any help is really appreciated.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From prasanna.gopal at blackrock.com  Fri Oct  7 12:00:04 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Fri, 7 Oct 2016 12:00:04 +0000
Subject: G1-GC - Full GC [humongous allocation request failed]
Message-ID:

Hi All

We have one of our applications running with the following settings.

JVM: jdk_7u40_x64 (we are in the process of migrating to the latest JDK 7 release)

-XX:MaxPermSize=512m
-XX:+UseG1GC
-XX:G1ReservePercent=40
-XX:ConcGCThreads=14
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintHeapAtGC
-XX:+PrintReferenceGC
-Xmx5120M
-Xms5120M

From our GC logs, we can see our application is going into Full GC due to humongous allocation failures.
But from the logs we can see GC logs ======= 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which application threads were stopped: 0.0137870 seconds 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497472 bytes, attempted expansion amount: 75497472 bytes] 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 2016-10-07T02:37:15.367-0400: 71150.399: Application time: 0.3898050 seconds 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497472 bytes, attempted expansion amount: 75497472 bytes] 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=55428 (full 4): garbage-first heap total 5242880K, used 1900903K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 197 young (403456K), 14 survivors (28672K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) Desired survivor size 108003328 bytes, new threshold 15 (max 15) - age 1: 2362400 bytes, 2362400 total - age 2: 393128 bytes, 2755528 total - age 3: 1086824 bytes, 3842352 total - age 4: 1086528 bytes, 4928880 total - age 5: 1075480 bytes, 6004360 total - age 6: 1126736 bytes, 7131096 total - age 7: 1153072 bytes, 8284168 total - age 8: 1145832 bytes, 9430000 total - age 9: 1217904 bytes, 10647904 total - age 10: 1188384 bytes, 11836288 total - age 11: 1212456 bytes, 13048744 total - age 12: 1263960 bytes, 14312704 total - age 13: 4816 bytes, 14317520 total - age 14: 88952 bytes, 14406472 total - age 15: 7408 bytes, 14413880 total 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 149101, predicted base time: 16.35 ms, remaining time: 183.65 ms, target pause time: 200.00 ms] 71150.401: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 183 regions, survivors: 14 regions, predicted young region time: 3.45 ms] 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 183 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 19.80 ms, target pause time: 200.00 ms] 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, 0.0000460 secs]2016-10-07T02:37:15.410-0400: 71150.442: [WeakReference, 1 refs, 0.0000050 secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, 4 refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: [PhantomReference, 0 refs, 0.0000040 secs]2016-10-07T02:37:15.410-0400: 71150.442: [JNI Weak Reference, 0.0000050 secs], 0.0428440 secs] [Parallel Time: 40.0 ms, GC Workers: 16] [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: 71150401.9, Diff: 0.4] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, Sum: 62.1] [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: 304.3] [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: 65, Sum: 601] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] [Object Copy (ms): Min: 
16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, Sum: 266.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.7] [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, Sum: 634.4] [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: 71150441.5, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 0.2 ms] [Other: 2.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 1.3 ms] [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M Heap: 1856.6M(5120.0M)->1568.8M(5120.0M)] Heap after GC invocations=55429 (full 4): garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 52 young (106496K), 52 survivors (106496K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. } [Times: user=0.64 sys=0.00, real=0.04 secs] 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 69206016 bytes, attempted expansion amount: 69206016 bytes] 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which application threads were stopped: 0.0448480 seconds 2016-10-07T02:37:15.412-0400: 71150.444: Application time: 0.0000500 seconds 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 75497488 bytes] 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 69206016 bytes, attempted expansion amount: 69206016 bytes] 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 75497488 bytes] 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 75497488 bytes, attempted expansion amount: 77594624 bytes] 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=55429 (full 4): garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 53 young (108544K), 52 survivors (106496K) compacting perm gen total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) No shared spaces configured. 
2016-10-07T02:37:15.414-0400: 71150.445: [Full GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 refs, 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: [WeakReference, 1760 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: 71151.369: [PhantomReference, 0 refs, 0.0000030 secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs]
   [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B Heap: 1568.8M(5120.0M)->915.2M(5120.0M)]
Heap after GC invocations=55430 (full 5):
 garbage-first heap  total 5242880K, used 937168K [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000)
  region size 2048K, 0 young (0K), 0 survivors (0K)
 compacting perm gen  total 96256K, used 94313K [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000)
   the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000)
No shared spaces configured.

Could you please give us your views on the following queries:

1) A humongous allocation request for 72 MB failed, yet from the logs we can also see around 3 GB of free space. Does this mean our application is encountering a high amount of fragmentation?
2) Will tuning the GC params so that mixed GCs happen more often help in resolving such Full GCs?
3) Is there a -XX:Print* flag which can tell us how many old gen and humongous regions we have (other than looking at the [G1 Ergonomics] output, which sometimes gives the old gen region count)?

Please do let me know if you need any more information. I appreciate your help.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
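[As a back-of-the-envelope check on question 1 above: with the 2048K regions shown in the logs, G1 treats any single allocation larger than half a region (1 MB) as humongous, and a humongous object must be placed in contiguous free regions. The following is a minimal sketch of that arithmetic, not part of the original thread; the constants are taken from the log above, and the half-region threshold is G1's documented rule.]

    public class HumongousMath {
        public static void main(String[] args) {
            long regionSize = 2048L * 1024;              // "region size 2048K" from the log
            long humongousThreshold = regionSize / 2;    // G1: objects larger than half a region are humongous
            long request = 75_497_488L;                  // "allocation request: 75497488 bytes" from the log

            System.out.println("humongous allocation: " + (request > humongousThreshold));

            // Number of *contiguous* free regions G1 has to find for this one object.
            // Matches the 77594624-byte (37-region) expansion attempt reported in the log.
            long regionsNeeded = (request + regionSize - 1) / regionSize;
            System.out.println("contiguous 2 MB regions needed: " + regionsNeeded);

            // Roughly 3.5 GB of the 5 GB heap is free after the young pause above, yet the
            // request still fails: the free regions are not contiguous, i.e. fragmentation.
        }
    }

[Since -XX:+PrintAdaptiveSizePolicy is already enabled, every "humongous allocation request failed" ergonomics line reports the request size, so the same arithmetic can be applied to each failure.]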
URL: From vitalyd at gmail.com Fri Oct 7 12:48:49 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 08:48:49 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: Message-ID: Hi Prasanna, First suggestion - move to latest Java 8. G1 saw a lot of improvements in 8, and 7 is EOL of course. Humongous allocations require contiguous regions to satisfy the allocation, and are done directly out of old gen. You're reserving 40% of heap to handle overflow(G1ReservePercent) - why? I believe that reserve is only for mitigating to-space exhaustion, which is during evacuation only - they won't be available for humongous allocations (someone can correct me if that's wrong). Heap expansion fails because you're already at the limit given 40% is reserved. Again, I think you'll get more help here if you move to one of the latest Java 8 releases. On Friday, October 7, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com > wrote: > Hi All > > > > We have one of our application with the following settings > > > > JVM : jdk_7u40_x64 ( we are in process of migrating to latest Jdk 7 > family ) > > > > -XX:MaxPermSize=512m > > -XX:+UseG1GC > > -XX:G1ReservePercent=40 > > -XX:ConcGCThreads=14 > > -XX:+PrintGCDateStamps > > -XX:+PrintTenuringDistribution > > -XX:+PrintGCApplicationConcurrentTime > > -XX:+PrintGCApplicationStoppedTime > > -XX:+PrintAdaptiveSizePolicy > > -XX:+PrintHeapAtGC > > -XX:+PrintReferenceGC > > -Xmx5120M > > -Xms5120M > > > > > > From our GC logs , we can see our application is going Full GC due to > humongous allocation failure. But from the logs we can see > > > > > > GC logs > > ======= > > > > 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which application > threads were stopped: 0.0137870 seconds > > 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: *75497472* bytes, attempted expansion amount: 75497472 > bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 2016-10-07T02:37:15.367-0400: 71150.399: Application time: 0.3898050 > seconds > > 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497472 bytes, attempted expansion amount: 75497472 > bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > {Heap before GC invocations=55428 (full 4): > > garbage-first heap total 5242880K, used 1900903K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 197 young (403456K), 14 survivors (28672K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) > > Desired survivor size 108003328 bytes, new threshold 15 (max 15) > > - age 1: 2362400 bytes, 2362400 total > > - age 2: 393128 bytes, 2755528 total > > - age 3: 1086824 bytes, 3842352 total > > - age 4: 1086528 bytes, 4928880 total > > - age 5: 1075480 bytes, 6004360 total > > - age 6: 1126736 bytes, 7131096 total > > - age 7: 1153072 bytes, 8284168 total > > - age 8: 1145832 bytes, 9430000 total > > - age 9: 1217904 bytes, 10647904 total > > - age 10: 1188384 bytes, 11836288 total > > - age 11: 1212456 bytes, 13048744 total > > - age 12: 1263960 bytes, 14312704 total > > - age 13: 4816 bytes, 14317520 total > > - age 14: 88952 bytes, 14406472 total > > - age 15: 7408 bytes, 14413880 total > > 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 149101, predicted base time: 16.35 ms, remaining time: > 183.65 ms, target pause time: 200.00 ms] > > 71150.401: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 183 regions, survivors: 14 regions, predicted young region time: 3.45 > ms] > > 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 183 regions, survivors: 14 regions, old: 0 regions, predicted pause time: > 19.80 ms, target pause time: 200.00 ms] > > 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, 0.0000460 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [WeakReference, 1 refs, > 0.0000050 secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, > 4 refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [PhantomReference, 0 refs, 0.0000040 secs]2016-10-07T02:37:15.410-0400: > 71150.442: [JNI Weak Reference, 0.0000050 secs], 0.0428440 secs] > > [Parallel Time: 40.0 ms, GC Workers: 16] > > [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: > 71150401.9, Diff: 0.4] > > [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, > Sum: 62.1] > > [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: > 304.3] > > [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: > 65, Sum: 601] > > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] > > [Object Copy (ms): Min: 16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, Sum: > 266.3] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.7] > > [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, > Sum: 634.4] > > [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: > 71150441.5, Diff: 0.1] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.2 ms] > > [Other: 2.7 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 0.3 ms] > > [Ref Enq: 0.0 ms] > > [Free CSet: 1.3 ms] > > [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M Heap: > 1856.6M(5120.0M)->1568.8M(5120.0M)] > > Heap after GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 52 young (106496K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > } > > [Times: user=0.64 sys=0.00, real=0.04 secs] > > 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: 69206016 > bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which application > threads were stopped: 0.0448480 seconds > > 2016-10-07T02:37:15.412-0400: 71150.444: Application time: 0.0000500 > seconds > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: 69206016 > bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497488 bytes, attempted expansion amount: 77594624 > bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > > {Heap before GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 53 young (108544K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. > > 2016-10-07T02:37:15.414-0400: 71150.445: [Full > GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 refs, > 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: [WeakReference, > 1760 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: > 71151.369: [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > 60 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: > 71151.369: [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B Heap: > 1568.8M(5120.0M)->915.2M(5120.0M)] > > Heap after GC invocations=55430 (full 5): > > garbage-first heap total 5242880K, used 937168K [0x00000006a0000000, > 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 0 young (0K), 0 survivors (0K) > > compacting perm gen total 96256K, used 94313K [0x00000007e0000000, > 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, 0x00000007e5c1a700, > 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > > > Could you please help me in giving your views on the following queries > > > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > 2) Does tunning the gc params to make sure Mixed GC happens more , > will help in resolving such Full GC?s ? > > 3) Is there ?XX:Print* flag which can tell us how many old gen and > humongous regions we have (other than looking at [G1 Ergonomics] output > , which sometimes gives old gen region count) ? > > > > Please do let me know , if you need any more information. Appreciate your > help. > > > > Thanks and Regards > > Prasanna > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/corpo > rate/en-us/compliance/email-disclaimers for further information. Please > refer to http://www.blackrock.com/corporate/en-us/compliance/privacy- > policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Fri Oct 7 15:52:01 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 08:52:01 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: Message-ID: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Prasanna, In addition to what Vitaly said, I have some comments about your question: 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. 2)2) Does tunning the gc params to make sure Mixed GC happens more , will help in resolving such Full GC?s ? If you can not move to jdk8, in jdk7 the default for parameter G1MixedGCLiveThresholdPercent is 65 (changed to 85 in jdk8). That is too low for most workloads. You can increase that so that more old regions will be treat as candidate for mixed gc. 
3) Is there ?XX:Print* flag which can tell us how many old gen and humongous regions we have (other than looking at [G1 Ergonomics] output , which sometimes gives old gen region count) ? You get that count in jdk9, not jdk7. One basic requirement for using G1GC, you need to give the pause time goal. G1 uses that for sizing young/old gen. The default is 200ms. Thanks Jenny On 10/07/2016 05:48 AM, Vitaly Davidovich wrote: > Hi Prasanna, > > First suggestion - move to latest Java 8. G1 saw a lot of > improvements in 8, and 7 is EOL of course. > > Humongous allocations require contiguous regions to satisfy the > allocation, and are done directly out of old gen. You're reserving > 40% of heap to handle overflow(G1ReservePercent) - why? I believe that > reserve is only for mitigating to-space exhaustion, which is during > evacuation only - they won't be available for humongous allocations > (someone can correct me if that's wrong). > > Heap expansion fails because you're already at the limit given 40% is > reserved. > > Again, I think you'll get more help here if you move to one of the > latest Java 8 releases. > > On Friday, October 7, 2016, Gopal, Prasanna CWK > > wrote: > > Hi All > > We have one of our application with the following settings > > JVM : jdk_7u40_x64 ( we are in process of migrating to latest > Jdk 7 family ) > > -XX:MaxPermSize=512m > > -XX:+UseG1GC > > -XX:G1ReservePercent=40 > > -XX:ConcGCThreads=14 > > -XX:+PrintGCDateStamps > > -XX:+PrintTenuringDistribution > > -XX:+PrintGCApplicationConcurrentTime > > -XX:+PrintGCApplicationStoppedTime > > -XX:+PrintAdaptiveSizePolicy > > -XX:+PrintHeapAtGC > > -XX:+PrintReferenceGC > > -Xmx5120M > > -Xms5120M > > From our GC logs , we can see our application is going Full GC due > to humongous allocation failure. But from the logs we can see > > GC logs > > ======= > > 2016-10-07T02:37:14.978-0400: 71150.009: Total time for which > application threads were stopped: 0.0137870 seconds > > 71150.399: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: *75497472* bytes, attempted expansion amount: > 75497472 bytes] > > 71150.399: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 2016-10-07T02:37:15.367-0400: 71150.399: Application time: > 0.3898050 seconds > > 71150.401: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497472 bytes, attempted expansion amount: > 75497472 bytes] > > 71150.401: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > {Heap before GC invocations=55428 (full 4): > > garbage-first heap total 5242880K, used 1900903K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 197 young (403456K), 14 survivors (28672K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > 2016-10-07T02:37:15.369-0400: 71150.401: [GC pause (young) > > Desired survivor size 108003328 bytes, new threshold 15 (max 15) > > - age 1: 2362400 bytes, 2362400 total > > - age 2: 393128 bytes, 2755528 total > > - age 3: 1086824 bytes, 3842352 total > > - age 4: 1086528 bytes, 4928880 total > > - age 5: 1075480 bytes, 6004360 total > > - age 6: 1126736 bytes, 7131096 total > > - age 7: 1153072 bytes, 8284168 total > > - age 8: 1145832 bytes, 9430000 total > > - age 9: 1217904 bytes, 10647904 total > > - age 10: 1188384 bytes, 11836288 total > > - age 11: 1212456 bytes, 13048744 total > > - age 12: 1263960 bytes, 14312704 total > > - age 13: 4816 bytes, 14317520 total > > - age 14: 88952 bytes, 14406472 total > > - age 15: 7408 bytes, 14413880 total > > 71150.401: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 149101, predicted base time: 16.35 ms, remaining > time: 183.65 ms, target pause time: 200.00 ms] > > 71150.401: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 183 regions, survivors: 14 regions, predicted young > region time: 3.45 ms] > > 71150.401: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 183 regions, survivors: 14 regions, old: 0 regions, > predicted pause time: 19.80 ms, target pause time: 200.00 ms] > > 2016-10-07T02:37:15.410-0400: 71150.442: [SoftReference, 0 refs, > 0.0000460 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [WeakReference, 1 refs, 0.0000050 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [FinalReference, 4 > refs, 0.0000210 secs]2016-10-07T02:37:15.410-0400: 71150.442: > [PhantomReference, 0 refs, 0.0000040 > secs]2016-10-07T02:37:15.410-0400: 71150.442: [JNI Weak Reference, > 0.0000050 secs], 0.0428440 secs] > > [Parallel Time: 40.0 ms, GC Workers: 16] > > [GC Worker Start (ms): Min: 71150401.6, Avg: 71150401.8, Max: > 71150401.9, Diff: 0.4] > > [Ext Root Scanning (ms): Min: 3.1, Avg: 3.9, Max: 8.4, Diff: 5.3, > Sum: 62.1] > > [Update RS (ms): Min: 14.8, Avg: 19.0, Max: 19.8, Diff: 5.0, Sum: > 304.3] > > [Processed Buffers: Min: 21, Avg: 37.6, Max: 86, Diff: 65, > Sum: 601] > > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.9] > > [Object Copy (ms): Min: 16.5, Avg: 16.6, Max: 16.8, Diff: 0.3, > Sum: 266.3] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.7] > > [GC Worker Total (ms): Min: 39.5, Avg: 39.6, Max: 39.9, Diff: 0.4, > Sum: 634.4] > > [GC Worker End (ms): Min: 71150441.4, Avg: 71150441.4, Max: > 71150441.5, Diff: 0.1] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.2 ms] > > [Other: 2.7 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 0.3 ms] > > [Ref Enq: 0.0 ms] > > [Free CSet: 1.3 ms] > > [Eden: 366.0M(1616.0M)->0.0B(1384.0M) Survivors: 28.0M->104.0M > Heap: 1856.6M(5120.0M)->1568.8M(5120.0M)] > > Heap after GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 52 young (106496K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > } > > [Times: user=0.64 sys=0.00, real=0.04 secs] > > 71150.444: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: > 69206016 bytes] > > 71150.444: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 2016-10-07T02:37:15.412-0400: 71150.444: Total time for which > application threads were stopped: 0.0448480 seconds > > 2016-10-07T02:37:15.412-0400: 71150.444: Application time: > 0.0000500 seconds > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 69206016 bytes, attempted expansion amount: > 69206016 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > 71150.445: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 75497488 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 75497488 bytes, attempted expansion amount: > 77594624 bytes] > > 71150.445: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > > {Heap before GC invocations=55429 (full 4): > > garbage-first heap total 5242880K, used 1606459K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 53 young (108544K), 52 survivors (106496K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. > > 2016-10-07T02:37:15.414-0400: 71150.445: [Full > GC2016-10-07T02:37:16.337-0400: 71151.368: [SoftReference, 86 > refs, 0.0000720 secs]2016-10-07T02:37:16.337-0400: 71151.368: > [WeakReference, 1760 refs, 0.0002980 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [FinalReference, > 1201 refs, 0.0002080 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [PhantomReference, 0 refs, 0.0000030 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI Weak Reference, > 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > 60 refs, 0.0002980 secs]2016-10-07T02:37:16.337-0400: 71151.369: > [FinalReference, 1201 refs, 0.0002080 > secs]2016-10-07T02:37:16.337-0400: 71151.369: [PhantomReference, 0 > refs, 0.0000030 secs]2016-10-07T02:37:16.337-0400: 71151.369: [JNI > Weak Reference, 0.0000080 secs] 1568M->915M(5120M), 2.6880870 secs] > > [Eden: 2048.0K(1384.0M)->0.0B(2112.0M) Survivors: 104.0M->0.0B > Heap: 1568.8M(5120.0M)->915.2M(5120.0M)] > > Heap after GC invocations=55430 (full 5): > > garbage-first heap total 5242880K, used 937168K > [0x00000006a0000000, 0x00000007e0000000, 0x00000007e0000000) > > region size 2048K, 0 young (0K), 0 survivors (0K) > > compacting perm gen total 96256K, used 94313K > [0x00000007e0000000, 0x00000007e5e00000, 0x0000000800000000) > > the space 96256K, 97% used [0x00000007e0000000, > 0x00000007e5c1a700, 0x00000007e5c1a800, 0x00000007e5e00000) > > No shared spaces configured. 
> > Could you please help me in giving your views on the following queries > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , > our application is encountering high amount of fragmentation ?. > > 2) Does tunning the gc params to make sure Mixed GC happens more , > will help in resolving such Full GC?s ? > > 3) Is there ?XX:Print* flag which can tell us how many old gen and > humongous regions we have (other than looking at [G1 Ergonomics] > output , which sometimes gives old gen region count) ? > > Please do let me know , if you need any more information. > Appreciate your help. > > Thanks and Regards > > Prasanna > > This message may contain information that is confidential or > privileged. If you are not the intended recipient, please advise > the sender immediately and delete this message. See > http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers > > for further information. Please refer to > http://www.blackrock.com/corporate/en-us/compliance/privacy-policy > for > more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment > Management (UK) Limited are authorised and regulated by the > Financial Conduct Authority. Registered in England No. 796793 and > No. 2020394 respectively. BlackRock Life Limited is authorised by > the Prudential Regulation Authority and regulated by the Financial > Conduct Authority and the Prudential Regulation Authority. > Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International > Limited is authorised and regulated by the Financial Conduct > Authority and is a registered investment adviser with the > Securities and Exchange Commission (SEC). Registered in Scotland > No. SC160821. Registered Office: Exchange Place One, 1 Semple > Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations > . > > ? 2016 BlackRock, Inc. All rights reserved. > > > > -- > Sent from my phone > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 16:00:00 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 12:00:00 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Message-ID: Hi Jenny, On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com wrote: > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we > can also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions > for that 72 mb object. > > I agree the ReservePercent=40 is too high, but that should not prevent > allocating to the old gen. G1 tries to honor ReservePercent. > So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? 
All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 16:46:00 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 11:46:00 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> Message-ID: <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Hi Vitaly, Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. charlie > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > Hi Jenny, > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. > I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. > > So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? > > Thanks > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 16:51:47 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 12:51:47 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: Hi Charlie, On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt wrote: > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one of the > terms in G1 can be a little confusing with a term used in Parallel GC, > Serial GC and CMS GC, and that is ?to-space?. In the latter case, > ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is > evacuating objects too. So a ?to-space exhausted? 
means that during an > evacuation of live objects from a G1 region (which could be an eden region, > survivor region or old region), and there is not an available region to > evacuate those live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous object is > allocated, it is not evacuated. It stays in the same allocated region(s) > until it is marked as being unreachable and can be reclaimed. > Right, I understand the distinction in terminology. What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. Thanks > > charlie > > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > Hi Jenny, > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about your question: >> >> 1) Humongus allocation request for 72 mb failed, from the logs we >> can also see we have free space of around 3 GB. Does this means , our >> application is encountering high amount of fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 consecutive regions >> for that 72 mb object. >> >> I agree the ReservePercent=40 is too high, but that should not prevent >> allocating to the old gen. G1 tries to honor ReservePercent. >> > So just to clarify - is the space (i.e. regions) reserved by > G1ReservePercent allocatable to humongous object allocations? All > docs/webpages I found talk about this space being for holding survivors > (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're > saying these reserved regions should also be used to satisfy HO allocs? > > Thanks > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 17:00:06 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 12:00:06 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: Glad to hear you?re not confused with the terminology. :-) On ReservePercent, my understanding is that the ReservePercent applies to the number of regions that will not used for young generation, eden regions or survivor regions. The intent is avoid to-space exhausted by ensuring a ?reserved percentage? of regions are available for evacuation. This implies that those reserved regions could be used for old regions or humongous regions. charlie > On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? 
is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > > charlie > >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about your question: >> >> 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. >> I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. >> >> So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:09:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:09:13 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt wrote: > Glad to hear you?re not confused with the terminology. :-) > > On ReservePercent, my understanding is that the ReservePercent applies to > the number of regions that will not used for young generation, eden regions > or survivor regions. The intent is avoid to-space exhausted by ensuring a > ?reserved percentage? of regions are available for evacuation. This implies > that those reserved regions could be used for old regions or humongous > regions. > Ok, so then the more explicit wording would be "The intent is to avoid to-space exhausted by ensuring a reserved percentage of regions are available for evacuation or humongous object allocation", right? 
Perhaps the "for evacuation" is throwing it off a bit for me, since the HO allocation isn't an "evacuation" obviously. Thanks Charlie P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully the discussed info is useful anyway. > > charlie > > On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the >> terms in G1 can be a little confusing with a term used in Parallel GC, >> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >> evacuating objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be an eden region, >> survivor region or old region), and there is not an available region to >> evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is >> allocated, it is not evacuated. It stays in the same allocated region(s) >> until it is marked as being unreachable and can be reclaimed. >> > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > >> >> charlie >> >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > > wrote: >> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your >>> question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we >>> can also see we have free space of around 3 GB. Does this means , our >>> application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions >>> for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that should not prevent >>> allocating to the old gen. G1 tries to honor ReservePercent. >>> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding survivors >> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >> saying these reserved regions should also be used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
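[To put a number on the reserve being debated here: the message that follows quotes the JDK 9 sizing code, where the reserve is simply a fraction of the total region count. The sketch below applies that formula to the heap in this thread (5120 MB heap, 2048K regions, G1ReservePercent=40); it is an illustration only, and HotSpot's exact rounding may differ.]

    public class ReserveMath {
        public static void main(String[] args) {
            long heapBytes = 5120L * 1024 * 1024;       // -Xms5120M / -Xmx5120M
            long regionSize = 2048L * 1024;             // 2048K regions, from the logs
            long totalRegions = heapBytes / regionSize; // 2560 regions
            int reservePercent = 40;                    // -XX:G1ReservePercent=40

            // _reserve_regions = reserve percent * regions of the heap (formula quoted below)
            long reserveRegions = (totalRegions * reservePercent + 99) / 100;  // ceiling, ~1024 regions
            System.out.println(reserveRegions + " regions (~"
                    + (reserveRegions * regionSize) / (1024 * 1024)
                    + " MB) held back from young-gen sizing");
        }
    }

[That is roughly 2 GB of the 5 GB heap withheld from young-gen sizing, which is why 40% is flagged as unusually high earlier in this thread.]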
URL: From yu.zhang at oracle.com Fri Oct 7 17:15:54 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 10:15:54 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Hi, Vitaly, Here is what happens in jdk9(I think the logic is the same as in jdk8). _reserve_regions = reserve percent*regions of the heap when trying to decide regions for young gen, we look at the free regions at the end of the collection, and try to honor the reserve_regions if (available_free_regions > _reserve_regions) { base_free_regions = available_free_regions - _reserve_regions; } And there are other constrains to consider: user defined constrains and pause time goal. This is what I meant by 'try to honor' the reserved. If there is enough available_free_regions, it will reserve those regions. Those regions can be used as old or young. Jenny On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > > Hi Vitaly, > > Just to clarify things in case there might be some confusion ? one > of the terms in G1 can be a little confusing with a term used in > Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the > latter case, ?to-space? is a survivor space. In G1, ?to-space? is > any space that a G1 is evacuating objects too. So a ?to-space > exhausted? means that during an evacuation of live objects from a > G1 region (which could be an eden region, survivor region or old > region), and there is not an available region to evacuate those > live objects, this constitutes a ?to-space failure?. > > I may be wrong, but my understanding is that once a humongous > object is allocated, it is not evacuated. It stays in the same > allocated region(s) until it is marked as being unreachable and > can be reclaimed. > > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating > to the old gen. G1 tries to honor ReservePercent." Specifically, the > "G1 tries to honor ReservePercent". It wasn't clear to me whether that > implies humongous allocations can look for contiguous regions in the > reserve, or not. That's what I'm hoping to get clarification on since > other sources online don't mention G1ReservePercent playing a role for > HO specifically. > > Thanks > > > charlie > >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > > wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >> > > wrote: >> >> Prasanna, >> >> In addition to what Vitaly said, I have some comments about >> your question: >> >> 1) Humongus allocation request for 72 mb failed, from the >> logs we can also see we have free space of around 3 GB. Does >> this means , our application is encountering high amount of >> fragmentation ?. >> >> It is possible. What it means is g1 can not find 36 >> consecutive regions for that 72 mb object. >> >> I agree the ReservePercent=40 is too high, but that should >> not prevent allocating to the old gen. G1 tries to honor >> ReservePercent. >> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding >> survivors (i.e. 
evac failure/to-space exhaustion mitigation). It >> sounds like you're saying these reserved regions should also be >> used to satisfy HO allocs? >> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Fri Oct 7 17:24:57 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Fri, 7 Oct 2016 12:24:57 -0500 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> Message-ID: <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> I think others are benefiting from your question(s) ? and it?s helping refresh my memory of things too. ;-) Actually, I just looked at what we documented in Java Performance Companion for G1ReservePercent, this wording may imply a very slightly subtle different definition, ?To reduce the risk of getting a promotion failure, G1 reserves some memory for promotions. This memory will not be used for the young generation.? Perhaps one of the G1 engineers can clarify this? Based on what we documented for G1ReservePercent, it implies that regions are reserved for promotions, which implies old generation regions. Note that on a young GC, some objects will be evacuated to survivor regions, and if G1 decides to grow the number of eden regions, then both those evacuated ?to survivor regions? and ?additional eden regions? will not come from that G1ReservePercent. And, since humongous objects are allocated from old regions, it is not clear to me that G1ReservePercent regions could be allocated into as humongous objects if the intent for G1ReservePercent is for promotions. Humongous objects are not promoted. They are allocated directly into humongous regions which get allocated from old generation. Again, hopefully one of the G1 engineers can jump in and clarify. Thanks for the question(s)! charlie > On Oct 7, 2016, at 12:09 PM, Vitaly Davidovich wrote: > > > > On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt > wrote: > Glad to hear you?re not confused with the terminology. :-) > > On ReservePercent, my understanding is that the ReservePercent applies to the number of regions that will not used for young generation, eden regions or survivor regions. The intent is avoid to-space exhausted by ensuring a ?reserved percentage? of regions are available for evacuation. This implies that those reserved regions could be used for old regions or humongous regions. > Ok, so then the more explicit wording would be "The intent is to avoid to-space exhausted by ensuring a reserved percentage of regions are available for evacuation or humongous object allocation", right? Perhaps the "for evacuation" is throwing it off a bit for me, since the HO allocation isn't an "evacuation" obviously. > > Thanks Charlie > > P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully the discussed info is useful anyway. > > charlie > >> On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich > wrote: >> >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? 
is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. >> Right, I understand the distinction in terminology. >> >> What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >> charlie >> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. >>> I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. >>> >>> So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:27:13 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:27:13 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: Hi Jenny, On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com wrote: > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in jdk8). > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free regions > at the end of the collection, and try to honor the reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined constrains and > pause time goal. 
> > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those regions. > Those regions can be used as old or young. > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs I've come across online make it sound like this reservation is a guarantee, or at least they don't stipulate the reservation may not work. I don't know if it's worth clarifying that point or not, but my vote would be to make the docs err on the side of "more info" than less. The second part is what I mentioned to Charlie in my last reply - can humongous *allocations* be satisfied out of the reserve, or are the reserved regions only used to hold evacuees (when base_free_regions are not available). Thanks > > Jenny > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion ? one of the >> terms in G1 can be a little confusing with a term used in Parallel GC, >> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >> evacuating objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be an eden region, >> survivor region or old region), and there is not an available region to >> evacuate those live objects, this constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous object is >> allocated, it is not evacuated. It stays in the same allocated region(s) >> until it is marked as being unreachable and can be reclaimed. >> > Right, I understand the distinction in terminology. > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > Thanks > >> >> charlie >> >> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: >> >> Hi Jenny, >> >> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > > wrote: >> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments about your >>> question: >>> >>> 1) Humongus allocation request for 72 mb failed, from the logs we >>> can also see we have free space of around 3 GB. Does this means , our >>> application is encountering high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 consecutive regions >>> for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that should not prevent >>> allocating to the old gen. G1 tries to honor ReservePercent. >>> >> So just to clarify - is the space (i.e. regions) reserved by >> G1ReservePercent allocatable to humongous object allocations? All >> docs/webpages I found talk about this space being for holding survivors >> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >> saying these reserved regions should also be used to satisfy HO allocs? 
>> >> Thanks >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Fri Oct 7 17:29:44 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Fri, 7 Oct 2016 17:29:44 +0000 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: <1102f59681054bbdba0107e076e98303@UKPMSEXD202N02.na.blkint.com> Hi All Thanks for all your reply. These discussions certainly help to get good insight ?. So just to summarize 1) G1ReservePercent will not affect Humongus allocation , so the full GC we are encountering is due to fragmentation 2) I will try chaging G1MixedGCLiveThresholdPercent to 85 to see the mixed GC?s can be increased. 3) Due to some other dependencies , we were unable to move to latest Jdk?s ( Jdk 8). Our application is currently running with CMS and we are seeing long GC pause , that why we wanted to explore G1.As we can?t move Jdk 8 soon , Is it good idea to migrate to G1 with Jdk 7 Thanks and regards Prasanna From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Vitaly Davidovich Sent: 07 October 2016 18:27 To: yu.zhang at oracle.com Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1-GC - Full GC [humongous allocation request failed] Hi Jenny, On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > wrote: Hi, Vitaly, Here is what happens in jdk9(I think the logic is the same as in jdk8). _reserve_regions = reserve percent*regions of the heap when trying to decide regions for young gen, we look at the free regions at the end of the collection, and try to honor the reserve_regions if (available_free_regions > _reserve_regions) { base_free_regions = available_free_regions - _reserve_regions; } And there are other constrains to consider: user defined constrains and pause time goal. This is what I meant by 'try to honor' the reserved. If there is enough available_free_regions, it will reserve those regions. Those regions can be used as old or young. Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs I've come across online make it sound like this reservation is a guarantee, or at least they don't stipulate the reservation may not work. I don't know if it's worth clarifying that point or not, but my vote would be to make the docs err on the side of "more info" than less. The second part is what I mentioned to Charlie in my last reply - can humongous *allocations* be satisfied out of the reserve, or are the reserved regions only used to hold evacuees (when base_free_regions are not available). Thanks Jenny On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: Hi Charlie, On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: Hi Vitaly, Just to clarify things in case there might be some confusion ? one of the terms in G1 can be a little confusing with a term used in Parallel GC, Serial GC and CMS GC, and that is ?to-space?. In the latter case, ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is evacuating objects too. So a ?to-space exhausted? 
means that during an evacuation of live objects from a G1 region (which could be an eden region, survivor region or old region), and there is not an available region to evacuate those live objects, this constitutes a ?to-space failure?. I may be wrong, but my understanding is that once a humongous object is allocated, it is not evacuated. It stays in the same allocated region(s) until it is marked as being unreachable and can be reclaimed. Right, I understand the distinction in terminology. What I'm a bit confused by is when Jenny said "I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 tries to honor ReservePercent". It wasn't clear to me whether that implies humongous allocations can look for contiguous regions in the reserve, or not. That's what I'm hoping to get clarification on since other sources online don't mention G1ReservePercent playing a role for HO specifically. Thanks charlie On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich > wrote: Hi Jenny, On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: Prasanna, In addition to what Vitaly said, I have some comments about your question: 1) Humongus allocation request for 72 mb failed, from the logs we can also see we have free space of around 3 GB. Does this means , our application is encountering high amount of fragmentation ?. It is possible. What it means is g1 can not find 36 consecutive regions for that 72 mb object. I agree the ReservePercent=40 is too high, but that should not prevent allocating to the old gen. G1 tries to honor ReservePercent. So just to clarify - is the space (i.e. regions) reserved by G1ReservePercent allocatable to humongous object allocations? All docs/webpages I found talk about this space being for holding survivors (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're saying these reserved regions should also be used to satisfy HO allocs? Thanks _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. ? 2016 BlackRock, Inc. All rights reserved. 
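A minimal sketch of the region arithmetic behind the "base_free_regions = available_free_regions - _reserve_regions" snippet Jenny quoted earlier (hypothetical Java, not HotSpot source; the 5 GB heap and 2 MB region size come from the logs at the start of the thread, the free-region count is made up):

    // Hypothetical illustration of how G1ReservePercent only caps young gen sizing.
    public class ReserveSketch {
        public static void main(String[] args) {
            long heapRegions = (5L * 1024) / 2;       // 5120 MB heap, 2 MB regions -> 2560 regions
            int reservePercent = 40;                  // -XX:G1ReservePercent=40 from the posted flags
            long reserveRegions = heapRegions * reservePercent / 100;   // 1024 regions held back
            long availableFreeRegions = 1500;         // made-up "free regions at the end of a collection"
            long baseFreeRegions = availableFreeRegions > reserveRegions
                    ? availableFreeRegions - reserveRegions
                    : 0;
            // Young gen is then sized within baseFreeRegions, further limited by the pause-time
            // goal and G1NewSizePercent/G1MaxNewSizePercent. Per Jenny's reply, the reserved
            // regions stay ordinary free regions that can end up young or old; whether they can
            // satisfy humongous allocations is what the rest of the thread tries to pin down.
            System.out.println("reserve=" + reserveRegions + " regions, young-eligible=" + baseFreeRegions);
        }
    }

With ReservePercent=40 the reserve is 1024 of the 2560 regions, so young gen sizing is constrained well before the heap is actually full.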
-------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Fri Oct 7 17:44:29 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 7 Oct 2016 13:44:29 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> Message-ID: On Friday, October 7, 2016, charlie hunt wrote: > I think others are benefiting from your question(s) ? and it?s helping > refresh my memory of things too. ;-) > > Actually, I just looked at what we documented in Java Performance > Companion for G1ReservePercent, this wording may imply a very slightly > subtle different definition, ?To reduce the risk of getting a promotion > failure, G1 reserves some memory for promotions. This memory will not be > used for the young generation.? > > Perhaps one of the G1 engineers can clarify this? > Yeah, would be good to clarify. The above wording in the performance companion is at odds of other definitions. If the regions can be used to hold Eden survivors or other survivors from existing survivor regions, then it's not really accurate to say it's not used for the young generation. I'm guessing what's meant is it won't be used for Eden regions, i.e. to hold normal non HO allocations. > > Based on what we documented for G1ReservePercent, it implies that regions > are reserved for promotions, which implies old generation regions. Note > that on a young GC, some objects will be evacuated to survivor regions, and > if G1 decides to grow the number of eden regions, then both those evacuated > ?to survivor regions? and ?additional eden regions? will not come from that > G1ReservePercent. And, since humongous objects are allocated from old > regions, it is not clear to me that G1ReservePercent regions could be > allocated into as humongous objects if the intent for G1ReservePercent is > for promotions. Humongous objects are not promoted. They are allocated > directly into humongous regions which get allocated from old generation. > It seems they're reserved for evacuees, not promotions, which can come from any region modulo humongous (since those aren't copied). > > Again, hopefully one of the G1 engineers can jump in and clarify. > Yes, please. > > Thanks for the question(s)! > > charlie > > On Oct 7, 2016, at 12:09 PM, Vitaly Davidovich > wrote: > > > > On Fri, Oct 7, 2016 at 1:00 PM, charlie hunt > wrote: > >> Glad to hear you?re not confused with the terminology. :-) >> >> On ReservePercent, my understanding is that the ReservePercent applies to >> the number of regions that will not used for young generation, eden regions >> or survivor regions. The intent is avoid to-space exhausted by ensuring a >> ?reserved percentage? of regions are available for evacuation. This implies >> that those reserved regions could be used for old regions or humongous >> regions. >> > Ok, so then the more explicit wording would be "The intent is to avoid > to-space exhausted by ensuring a reserved percentage of regions are > available for evacuation or humongous object allocation", right? Perhaps > the "for evacuation" is throwing it off a bit for me, since the HO > allocation isn't an "evacuation" obviously. > > Thanks Charlie > > P.S. I realize I'm hijacking Prasanna's thread quite a bit, but hopefully > the discussed info is useful anyway. 
> >> >> charlie >> >> On Oct 7, 2016, at 11:51 AM, Vitaly Davidovich > > wrote: >> >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > > wrote: >> >>> Hi Vitaly, >>> >>> Just to clarify things in case there might be some confusion ? one of >>> the terms in G1 can be a little confusing with a term used in Parallel GC, >>> Serial GC and CMS GC, and that is ?to-space?. In the latter case, >>> ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is >>> evacuating objects too. So a ?to-space exhausted? means that during an >>> evacuation of live objects from a G1 region (which could be an eden region, >>> survivor region or old region), and there is not an available region to >>> evacuate those live objects, this constitutes a ?to-space failure?. >>> >>> I may be wrong, but my understanding is that once a humongous object is >>> allocated, it is not evacuated. It stays in the same allocated region(s) >>> until it is marked as being unreachable and can be reclaimed. >>> >> Right, I understand the distinction in terminology. >> >> What I'm a bit confused by is when Jenny said "I agree the >> ReservePercent=40 is too high, but that should not prevent allocating to >> the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 >> tries to honor ReservePercent". It wasn't clear to me whether that implies >> humongous allocations can look for contiguous regions in the reserve, or >> not. That's what I'm hoping to get clarification on since other sources >> online don't mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >>> >>> charlie >>> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich >> > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >>> < >>> yu.zhang at oracle.com >>> > wrote: >>> >>>> Prasanna, >>>> >>>> In addition to what Vitaly said, I have some comments about your >>>> question: >>>> >>>> 1) Humongus allocation request for 72 mb failed, from the logs we >>>> can also see we have free space of around 3 GB. Does this means , our >>>> application is encountering high amount of fragmentation ?. >>>> >>>> It is possible. What it means is g1 can not find 36 consecutive regions >>>> for that 72 mb object. >>>> >>>> I agree the ReservePercent=40 is too high, but that should not prevent >>>> allocating to the old gen. G1 tries to honor ReservePercent. >>>> >>> So just to clarify - is the space (i.e. regions) reserved by >>> G1ReservePercent allocatable to humongous object allocations? All >>> docs/webpages I found talk about this space being for holding survivors >>> (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're >>> saying these reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >> >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yu.zhang at oracle.com Fri Oct 7 18:21:00 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Fri, 7 Oct 2016 11:21:00 -0700 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> Message-ID: <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Vitaly, I am cc this to the dev list. My comments in line. On 10/07/2016 10:27 AM, Vitaly Davidovich wrote: > Hi Jenny, > > On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > > wrote: > > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in > jdk8). > > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free > regions at the end of the collection, and try to honor the > reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined > constrains and pause time goal. > > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those > regions. Those regions can be used as old or young. > > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The > docs I've come across online make it sound like this reservation is a > guarantee, or at least they don't stipulate the reservation may not > work. I don't know if it's worth clarifying that point or not, but my > vote would be to make the docs err on the side of "more info" than less. Agree. > > The second part is what I mentioned to Charlie in my last reply - can > humongous *allocations* be satisfied out of the reserve, or are the > reserved regions only used to hold evacuees (when base_free_regions > are not available). That is a good question. Here is my understanding, which need to be confirmed by G1 developer. In this code HeapWord* G1CollectedHeap::humongous_obj_allocate(size_t word_size, AllocationContext_t context) G1 tries to find regions from _free_list that can hold the humongous objects. The reserved regions are also on the _free_list (again need to be confirmed by developer). So my understanding is those reserved regions can be used as humongous allocation. But I might be missing something. > > Thanks > > > Jenny > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: >> Hi Charlie, >> >> On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt >> > wrote: >> >> Hi Vitaly, >> >> Just to clarify things in case there might be some confusion >> ? one of the terms in G1 can be a little confusing with a >> term used in Parallel GC, Serial GC and CMS GC, and that is >> ?to-space?. In the latter case, ?to-space? is a survivor >> space. In G1, ?to-space? is any space that a G1 is evacuating >> objects too. So a ?to-space exhausted? means that during an >> evacuation of live objects from a G1 region (which could be >> an eden region, survivor region or old region), and there is >> not an available region to evacuate those live objects, this >> constitutes a ?to-space failure?. >> >> I may be wrong, but my understanding is that once a humongous >> object is allocated, it is not evacuated. It stays in the >> same allocated region(s) until it is marked as being >> unreachable and can be reclaimed. >> >> Right, I understand the distinction in terminology. 
>> >> What I'm a bit confused by is when Jenny said "I agree the >> ReservePercent=40 is too high, but that should not prevent >> allocating to the old gen. G1 tries to honor ReservePercent." >> Specifically, the "G1 tries to honor ReservePercent". It wasn't >> clear to me whether that implies humongous allocations can look >> for contiguous regions in the reserve, or not. That's what I'm >> hoping to get clarification on since other sources online don't >> mention G1ReservePercent playing a role for HO specifically. >> >> Thanks >> >> >> charlie >> >>> On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich >>> > wrote: >>> >>> Hi Jenny, >>> >>> On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com >>> >> > wrote: >>> >>> Prasanna, >>> >>> In addition to what Vitaly said, I have some comments >>> about your question: >>> >>> 1) Humongus allocation request for 72 mb failed, from >>> the logs we can also see we have free space of around 3 >>> GB. Does this means , our application is encountering >>> high amount of fragmentation ?. >>> >>> It is possible. What it means is g1 can not find 36 >>> consecutive regions for that 72 mb object. >>> >>> I agree the ReservePercent=40 is too high, but that >>> should not prevent allocating to the old gen. G1 tries >>> to honor ReservePercent. >>> >>> So just to clarify - is the space (i.e. regions) reserved by >>> G1ReservePercent allocatable to humongous object >>> allocations? All docs/webpages I found talk about this space >>> being for holding survivors (i.e. evac failure/to-space >>> exhaustion mitigation). It sounds like you're saying these >>> reserved regions should also be used to satisfy HO allocs? >>> >>> Thanks >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Sat Oct 8 18:30:53 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sat, 8 Oct 2016 14:30:53 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Message-ID: On Friday, October 7, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com > wrote: > Hi All > > > > > > Thanks for all your reply. These discussions certainly help to get good > insight J. > > > > So just to summarize > > > > 1) G1ReservePercent will not affect Humongus allocation , so the full > GC we are encountering is due to fragmentation > > It may or may not - let's see what a G1 dev says (as Jenny mentioned). Either way, you don't have enough contiguous regions to satisfy the allocation, so it's fragmentation one way or the other. You should drop G1ReservePercent for now unless you have good reason for setting it to 40 (that's a large value, btw). > 2) I will try chaging G1MixedGCLiveThresholdPercent to 85 to see the > mixed GC?s can be increased. > Yes, that's a good idea. Do you see any mixed GCs at all now? If so, how long are the concurrent marking phases taking (look for concurrent-mark-end in the gc log). > 3) Due to some other dependencies , we were unable to move to latest > Jdk?s ( Jdk 8). 
Our application is currently running with CMS and we are > seeing long GC pause , that why we wanted to explore G1.As we can?t move > Jdk 8 soon , Is it good idea to migrate to G1 with Jdk 7 > Have you tried just setting Xmx and a reasonable pause time goal? As a rule of thumb, setting Xmx to 3x your live set works well (the more headroom you give to G1, the better). Giving it a reasonable pause time goal allows it to adjust young gen dynamically and possibly raising it high enough such that there's either no promotion or very little - young gen collection efficiency is a function of how many survivors you have when the collection kicks in, so the fewer survivors the better (that applies to all generational copying collectors, not just G1 of course). > > > Thanks and regards > > Prasanna > > > *From:* hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] *On > Behalf Of *Vitaly Davidovich > *Sent:* 07 October 2016 18:27 > *To:* yu.zhang at oracle.com > *Cc:* hotspot-gc-use at openjdk.java.net > *Subject:* Re: G1-GC - Full GC [humongous allocation request failed] > > > > Hi Jenny, > > > > On Fri, Oct 7, 2016 at 1:15 PM, yu.zhang at oracle.com > wrote: > > Hi, Vitaly, > > Here is what happens in jdk9(I think the logic is the same as in jdk8). > > _reserve_regions = reserve percent*regions of the heap > when trying to decide regions for young gen, we look at the free regions > at the end of the collection, and try to honor the reserve_regions > if (available_free_regions > _reserve_regions) { > base_free_regions = available_free_regions - _reserve_regions; > } > > And there are other constrains to consider: user defined constrains and > pause time goal. > > This is what I meant by 'try to honor' the reserved. > If there is enough available_free_regions, it will reserve those regions. > Those regions can be used as old or young. > > Ok, thanks. As you say, G1 *tries* to honor it, but may not. The docs > I've come across online make it sound like this reservation is a guarantee, > or at least they don't stipulate the reservation may not work. I don't > know if it's worth clarifying that point or not, but my vote would be to > make the docs err on the side of "more info" than less. > > > > The second part is what I mentioned to Charlie in my last reply - can > humongous *allocations* be satisfied out of the reserve, or are the > reserved regions only used to hold evacuees (when base_free_regions are not > available). > > > > Thanks > > > Jenny > > > > On 10/07/2016 09:51 AM, Vitaly Davidovich wrote: > > Hi Charlie, > > > > On Fri, Oct 7, 2016 at 12:46 PM, charlie hunt > wrote: > > Hi Vitaly, > > > > Just to clarify things in case there might be some confusion ? one of the > terms in G1 can be a little confusing with a term used in Parallel GC, > Serial GC and CMS GC, and that is ?to-space?. In the latter case, > ?to-space? is a survivor space. In G1, ?to-space? is any space that a G1 is > evacuating objects too. So a ?to-space exhausted? means that during an > evacuation of live objects from a G1 region (which could be an eden region, > survivor region or old region), and there is not an available region to > evacuate those live objects, this constitutes a ?to-space failure?. > > > > I may be wrong, but my understanding is that once a humongous object is > allocated, it is not evacuated. It stays in the same allocated region(s) > until it is marked as being unreachable and can be reclaimed. > > Right, I understand the distinction in terminology. 
> > > > What I'm a bit confused by is when Jenny said "I agree the > ReservePercent=40 is too high, but that should not prevent allocating to > the old gen. G1 tries to honor ReservePercent." Specifically, the "G1 > tries to honor ReservePercent". It wasn't clear to me whether that implies > humongous allocations can look for contiguous regions in the reserve, or > not. That's what I'm hoping to get clarification on since other sources > online don't mention G1ReservePercent playing a role for HO specifically. > > > > Thanks > > > > charlie > > > > On Oct 7, 2016, at 11:00 AM, Vitaly Davidovich wrote: > > > > Hi Jenny, > > > > On Fri, Oct 7, 2016 at 11:52 AM, yu.zhang at oracle.com > wrote: > > Prasanna, > > In addition to what Vitaly said, I have some comments about your question: > > 1) Humongus allocation request for 72 mb failed, from the logs we can > also see we have free space of around 3 GB. Does this means , our > application is encountering high amount of fragmentation ?. > > It is possible. What it means is g1 can not find 36 consecutive regions > for that 72 mb object. > > I agree the ReservePercent=40 is too high, but that should not prevent > allocating to the old gen. G1 tries to honor ReservePercent. > > So just to clarify - is the space (i.e. regions) reserved by > G1ReservePercent allocatable to humongous object allocations? All > docs/webpages I found talk about this space being for holding survivors > (i.e. evac failure/to-space exhaustion mitigation). It sounds like you're > saying these reserved regions should also be used to satisfy HO allocs? > > > > Thanks > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > > > > > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/corpo > rate/en-us/compliance/email-disclaimers for further information. Please > refer to http://www.blackrock.com/corporate/en-us/compliance/privacy- > policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
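Vitaly's fragmentation point can be made concrete with a little arithmetic. The sketch below is hypothetical Java, not HotSpot code; the 72 MB request and the 2048K region size come from the posted logs, and the free-region pattern is invented to show how roughly 3 GB of free space can still contain no run of 36 consecutive free regions:

    // Hypothetical sketch: contiguous-region requirement for a humongous allocation.
    public class HumongousSketch {
        static final long REGION = 2L * 1024 * 1024;   // "region size 2048K" from the logs

        public static void main(String[] args) {
            long objectSize = 72L * 1024 * 1024;       // the failing 72 MB allocation request
            boolean humongous = objectSize > REGION / 2;              // larger than half a region
            long regionsNeeded = (objectSize + REGION - 1) / REGION;  // 36 contiguous regions

            boolean[] free = new boolean[2560];        // 2560 regions in the 5 GB heap
            for (int i = 0; i < free.length; i++) {
                free[i] = (i % 3 != 0);                // two thirds free, but never more than 2 in a row
            }

            int longestRun = 0, run = 0, totalFree = 0;
            for (boolean f : free) {
                run = f ? run + 1 : 0;
                longestRun = Math.max(longestRun, run);
                if (f) totalFree++;
            }
            System.out.println("humongous=" + humongous + ", need " + regionsNeeded
                    + " contiguous regions; free=" + (totalFree * REGION / (1024 * 1024))
                    + " MB in " + totalFree + " regions, longest free run=" + longestRun);
            // ~3.3 GB is free, yet no run of 36 consecutive regions exists, so the humongous
            // allocation fails and (on 7u40) falls back to a full GC to compact the old generation.
        }
    }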
URL: 

From thomas.schatzl at oracle.com  Mon Oct 10 08:12:51 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 10 Oct 2016 10:12:51 +0200
Subject: G1-GC - Full GC [humongous allocation request failed]
In-Reply-To: 
References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com>
Message-ID: <1476087171.2652.37.camel@oracle.com>

Hi all,

On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote:
> On Friday, October 7, 2016, charlie hunt wrote:
> > I think others are benefiting from your question(s) - and it's
> > helping refresh my memory of things too. ;-)
> >
> > Actually, I just looked at what we documented in Java Performance
> > Companion for G1ReservePercent, this wording may imply a very
> > slightly subtle different definition, "To reduce the risk of
> > getting a promotion failure, G1 reserves some memory for
> > promotions. This memory will not be used for the young
> > generation."
> >
> > Perhaps one of the G1 engineers can clarify this?

The area covered by G1ReservePercent is regular space available for any allocation, whether young or old or humongous.

The only difference is that while the heap occupancy is beyond the reserve percent threshold, young gen will be minimal (like bounded by G1NewSizePercent). I.e. G1 will run in some kind of "degraded throughput" mode. "Degraded" as in young gen size is typically somehow correlated with allocation throughput, so if you bound young gen size, you also bound throughput.

The thinking for the reserve is to cover for extraneous large allocations (either humongous or just a case where due to application behavior changes lots of young gen objects survive) while G1 is getting liveness information for the reclamation phase (i.e. mixed gc phase). The collector just can't know what is the "maximum" promotion or humongous object allocation rate as it is heavily application dependent.

Just assuming the worst case, i.e. G1ReservePercent equals young gen, would be way too wasteful, and at odds with other settings actually - G1 can and will expand young gen to up to 70% if possible. Further, such a heuristic would not capture humongous allocation by the application anyway.

Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned so that reclamation starts when occupancy reaches the G1ReservePercent threshold. I.e., some ASCII art:

   +--------------------+  <-- heap full
^  |                    |
|  | 1)G1ReservePercent |
|  |                    |
   +--------------------+  <-- first mixed gc
H  |                    |
e  | 2)Allocation into  |
a  | old gen during     |
p  | marking            |
   |                    |
o  +--------------------+ <-- InitiatingHeapOccupancyPercent
c  |                    |
c  . 3)"Unconstrained"  .
u  . young gen sizing   .
p  . operation          .
a  .                    .
n  .                    .
c  .                    .
y  .                    .
   +--------------------+  <-- heap empty

(I am probably forgetting one or the other edge case here, but that's the general idea; also please consider that for G1, except for humongous allocations, the heap does not need to )

So when current young gen size + old gen occupancy is somewhere in areas 2)/3), G1 will expand young gen as it sees fit to meet pause time, i.e. is "unconstrained".
If young gen size + old gen occupancy starts eating into area 1), G1 minimizes young gen to try to keep as much memory left for these "extraneous allocations" that G1ReservePercent indicates, in the hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the user gave some sane settings according to (roughly) this model. With jdk9 onwards, the IHOP is determined automatically according to this model and so far seems to work quite nicely - at least it will typically give you a decent starting point for setting it on your own. As for the default value of G1ReservePercent (=10), well, consider it some default for the "typical" application, trying to strike some balance between throughput and safety to prevent running out of memory. For very large heaps, it might typically be set a bit too large as the young gen will most of the times be smaller than 10% of the heap due to pause time constraints (or e.g. G1MaxNewSizePercent) and application specific boundaries like "useful" allocation rates. Setting it to 40% seems a bit too cautious, but may be warranted in some cases. Before JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. For very small heaps?G1ReservePercent?may be too small. (jdk9 specific tip: you can use?G1ReservePercent?to set a maximum IHOP value). Thanks, ? Thomas From thomas.schatzl at oracle.com Mon Oct 10 08:32:39 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 10:32:39 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <76f94e97-3886-9c1d-65a6-d00dda2d903c@oracle.com> <6b567a4d-3c10-a56e-8d58-c6f5d01de29e@oracle.com> Message-ID: <1476088359.2652.48.camel@oracle.com> Hi, On Sat, 2016-10-08 at 14:30 -0400, Vitaly Davidovich wrote: > > > On Friday, October 7, 2016, Gopal, Prasanna CWK rock.com> wrote: > > Hi All > > ? > > ? > > Thanks for all your reply. These discussions certainly help to get > > good insight J. > > ? > > So just to summarize > > ? > > 1)???? G1ReservePercent will not affect Humongus allocation , so > > the full GC we are encountering is due to fragmentation > > > It may or may not - let's see what a G1 dev says (as Jenny > mentioned).? Either way, you don't have enough contiguous regions to > satisfy the allocation, so it's fragmentation one way or the other. > > You should drop G1ReservePercent for now unless you have good reason > for setting it to 40 (that's a large value, btw).? See the other email in this thread. > > 2)???? I will try chaging G1MixedGCLiveThresholdPercent to 85 to > > see the mixed GC?s can be increased. > > > Yes, that's a good idea.? Do you see any mixed GCs at all now? If so, > how long are the concurrent marking phases taking (look for > concurrent-mark-end in the gc log). Agree. Making G1 more aggressive with reclaiming old gen regions may help. With 8u40+, some simple, quite effective heuristics were added that in many cases decrease fragmentation a lot. With 7, you can only either give G1 more memory, try to make it more aggressively reclaim regions, or minimize old gen allocations that cause fragmentation. > > 3)???? Due to some other dependencies , we were unable to move to > > latest Jdk?s ( Jdk 8).? Our application is currently running with > > CMS and we are seeing long GC pause , that why we wanted to explore > > G1.As we can?t move Jdk 8 soon , Is it good idea to migrate to G1 > > with Jdk 7?? 
Please at least move to latest 7u (I remember you mentioning 7u40). There were a few very useful patches for G1 in 7u60 iirc. > Have you tried just setting Xmx and a reasonable pause time goal? > As a rule of thumb, setting Xmx to 3x your live set works? > well(the more headroom you give to G1, the better).? Giving it? > a reasonable pause time goal allows it to adjust young gen? > dynamically and possibly raising it high enough such that there's? > either no promotion or very little - young gen collection efficiency? > is a function of how many survivors you have when the collection? > kicks in, so the fewer survivors the better (that applies? > to all generational?copying collectors, not just G1 of course).? Thanks, ? Thomas From thomas.schatzl at oracle.com Mon Oct 10 08:38:42 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 10:38:42 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476087171.2652.37.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: <1476088722.2652.50.camel@oracle.com> On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > Hi all, > [...] > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) ... the actually occupied heap area does not need to be contiguous. It's just easier to draw as such :) Thanks, ? Thomas From vitalyd at gmail.com Mon Oct 10 10:42:20 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 06:42:20 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476087171.2652.37.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: Hi Thomas, Thanks for the clarification and insights. A few comments below ... On Monday, October 10, 2016, Thomas Schatzl wrote: > Hi all, > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > > > On Friday, October 7, 2016, charlie hunt > > > wrote: > > > I think others are benefiting from your question(s) ? and it?s > > > helping refresh my memory of things too. ;-) > > > > > > Actually, I just looked at what we documented in Java Performance > > > Companion for G1ReservePercent, this wording may imply a very > > > slightly subtle different definition, ?To reduce the risk of > > > getting a promotion failure, G1 reserves some memory for > > > promotions. This memory will not be used for the young > > > generation.? > > > > > > Perhaps one of the G1 engineers can clarify this? > the area covered by G1ReservePercent is regular space available for > any allocation, whether young or old or humongous. > > The only difference is that while the heap occupancy is beyond the > reserve percent threshold, young gen will be minimal (like bounded by > G1NewSizePercent). I.e. G1 will run in some kind of "degraded > throughput" mode. "Degraded" as in young gen size is typically somehow > correlated with allocation throughput, so if you bound young gen size, > you also bound throughput. Ok, so that's a quite different definition of the reserve than pretty much all sources that I've seen :). 
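Plugging the numbers from this thread into the model Thomas describes above gives a feel for why G1ReservePercent=40 is so cautious. This is a hypothetical illustration only; the heap size and reserve come from the posted flags, and 45 is the JDK 7/8 default for InitiatingHeapOccupancyPercent since no explicit value was set:

    // Hypothetical sketch of the occupancy thresholds in Thomas's diagram.
    public class ReserveModelSketch {
        public static void main(String[] args) {
            long heapMb = 5120;          // -Xmx5120M
            int reservePercent = 40;     // -XX:G1ReservePercent=40
            int ihopPercent = 45;        // InitiatingHeapOccupancyPercent default in JDK 7/8

            long reserveAreaStartsMb = heapMb * (100 - reservePercent) / 100;  // area 1) begins here
            long markingStartsMb = heapMb * ihopPercent / 100;                 // concurrent marking trigger

            System.out.println("marking starts around " + markingStartsMb + " MB; "
                    + "young gen is minimized once young + old occupancy passes "
                    + reserveAreaStartsMb + " MB of the " + heapMb + " MB heap");
            // With ReservePercent=40, area 1) already starts at 3072 MB, i.e. the collector keeps
            // 2 GB of headroom for "extraneous" allocations - hence the advice to lower it and,
            // pre-JDK 9, to tune InitiatingHeapOccupancyPercent so mixed GCs start before that point.
        }
    }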
Your explanation makes it sound like a "yellow zone" for G1, or a throttle/watermark for the young gen sizing. > > The thinking for the reserve is to cover for extraneous large > allocations (either humongous or just a case where due to application > behavior changes lots of young gen objects survive) while G1 is getting > liveness information for the reclamation phase (i.e. mixed gc phase). > The collector just can't know what is the "maximum" promotion or > humongous object allocation rate as it is heavily application > dependent. > Just assuming the worst case, i.e. G1ReservePercent equals young gen, > would be way too wasteful, and at odds with other settings actually - > G1 can and will expand young gen to up to 70% if possible. Further, > such a heuristic would not capture humongous allocation by the > application anyway. > > Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned > so that reclamation starts when occupancy reaches the G1ReservePercent > threshold. I.e., some ASCII art: > > +--------------------+ <-- heap full > ^ | | > | | 1)G1ReservePercent | > | | | > +--------------------+ <-- first mixed gc > H | | > e | 2)Allocation into | > a | old gen during | > p | marking | > | | > o +--------------------+ <-- InitiatingHeapOccupancyPercent > c | | > c . 3)"Unconstrained" . > u . young gen sizing . > p . operation . > a . . > n . . > c . . > y . . > +--------------------+ <-- heap empty > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) > > So when current young gen size + old gen occupancy is somewhere in > areas 2)/3), G1 will expand young gen as it sees fit to meet pause > time, i.e. is "unconstrained". > > If young gen size + old gen occupancy starts eating into area 1), G1 > minimizes young gen to try to keep as much memory left for these > "extraneous allocations" that G1ReservePercent indicates, in the hope > that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the > user gave some sane settings according to (roughly) this model. > With jdk9 onwards, the IHOP is determined automatically according to > this model and so far seems to work quite nicely - at least it will > typically give you a decent starting point for setting it on your own. Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP dynamically. It sounds like it's an IHOP++ setting :). I'm also not sure winding the young gen down helps in cases where old gen occupancy is growing. Intuitively, that ought to make things worse actually. Young evacs will occur more frequently, with higher likelihood that more objects are still live, and need to be kept alive, possibly causing further promotion. One way that it helps is there's more frequent feedback to G1 about heap occupancy (since young evacs occur more frequently), and so it may notice that things aren't looking so peachy earlier. Is that the idea? > As for the default value of G1ReservePercent (=10), well, consider it > some default for the "typical" application, trying to strike some > balance between throughput and safety to prevent running out of memory. > > For very large heaps, it might typically be set a bit too large as the > young gen will most of the times be smaller than 10% of the heap due to > pause time constraints (or e.g. G1MaxNewSizePercent) and application > specific boundaries like "useful" allocation rates. 
Setting it to 40% > seems a bit too cautious, but may be warranted in some cases. Before > JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. > > For very small heaps G1ReservePercent may be too small. > > (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP > value). > > Thanks, > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Mon Oct 10 10:45:35 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 06:45:35 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476088722.2652.50.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Thomas Schatzl wrote: > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > Hi all, > > > [...] > > > > (I am probably forgetting one or the other edge case here, but that's > > the general idea; also please consider that for G1, except for > > humongous allocations, the heap does not need to ) > > ... the actually occupied heap area does not need to be contiguous. > It's just easier to draw as such :) Are the reserved regions contiguous or no? Thanks > > Thanks, > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Mon Oct 10 11:07:27 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Oct 2016 13:07:27 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> Message-ID: <1476097647.2652.63.camel@oracle.com> Hi, On Mon, 2016-10-10 at 06:45 -0400, Vitaly Davidovich wrote: > > > On Monday, October 10, 2016, Thomas Schatzl m> wrote: > > > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > > Hi all, > > > > > [...] > > > > > > (I am probably forgetting one or the other edge case here, but > > that's > > > the general idea; also please consider that for G1, except for > > > humongous allocations, the heap does not need to ) > > > > ... the actually occupied heap area does not need to be contiguous. > > It's just easier to draw as such :) > Are the reserved regions contiguous or no? ? no. You can't guarantee that. Consider long-living allocations into this area. At the moment G1 never moves humongous objects. 
Thomas From vitalyd at gmail.com Mon Oct 10 11:17:48 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 07:17:48 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: <1476097647.2652.63.camel@oracle.com> References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> <1476088722.2652.50.camel@oracle.com> <1476097647.2652.63.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Thomas Schatzl wrote: > Hi, > > On Mon, 2016-10-10 at 06:45 -0400, Vitaly Davidovich wrote: > > > > > > On Monday, October 10, 2016, Thomas Schatzl > > m> wrote: > > > > > > On Mon, 2016-10-10 at 10:12 +0200, Thomas Schatzl wrote: > > > > Hi all, > > > > > > > [...] > > > > > > > > (I am probably forgetting one or the other edge case here, but > > > that's > > > > the general idea; also please consider that for G1, except for > > > > humongous allocations, the heap does not need to ) > > > > > > ... the actually occupied heap area does not need to be contiguous. > > > It's just easier to draw as such :) > > Are the reserved regions contiguous or no? > > no. You can't guarantee that. Consider long-living allocations into > this area. At the moment G1 never moves humongous objects. Yeah, I didn't expect them to be but wanted to clarify/confirm that. This makes them less useful for covering humongous allocations, but I understand the constraints. Also, by considering the reserve as a watermark value, rather than space you really expect to use, I think that's fine. Thanks > > Thomas > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Mon Oct 10 14:01:22 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Mon, 10 Oct 2016 14:01:22 +0000 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: Hi Thomas Thanks for this wonderful explanation for G1ReservePercent parameter. As noted by Vitaly , it is activing as watermark for young generation heap size. So in our case where G1ReservePercent=40, We are effectively asking G1 not to resize (increase) young generation , when we reach 60% of heap occupancy. @All -thanks for your comments. I will adjust the parameters we discussed and publish the outcome. I am going to tune with the following parameters 1) G1ReservePercent -To reduce this value to a reasonable value. As a result , we are allowing G1 to resize young generation size more which can reduce the object promotion rate. 2) G1MixedGCLiveThresholdPercent ? To increase this percent to 85. This will G1 more aggressive by having more mixed GC?s Could someone please explain me how increasing it from 65 ( which is default in JDK 7) to 85 makes G1 to collect more old regions. I would have thought keeping it 65 means , we asking G1 to consider regions above 65% of occupancy which will include regions with 85% as well. Am I missing some thing here ? 3) To override MaxGCPauseMillis to a higher value , to make G1 less aggressive about GC pause time. 4) To move to latest version of JDK , as suggested by everyone. Thanks again for your comments. Really appreciate it. 
Thanks and Regards Prasanna From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Vitaly Davidovich Sent: 10 October 2016 11:42 To: Thomas Schatzl Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1-GC - Full GC [humongous allocation request failed] Hi Thomas, Thanks for the clarification and insights. A few comments below ... On Monday, October 10, 2016, Thomas Schatzl > wrote: Hi all, On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > On Friday, October 7, 2016, charlie hunt > > wrote: > > I think others are benefiting from your question(s) ? and it?s > > helping refresh my memory of things too. ;-) > > > > Actually, I just looked at what we documented in Java Performance > > Companion for G1ReservePercent, this wording may imply a very > > slightly subtle different definition, ?To reduce the risk of > > getting a promotion failure, G1 reserves some memory for > > promotions. This memory will not be used for the young > > generation.? > > > > Perhaps one of the G1 engineers can clarify this? the area covered by G1ReservePercent is regular space available for any allocation, whether young or old or humongous. The only difference is that while the heap occupancy is beyond the reserve percent threshold, young gen will be minimal (like bounded by G1NewSizePercent). I.e. G1 will run in some kind of "degraded throughput" mode. "Degraded" as in young gen size is typically somehow correlated with allocation throughput, so if you bound young gen size, you also bound throughput. Ok, so that's a quite different definition of the reserve than pretty much all sources that I've seen :). Your explanation makes it sound like a "yellow zone" for G1, or a throttle/watermark for the young gen sizing. The thinking for the reserve is to cover for extraneous large allocations (either humongous or just a case where due to application behavior changes lots of young gen objects survive) while G1 is getting liveness information for the reclamation phase (i.e. mixed gc phase). The collector just can't know what is the "maximum" promotion or humongous object allocation rate as it is heavily application dependent. Just assuming the worst case, i.e. G1ReservePercent equals young gen, would be way too wasteful, and at odds with other settings actually - G1 can and will expand young gen to up to 70% if possible. Further, such a heuristic would not capture humongous allocation by the application anyway. Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned so that reclamation starts when occupancy reaches the G1ReservePercent threshold. I.e., some ASCII art: +--------------------+ <-- heap full ^ | | | | 1)G1ReservePercent | | | | +--------------------+ <-- first mixed gc H | | e | 2)Allocation into | a | old gen during | p | marking | | | o +--------------------+ <-- InitiatingHeapOccupancyPercent c | | c . 3)"Unconstrained" . u . young gen sizing . p . operation . a . . n . . c . . y . . +--------------------+ <-- heap empty (I am probably forgetting one or the other edge case here, but that's the general idea; also please consider that for G1, except for humongous allocations, the heap does not need to ) So when current young gen size + old gen occupancy is somewhere in areas 2)/3), G1 will expand young gen as it sees fit to meet pause time, i.e. is "unconstrained". 
If young gen size + old gen occupancy starts eating into area 1), G1 minimizes young gen to try to keep as much memory left for these "extraneous allocations" that G1ReservePercent indicates, in the hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the user gave some sane settings according to (roughly) this model. With jdk9 onwards, the IHOP is determined automatically according to this model and so far seems to work quite nicely - at least it will typically give you a decent starting point for setting it on your own. Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP dynamically. It sounds like it's an IHOP++ setting :). I'm also not sure winding the young gen down helps in cases where old gen occupancy is growing. Intuitively, that ought to make things worse actually. Young evacs will occur more frequently, with higher likelihood that more objects are still live, and need to be kept alive, possibly causing further promotion. One way that it helps is there's more frequent feedback to G1 about heap occupancy (since young evacs occur more frequently), and so it may notice that things aren't looking so peachy earlier. Is that the idea? As for the default value of G1ReservePercent (=10), well, consider it some default for the "typical" application, trying to strike some balance between throughput and safety to prevent running out of memory. For very large heaps, it might typically be set a bit too large as the young gen will most of the times be smaller than 10% of the heap due to pause time constraints (or e.g. G1MaxNewSizePercent) and application specific boundaries like "useful" allocation rates. Setting it to 40% seems a bit too cautious, but may be warranted in some cases. Before JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. For very small heaps G1ReservePercent may be too small. (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP value). Thanks, Thomas -- Sent from my phone This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. ? 2016 BlackRock, Inc. All rights reserved. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vitalyd at gmail.com Mon Oct 10 22:28:07 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 10 Oct 2016 18:28:07 -0400 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: On Monday, October 10, 2016, Gopal, Prasanna CWK < prasanna.gopal at blackrock.com> wrote: > Hi Thomas > > > > Thanks for this wonderful explanation for G1ReservePercent parameter. As > noted by Vitaly , it is activing as watermark for young generation heap > size. So in our case where G1ReservePercent=40, We are effectively > asking G1 not to resize (increase) young generation , when we reach 60% of > heap occupancy. > > > > @All -thanks for your comments. I will adjust the parameters we discussed > and publish the outcome. I am going to tune with the following parameters > > > > 1) G1ReservePercent -To reduce this value to a reasonable value. As > a result , we are allowing G1 to resize young generation size more which > can reduce the object promotion rate. > > > > 2) G1MixedGCLiveThresholdPercent ? To increase this percent to 85. > This will G1 more aggressive by having more mixed GC?s > > Could someone please explain me how increasing it from 65 ( > which is default in JDK 7) to 85 makes G1 to collect more old regions. I > would have thought keeping it 65 means , we asking G1 to consider > > regions above 65% of occupancy which will include regions with 85% as > well. Am I missing some thing here ? > > This value says "what max liveness does an old region need to have to be considered for mixed collections". In other words, a value of 65 means a region must have liveness of 65 or less to be considered. Put another way, if garbage is 35%+ it's a candidate. When you set it to 85%, a region can be "more live"/less garbage (i.e. 15%+ garbage) and still be eligible. > > > 3) To override MaxGCPauseMillis to a higher value , to make G1 less > aggressive about GC pause time. > > > > 4) To move to latest version of JDK , as suggested by everyone. > > > > Thanks again for your comments. Really appreciate it. > > > > Thanks and Regards > > Prasanna > > > > > > *From:* hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net > ] > *On Behalf Of *Vitaly Davidovich > *Sent:* 10 October 2016 11:42 > *To:* Thomas Schatzl > > *Cc:* hotspot-gc-use at openjdk.java.net > > *Subject:* Re: G1-GC - Full GC [humongous allocation request failed] > > > > Hi Thomas, > > > > Thanks for the clarification and insights. A few comments below ... > > On Monday, October 10, 2016, Thomas Schatzl > wrote: > > Hi all, > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > > > On Friday, October 7, 2016, charlie hunt > > wrote: > > > I think others are benefiting from your question(s) ? and it?s > > > helping refresh my memory of things too. ;-) > > > > > > Actually, I just looked at what we documented in Java Performance > > > Companion for G1ReservePercent, this wording may imply a very > > > slightly subtle different definition, ?To reduce the risk of > > > getting a promotion failure, G1 reserves some memory for > > > promotions. This memory will not be used for the young > > > generation.? > > > > > > Perhaps one of the G1 engineers can clarify this? > the area covered by G1ReservePercent is regular space available for > any allocation, whether young or old or humongous. 
> > The only difference is that while the heap occupancy is beyond the > reserve percent threshold, young gen will be minimal (like bounded by > G1NewSizePercent). I.e. G1 will run in some kind of "degraded > throughput" mode. "Degraded" as in young gen size is typically somehow > correlated with allocation throughput, so if you bound young gen size, > you also bound throughput. > > Ok, so that's a quite different definition of the reserve than pretty much > all sources that I've seen :). Your explanation makes it sound like a > "yellow zone" for G1, or a throttle/watermark for the young gen sizing. > > > The thinking for the reserve is to cover for extraneous large > allocations (either humongous or just a case where due to application > behavior changes lots of young gen objects survive) while G1 is getting > liveness information for the reclamation phase (i.e. mixed gc phase). > > > The collector just can't know what is the "maximum" promotion or > humongous object allocation rate as it is heavily application > dependent. > Just assuming the worst case, i.e. G1ReservePercent equals young gen, > would be way too wasteful, and at odds with other settings actually - > G1 can and will expand young gen to up to 70% if possible. Further, > such a heuristic would not capture humongous allocation by the > application anyway. > > Ideally G1ReservePercent and InitiatingHeapOccupancyPercent are tuned > so that reclamation starts when occupancy reaches the G1ReservePercent > threshold. I.e., some ASCII art: > > +--------------------+ <-- heap full > ^ | | > | | 1)G1ReservePercent | > | | | > +--------------------+ <-- first mixed gc > H | | > e | 2)Allocation into | > a | old gen during | > p | marking | > | | > o +--------------------+ <-- InitiatingHeapOccupancyPercent > c | | > c . 3)"Unconstrained" . > u . young gen sizing . > p . operation . > a . . > n . . > c . . > y . . > +--------------------+ <-- heap empty > > (I am probably forgetting one or the other edge case here, but that's > the general idea; also please consider that for G1, except for > humongous allocations, the heap does not need to ) > > So when current young gen size + old gen occupancy is somewhere in > areas 2)/3), G1 will expand young gen as it sees fit to meet pause > time, i.e. is "unconstrained". > > If young gen size + old gen occupancy starts eating into area 1), G1 > minimizes young gen to try to keep as much memory left for these > "extraneous allocations" that G1ReservePercent indicates, in the hope > that the IHOP is "soon" kicking in. Until jdk9, G1 assumes that the > user gave some sane settings according to (roughly) this model. > With jdk9 onwards, the IHOP is determined automatically according to > this model and so far seems to work quite nicely - at least it will > typically give you a decent starting point for setting it on your own. > > Ok, so the reserve acts like a high watermark in 9, used to adjust IHOP > dynamically. It sounds like it's an IHOP++ setting :). > > > > I'm also not sure winding the young gen down helps in cases where old gen > occupancy is growing. Intuitively, that ought to make things worse > actually. Young evacs will occur more frequently, with higher likelihood > that more objects are still live, and need to be kept alive, possibly > causing further promotion. > > > > One way that it helps is there's more frequent feedback to G1 about heap > occupancy (since young evacs occur more frequently), and so it may notice > that things aren't looking so peachy earlier. Is that the idea? 
> > > > > As for the default value of G1ReservePercent (=10), well, consider it > some default for the "typical" application, trying to strike some > balance between throughput and safety to prevent running out of memory. > > For very large heaps, it might typically be set a bit too large as the > young gen will most of the times be smaller than 10% of the heap due to > pause time constraints (or e.g. G1MaxNewSizePercent) and application > specific boundaries like "useful" allocation rates. Setting it to 40% > seems a bit too cautious, but may be warranted in some cases. Before > JDK9, it may be better to set InitiatingHeapOccupancyPercent properly. > > For very small heaps G1ReservePercent may be too small. > > (jdk9 specific tip: you can use G1ReservePercent to set a maximum IHOP > value). > > Thanks, > Thomas > > > > -- > Sent from my phone > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/ > corporate/en-us/compliance/email-disclaimers for further information. > Please refer to http://www.blackrock.com/corporate/en-us/compliance/ > privacy-policy for more information about BlackRock?s Privacy Policy. > > BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) > Limited are authorised and regulated by the Financial Conduct Authority. > Registered in England No. 796793 and No. 2020394 respectively. BlackRock > Life Limited is authorised by the Prudential Regulation Authority and > regulated by the Financial Conduct Authority and the Prudential Regulation > Authority. Registered in England No. 2223202. Registered Offices: 12 > Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is > authorised and regulated by the Financial Conduct Authority and is a > registered investment adviser with the Securities and Exchange Commission > (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange > Place One, 1 Semple Street, Edinburgh EH3 8BL. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > ? 2016 BlackRock, Inc. All rights reserved. > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Tue Oct 11 06:55:26 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Oct 2016 08:55:26 +0200 Subject: G1-GC - Full GC [humongous allocation request failed] In-Reply-To: References: <6aa7148d-cd09-6929-ee88-4f7a467382dc@oracle.com> <70A349A7-9A8B-4667-A301-9A3603D9AFD1@oracle.com> <2DBC61B8-BD6F-45F7-B3F1-5951993C3885@oracle.com> <1476087171.2652.37.camel@oracle.com> Message-ID: <1476168926.2502.3.camel@oracle.com> Hi Vitaly, On Mon, 2016-10-10 at 06:42 -0400, Vitaly Davidovich wrote: > Hi Thomas, > > Thanks for the clarification and insights.? A few comments below ... > > On Monday, October 10, 2016, Thomas Schatzl m> wrote: > > Hi all, > > > > On Fri, 2016-10-07 at 13:44 -0400, Vitaly Davidovich wrote: > > >? > > >?On Friday, October 7, 2016, charlie hunt > > > > >?wrote: > > >?>?I think others are benefiting from your question(s) ? and it?s > > >?>?helping refresh my memory of things too. ;-) > > >?>? 
> > >?>?Actually, I just looked at what we documented in Java > > > > Performance > > >?>?Companion for G1ReservePercent, this wording may imply a very > > >?>?slightly subtle different definition, ?To reduce the risk of > > >?>?getting a promotion failure, G1 reserves some memory for > > >?>?promotions. This memory will not?be used for the young > > >?>?generation.?? > > >?>? > > >?>?Perhaps one of the G1 engineers can clarify this? > > > > ? the area covered by G1ReservePercent is regular space available > > for any allocation, whether young or old or humongous. > > > > The only difference is that while the heap occupancy is beyond the > > reserve percent threshold, young gen will be minimal (like bounded > > by G1NewSizePercent). I.e. G1 will run in some kind of "degraded > > throughput" mode. "Degraded" as in young gen size is typically > > somehow correlated with allocation throughput, so if you bound > > young gen size, you also bound throughput. > > Ok, so that's a quite different definition of the reserve than pretty > much all sources that I've seen :).? Your explanation makes it sound > like a "yellow zone" for G1, or a throttle/watermark?for the young > gen sizing. I described the effect it has. It should be considered a reserve for unexpected promotions/allocations only, and in general is an area to not allocate into. [...] > > > If young gen size + old gen occupancy starts eating into area 1), > > G1 minimizes young gen to try to keep as much memory left for these > > "extraneous allocations" that G1ReservePercent indicates, in the? > > hope that the IHOP is "soon" kicking in. Until jdk9, G1 assumes? > > that the user gave some sane settings according to (roughly) this? > > model. > > With jdk9 onwards, the IHOP is determined automatically according? > > to this model and so far seems to work quite nicely - at least it? > > will typically give you a decent starting point for setting it on? > > your own. > Ok, so the reserve acts like a high watermark in 9, used to adjust > IHOP dynamically.? It sounds like it's an IHOP++ setting :). ;) In the cases I have seen so far, the adaptive IHOP mechanism makes sure that you don't get into this situation to actually use the reserve at all - unless the application behavior changes a lot over time to avoid full gc. I am sure you can find situations where it fails of course. It is just another heuristic. > I'm also not sure winding the young gen down helps in cases where old > gen occupancy is growing.? Intuitively, that ought to make things > worse actually.? Young evacs will occur more frequently, with higher > likelihood that more objects are still live, and need to be kept > alive, possibly causing further promotion. That depends on the application. Some applications work this way, many others don't, at least beyond a certain threshold of young gen size. There is the option to set G1ReservePercent to zero and set IHOP manually, then the upper bound for eden is just the remaining memory in this situation. You could set the "confidence" the adaptive IHOP has too to get some extra slack. > One way that it helps is there's more frequent feedback to G1 about > heap occupancy (since young evacs occur more frequently), and so it > may notice that things aren't looking so peachy earlier.? Is that the > idea? There is the reason you suggest, i.e. to make sure that the IHOP is checked more frequently to start marking as soon as its threshold is crossed (if it has not been). There may be others. 
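As a concrete sketch of the two options (the 45 below is simply the default value of InitiatingHeapOccupancyPercent, not a recommendation):

    pre-JDK9:  -XX:G1ReservePercent=0 -XX:InitiatingHeapOccupancyPercent=45
    JDK9+:     -XX:+G1UseAdaptiveIHOP   (the default; InitiatingHeapOccupancyPercent then only
                                         seeds the threshold until enough samples are gathered)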
As for the impact, I am not sure, considering that the main problem at this point is that when getting close to G1ReserverPercent of remaining space, you are close to getting to the end of available space! I.e. consider looking at the defaults of G1ReservePercent of 10, this is not a lot compared to defaults for G1NewSizePercent and G1MaxNewSizePercent of 5 and 60 respectively. To use all remaining memory for eden at this point is for obvious reasons not an excellent idea... Now you might argue that maybe one should start marking to make sure that you can use maximum eden at all times. That can be done (set IHOP manually), but the problem is that this likely causes more frequent concurrent marking, that also need CPU resources. Somewhat shorter?gc pause intervals typically do not affect total throughput as much. Additionally, allowing old gen objects to die for longer typically pays off a lot, i.e. makes the mixed gc phase shorter. If you feel that above paragraph is a bit too hand-wavy here: basically, GC heuristics interact in sometimes counter-intuitive ways with the application.? Thanks, ? Thomas From jun.zhuang at hobsons.com Tue Oct 11 18:25:29 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Tue, 11 Oct 2016 18:25:29 +0000 Subject: About the finalization queue and reference queue Message-ID: Hi, While reading about re-defining the finalize() method explicitly in a class I came across some statements and like to get some clarification from the experts. On http://www.fasterj.com/articles/finalizer2.shtml, the author states that "the GC adds each of those Finalizer objects to the reference queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on this the Finalizer object associated with the finalizeable object goes on the reference queue. On page 311 of book Service-Oriented Computing - ICSOC 2011 Workshops "... all those objects that have a finalize () method and are found to be unreachable(dead) by garbage collector, are pushed into a finalization queue.". So the finalizeable object goes on the finalization queue. Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, states that "Objects of all classes with redefined finalize() method are added to a queue at the moment of creation. The queue head is referenced from a static field in java.lang.ref.Finalizer. An instance of Finalizer is created for each "finalizeable" object and is stored in that queue, which is in fact a linked list of Finalizers.", so both the finalizeable object and the associated Finalizer object are stored in the same queue? So my questions are: Are there one or two queues involved? Exactly how object finalization works? Appreciate your input, Jun Jun Zhuang Sr. Performance QA Engineer | Hobsons T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite 300 | Cincinnati, OH 45241 | USA Upgraded by Hobsons - Subscribe Today -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image755000.png Type: image/png Size: 13602 bytes Desc: image755000.png URL: From ecki at zusammenkunft.net Tue Oct 11 19:05:49 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Tue, 11 Oct 2016 21:05:49 +0200 Subject: About the finalization queue and reference queue In-Reply-To: References: Message-ID: <20161011210549.00006b33.ecki@zusammenkunft.net> Hello, what is interesting to know is, that each finalizeable object which is tracked is wrapped/tracked with an instance of j.lang.ref.Finalizer (which is a FinalizerReference subclass i.e. a final reference). Generally for references to work you need to keep alive the Reference instance. The finalizer does this with a built-in linked list in the Finalizer instance (static unfinalized points to the head of the list and each Finalizer object has a next/prev pointer. When the VM tracks a finalizeable object it calls the Finalize(Object) constructor which makes sure to add it. So if you have thousands of finalized objects there are all indirectly referenced from this long linear list. When the GC does its thing and a instance becomes unreachable, it will add it to the ReferenceQueue of the finalizer. The FinalizerThread will consume it from there and make sure to finally also remove the Finalizer reference for that object from the double linked list. If you work with heap-dumps and analyse memeory leak informations you need to ignore the memeory consumotion under "Finalizer.unfinalized" with the (possibly long if the Finalizer thread is blocked and unwanted) ReferenceQueue Finalizer#queue#head single linked list. Gruss Bernd Am Tue, 11 Oct 2016 18:25:29 +0000 schrieb Jun Zhuang : > Hi, > > While reading about re-defining the finalize() method explicitly in a > class I came across some statements and like to get some > clarification from the experts. > > On http://www.fasterj.com/articles/finalizer2.shtml, the author > states that "the GC adds each of those Finalizer objects to the > reference queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on > this the Finalizer object associated with the finalizeable object > goes on the reference queue. > > On page 311 of book Service-Oriented Computing - ICSOC 2011 > Workshops > "... all those objects that have a finalize () method and are found > to be unreachable(dead) by garbage collector, are pushed into a > finalization queue.". So the finalizeable object goes on the > finalization queue. > > Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, > states that "Objects of all classes with redefined finalize() method > are added to a queue at the moment of creation. The queue head is > referenced from a static field in java.lang.ref.Finalizer. An > instance of Finalizer is created for each "finalizeable" object and > is stored in that queue, which is in fact a linked list of > Finalizers.", so both the finalizeable object and the associated > Finalizer object are stored in the same queue? > > So my questions are: Are there one or two queues involved? Exactly > how object finalization works? > > > Appreciate your input, > Jun > > Jun Zhuang > Sr. 
Performance QA Engineer | > Hobsons > T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite > 300 | Cincinnati, OH 45241 | USA > > > Upgraded by Hobsons - Subscribe Today > From jun.zhuang at hobsons.com Tue Oct 11 20:55:54 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Tue, 11 Oct 2016 20:55:54 +0000 Subject: About the finalization queue and reference queue In-Reply-To: <20161011210549.00006b33.ecki@zusammenkunft.net> References: <20161011210549.00006b33.ecki@zusammenkunft.net> Message-ID: Hi Bernd, Appreciate your quick response. I understand following: * A Finalizer instance is created for every finalizeable object * All the Finalizer instances are linked together using a double linked list * All the Finalizer instances are tracked by the java.lang.ref.Finalizer class. Or is it only the first one by the unfinalized field? What I am still not clear are: 1. Is there a finalization queue at all? If so, what does it do? 2. What goes into the ReferenceQueue? The finalizeable objects? Thanks a lot, Jun -----Original Message----- From: Bernd Eckenfels [mailto:ecki at zusammenkunft.net] Sent: Tuesday, October 11, 2016 3:06 PM To: hotspot-gc-use at openjdk.java.net Cc: Jun Zhuang Subject: Re: About the finalization queue and reference queue Hello, what is interesting to know is, that each finalizeable object which is tracked is wrapped/tracked with an instance of j.lang.ref.Finalizer (which is a FinalizerReference subclass i.e. a final reference). Generally for references to work you need to keep alive the Reference instance. The finalizer does this with a built-in linked list in the Finalizer instance (static unfinalized points to the head of the list and each Finalizer object has a next/prev pointer. When the VM tracks a finalizeable object it calls the Finalize(Object) constructor which makes sure to add it. So if you have thousands of finalized objects there are all indirectly referenced from this long linear list. When the GC does its thing and a instance becomes unreachable, it will add it to the ReferenceQueue of the finalizer. The FinalizerThread will consume it from there and make sure to finally also remove the Finalizer reference for that object from the double linked list. If you work with heap-dumps and analyse memeory leak informations you need to ignore the memeory consumotion under "Finalizer.unfinalized" with the (possibly long if the Finalizer thread is blocked and unwanted) ReferenceQueue Finalizer#queue#head single linked list. Gruss Bernd Am Tue, 11 Oct 2016 18:25:29 +0000 schrieb Jun Zhuang >: > Hi, > > While reading about re-defining the finalize() method explicitly in a > class I came across some statements and like to get some clarification > from the experts. > > On http://www.fasterj.com/articles/finalizer2.shtml, the author states > that "the GC adds each of those Finalizer objects to the reference > queue at java.lang.ref.Finalizer.ReferenceQueue.". Based on this the > Finalizer object associated with the finalizeable object goes on the > reference queue. > > On page 311 of book Service-Oriented Computing - ICSOC 2011 > Workshops PA311&dq=java,+finalization+queue&source=bl&ots=LLGiYGWh0L&sig=Glvf0kn > 0zKHrdWoPzM6y6wtsr_M&hl=en&sa=X&ved=0ahUKEwjHrsL2pdPPAhUk3IMKHdXRCtc4C > hDoAQgbMAA#v=onepage&q=finalization%20queue&f=false> > "... all those objects that have a finalize () method and are found to > be unreachable(dead) by garbage collector, are pushed into a > finalization queue.". 
So the finalizeable object goes on the > finalization queue. > > Then this site, https://yourkit.com/forum/viewtopic.php?f=3&t=4672, > states that "Objects of all classes with redefined finalize() method > are added to a queue at the moment of creation. The queue head is > referenced from a static field in java.lang.ref.Finalizer. An instance > of Finalizer is created for each "finalizeable" object and is stored > in that queue, which is in fact a linked list of Finalizers.", so both > the finalizeable object and the associated Finalizer object are stored > in the same queue? > > So my questions are: Are there one or two queues involved? Exactly how > object finalization works? > > > Appreciate your input, > Jun > > Jun Zhuang > Sr. Performance QA Engineer | > Hobsons tm_campaign=banner_02.12.16_general> > T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite > 300 | Cincinnati, OH 45241 | USA > > > Upgraded by Hobsons - Subscribe Today > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecki at zusammenkunft.net Tue Oct 11 21:25:51 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Tue, 11 Oct 2016 23:25:51 +0200 Subject: About the finalization queue and reference queue In-Reply-To: References: <20161011210549.00006b33.ecki@zusammenkunft.net> Message-ID: <20161011232551.00005ac8.ecki@zusammenkunft.net> Hello, Am Tue, 11 Oct 2016 20:55:54 +0000 schrieb Jun Zhuang : > * A Finalizer instance is created for every finalizeable > object Yes, the java.lang.ref.Finalizer.Finalizer(Object) constructor will put the referent in the referent` field of Finalizer (declared in parent class Referent) and then link this instance with add() method at the head of the list. The head is kept alive by Finalizer#unfinalized (which is static). This constructor is called (via Finalizer#register(Object)) by the JVM when it creates a new finalizeable object. > * All the Finalizer instances are linked together using a > double linked list Yes, they dont use a LikedList class but implement the next/prev fields themself (so this needs no additional instances). > * All the Finalizer instances are tracked by the > java.lang.ref.Finalizer class. Or is it only the first one by the > unfinalized field? The head of the linked list is referenced by `unfinalized field. It points to a Finalizer instance, which points which the next field to the next Finalized and so on (and each of them has a referee). > What I am still not clear are: > > 1. Is there a finalization queue at all? If so, what does it do? There is a ReferenceQueue in the static field queue. This queue is given to all Finalize instances (the field "queue" in Reference holds thatuntil needed). The objets are enqueued to this queue by the GC. The Finalizer thread reads them from this queue and knows "this one is now not reachable anymore" and does its work (and removes it from the queue and also remove the Finalize wrapper from its linked list). > 2. What goes into the ReferenceQueue? The finalizeable objects? References go to the ReferenceQueue, the wrapper Finalizer instance is also a (Final)Reference. In case of FinalReference the queue can look at the Finalize#get() to get the referent and call finalize() on this. Same mechanism asdone for allother Reference types (with the exception of phantom references which do not allow this get). 
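A minimal sketch that makes this visible: the class below overrides finalize(), drops its only reference and asks for a GC. System.gc() is only a hint, so the output is not guaranteed, but when the collector does discover the object, the message is printed from the "Finalizer" daemon thread rather than from main:

    public class FinalizeDemo {
        @Override
        protected void finalize() {
            // Invoked by the FinalizerThread after the GC has enqueued the
            // Finalizer reference that was registered at construction time.
            System.out.println("finalize() on thread: " + Thread.currentThread().getName());
        }

        public static void main(String[] args) throws InterruptedException {
            new FinalizeDemo();   // instance is immediately unreachable
            System.gc();          // hint only; discovery may or may not happen right away
            Thread.sleep(1000);   // give the Finalizer thread a chance to run
        }
    }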
The Finalize class is a real beast: - instances are a Reference wrapping the finalizeable objects - instances form a double linked list of all Finalize instances - the class itself holds the head of the list and the queue alive in statics - the FinalizerThread (which removes Finalizer instances from the ReferenceQueue and invokes the finalizer method on it (once) is a inner class: Finalizer$FinalizerThread - the static initializer of Finalizer actually starts the FinalizerThread (as a daemon with MAX-2 prio). - the static field lock in Finalizer is used to synchronize the daemon thread with secondary finalizer threads (from runAllFinalizers() on shutdown or Runtime.runFinalizazion()) Lots of the finalization logic is done in Java, only the register() and Reference discovery is done by the runtime/gc. You can see that here, note especially the "Invoked by" comments (and parents): http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/2bf254421854/src/java.base/share/classes/java/lang/ref/Finalizer.java Bernd From inurislamov at getintent.com Wed Oct 12 08:39:04 2016 From: inurislamov at getintent.com (Ildar Nurislamov) Date: Wed, 12 Oct 2016 11:39:04 +0300 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> Message-ID: <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> Hi Thomas, It was too early to make conclusions. After some prolonged testing i've noticed that more thorough tuning may be required to avoid this issue completely. And -XX:-G1UseAdaptiveIHOP not always enough too. What bothers me is the steep jump in time required between the last Mixed GC and the previous: In 9th it took 129.8ms to evacuate 104 old region: [64394.771s][info][gc,phases ] GC(38781) Evacuate Collection Set: 129.8ms [64394.771s][info][gc,phases ] GC(38781) Code Roots: 0.0ms [64394.771s][info][gc,phases ] GC(38781) Clear Card Table: 3.4ms [64394.771s][info][gc,phases ] GC(38781) Expand Heap After Collection: 0.0ms [64394.771s][info][gc,phases ] GC(38781) Free Collection Set: 3.9ms [64394.771s][info][gc,phases ] GC(38781) Merge Per-Thread State: 0.1ms [64394.771s][info][gc,phases ] GC(38781) Other: 13.8ms [64394.771s][info][gc,heap ] GC(38781) Eden regions: 37->0(37) [64394.771s][info][gc,heap ] GC(38781) Survivor regions: 3->3(5) [64394.771s][info][gc,heap ] GC(38781) Old regions: 457->353 [64394.771s][info][gc,heap ] GC(38781) Humongous regions: 3->3 [64394.771s][info][gc,metaspace ] GC(38781) Metaspace: 70587K->70587K(83968K) [64394.771s][info][gc ] GC(38781) Pause Mixed (G1 Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) 150.931ms While in 10th (the last) it took 3401.3ms to evacuate 87: [64398.393s][info][gc,phases ] GC(38782) Evacuate Collection Set: 3401.3ms [64398.393s][info][gc,phases ] GC(38782) Code Roots: 0.0ms [64398.393s][info][gc,phases ] GC(38782) Clear Card Table: 2.8ms [64398.393s][info][gc,phases ] GC(38782) Expand Heap After Collection: 0.0ms [64398.393s][info][gc,phases ] GC(38782) Free Collection Set: 4.3ms [64398.393s][info][gc,phases ] GC(38782) Merge Per-Thread State: 0.1ms [64398.393s][info][gc,phases ] GC(38782) Other: 12.2ms [64398.393s][info][gc,heap ] GC(38782) Eden regions: 37->0(37) [64398.393s][info][gc,heap ] GC(38782) Survivor regions: 3->3(5) [64398.393s][info][gc,heap ] GC(38782) Old regions: 353->266 [64398.393s][info][gc,heap ] GC(38782) Humongous regions: 3->3 [64398.393s][info][gc,metaspace ] GC(38782) Metaspace: 
70587K->70587K(83968K) [64398.393s][info][gc ] GC(38782) Pause Mixed (G1 Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) 3420.666ms It looks like at average old regions in 10th Mixed GC were 31.5 times more expensive than in 9th and it took 39ms to collect just one region. Does it make sense? To what extent one old region may be more expensive than another? I wish G1Ergonomics similar to "reason: predicted time is too high" but for order of magnitude jump cases worked here even when min old regions number has not been reached. We didn't spend all XX:G1MixedGCCountTarget=12 yet here. Log file: https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11_10_16.log.zip?dl=0 Sadly with no ergonomics. Next thing i'm going to try is adjusting XX:G1MixedGCLiveThresholdPercent. Thank you! -- Ildar Nurislamov GetIntent, AdServer Team Leader > On Sep 29, 2016, at 13:42, Ildar Nurislamov wrote: > > Hi Thomas, > > Thank you for really helpful advices. > > I have performed 8-hour testing with: > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=2 -XX:G1HeapWastePercent=10 -XX:G1MixedGCCountTarget=12 > and they improved situation for both 8u and 9ea. > Longest pause on 9ea now is 400ms with Adaptive sizing for IHOP > > I will continue testing and report if anything interesting pops out. > > -- > Ildar Nurislamov > GetIntent, AdServer Team Leader > >> On Sep 28, 2016, at 10:37, Thomas Schatzl > wrote: >> >> Hi Ildar, >> >> On Fri, 2016-09-23 at 12:40 +0300, Ildar Nurislamov wrote: >>> Hi Thomas Schatzl! >>> >>> Thank you for such prompt responses. >>> I'm going to try you advices and send results next week. >>> >>> Here are log files you have asked about: >>> https://www.dropbox.com/s/i9o4nuuh5gpsf1y/9noaihop_07_09_16.log.zip?d >>> l=0 >>> https://www.dropbox.com/s/xa3cfezvlqwwh6v/8u_log_07_09_16.log.zip?dl= >>> 0 >>> >> >> thanks a lot for the logs. As you may have noticed I closed JDK- >> 8166500 as duplicate of the existing JDK-8159697 issue. They are the >> same after all. >> We will continue working on improving out-of-box experience of G1. :) >> >> As hypothesized in the text for JDK-8166500, the 8u and 9-without-aihop >> show the same general issue. The suggested tunings should improve mixed >> gc times for now. >> >> Thanks, >> Thomas >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Wed Oct 12 12:51:46 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Oct 2016 14:51:46 +0200 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> Message-ID: <1476276706.2632.92.camel@oracle.com> Hi, On Wed, 2016-10-12 at 11:39 +0300, Ildar Nurislamov wrote: > Hi Thomas, > > It was too early to make conclusions. > After some prolonged testing i've noticed that more thorough tuning > may be required to avoid this issue completely.? > And -XX:-G1UseAdaptiveIHOP not always enough too.? > > What bothers me is the steep jump in time required between the last > Mixed GC and the previous: > In 9th it took 129.8ms to evacuate 104 old region: > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Evacuate Collection > Set: 129.8ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Code Roots: 0.0ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? 
Clear Card Table: > 3.4ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Expand Heap After > Collection: 0.0ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Free Collection Set: > 3.9ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Merge Per-Thread > State: 0.1ms > [64394.771s][info][gc,phases ? ? ] GC(38781) ? Other: 13.8ms > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Eden regions: 37->0(37) > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Survivor regions: 3- > >3(5) > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Old regions: 457->353 > [64394.771s][info][gc,heap ? ? ? ] GC(38781) Humongous regions: 3->3 > [64394.771s][info][gc,metaspace? ] GC(38781) Metaspace: 70587K- > >70587K(83968K) > [64394.771s][info][gc? ? ? ? ? ? ] GC(38781) Pause Mixed (G1 > Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) > 150.931ms > > While in 10th (the last) it took?3401.3ms to evacuate 87: > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Evacuate Collection > Set: 3401.3ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Code Roots: 0.0ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Clear Card Table: > 2.8ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Expand Heap After > Collection: 0.0ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Free Collection Set: > 4.3ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Merge Per-Thread > State: 0.1ms > [64398.393s][info][gc,phases ? ? ] GC(38782) ? Other: 12.2ms > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Eden regions: 37->0(37) > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Survivor regions: 3- > >3(5) > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Old regions: 353->266 > [64398.393s][info][gc,heap ? ? ? ] GC(38782) Humongous regions: 3->3 > [64398.393s][info][gc,metaspace? ] GC(38782) Metaspace: 70587K- > >70587K(83968K) > [64398.393s][info][gc? ? ? ? ? ? ] GC(38782) Pause Mixed (G1 > Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) > 3420.666ms > > It looks like at average old regions in 10th Mixed GC were 31.5 times > more expensive than in 9th and it took 39ms to collect just one > region. Does it make sense? To what extent one old region may be more > expensive than another? Mostly remembered set operations. > I wish G1Ergonomics similar to "reason: predicted time is too high" > but for order of magnitude jump cases worked here even when min old > regions number has not been reached. We didn't spend all > XX:G1MixedGCCountTarget=12 yet here. > > Log file:?https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11 > _10_16.log.zip?dl=0? > Sadly with no ergonomics. > > Next thing i'm going to try is adjusting > XX:G1MixedGCLiveThresholdPercent. ? I did not have time for a look at the logs yet, but you can try to avoid this by either increasing MixedGCCountTarget further - as you noticed this is a hint for G1 only anyway - or trying to get rid of these expensive regions. One way is to decrease G1MixedGCLiveThresholdPercent (default 85), as regions with lots of occupancy also often have a large remembered set that is expensive to reclaim. Another way to explore is looking at statistics for remembered set sizes directly. There is -XX:G1SummarizeRSetStatsPeriod which takes a number that tells G1 to collect and print these statistics every G1SummarizeRSetStatsPeriod'th GC. Note that this is an expensive operation, so you might only want to do this every 10th or so GC (needs -XX:+UnlockDiagnosticVMOptions). Thanks, ? 
Thomas From yu.zhang at oracle.com Wed Oct 12 15:31:07 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Wed, 12 Oct 2016 08:31:07 -0700 Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses In-Reply-To: <1476276706.2632.92.camel@oracle.com> References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com> <1475048227.4430.4.camel@oracle.com> <2CA764D0-CCB4-4FA5-9A5A-B005688BD16D@getintent.com> <1476276706.2632.92.camel@oracle.com> Message-ID: <356ce0a6-580e-075b-f355-d064ed1529b5@oracle.com> Ildar, Another thing you can try is to increase G1HeapWastePercent to get rid of the expensive mixed gcs. From the log snip, the heap is not tight. Thanks Jenny On 10/12/2016 05:51 AM, Thomas Schatzl wrote: > Hi, > > On Wed, 2016-10-12 at 11:39 +0300, Ildar Nurislamov wrote: >> Hi Thomas, >> >> It was too early to make conclusions. >> After some prolonged testing i've noticed that more thorough tuning >> may be required to avoid this issue completely. >> And -XX:-G1UseAdaptiveIHOP not always enough too. >> >> What bothers me is the steep jump in time required between the last >> Mixed GC and the previous: >> In 9th it took 129.8ms to evacuate 104 old region: >> [64394.771s][info][gc,phases ] GC(38781) Evacuate Collection >> Set: 129.8ms >> [64394.771s][info][gc,phases ] GC(38781) Code Roots: 0.0ms >> [64394.771s][info][gc,phases ] GC(38781) Clear Card Table: >> 3.4ms >> [64394.771s][info][gc,phases ] GC(38781) Expand Heap After >> Collection: 0.0ms >> [64394.771s][info][gc,phases ] GC(38781) Free Collection Set: >> 3.9ms >> [64394.771s][info][gc,phases ] GC(38781) Merge Per-Thread >> State: 0.1ms >> [64394.771s][info][gc,phases ] GC(38781) Other: 13.8ms >> [64394.771s][info][gc,heap ] GC(38781) Eden regions: 37->0(37) >> [64394.771s][info][gc,heap ] GC(38781) Survivor regions: 3- >>> 3(5) >> [64394.771s][info][gc,heap ] GC(38781) Old regions: 457->353 >> [64394.771s][info][gc,heap ] GC(38781) Humongous regions: 3->3 >> [64394.771s][info][gc,metaspace ] GC(38781) Metaspace: 70587K- >>> 70587K(83968K) >> [64394.771s][info][gc ] GC(38781) Pause Mixed (G1 >> Evacuation Pause) 15972M->11457M(65536M) (64394.620s, 64394.771s) >> 150.931ms >> >> While in 10th (the last) it took 3401.3ms to evacuate 87: >> [64398.393s][info][gc,phases ] GC(38782) Evacuate Collection >> Set: 3401.3ms >> [64398.393s][info][gc,phases ] GC(38782) Code Roots: 0.0ms >> [64398.393s][info][gc,phases ] GC(38782) Clear Card Table: >> 2.8ms >> [64398.393s][info][gc,phases ] GC(38782) Expand Heap After >> Collection: 0.0ms >> [64398.393s][info][gc,phases ] GC(38782) Free Collection Set: >> 4.3ms >> [64398.393s][info][gc,phases ] GC(38782) Merge Per-Thread >> State: 0.1ms >> [64398.393s][info][gc,phases ] GC(38782) Other: 12.2ms >> [64398.393s][info][gc,heap ] GC(38782) Eden regions: 37->0(37) >> [64398.393s][info][gc,heap ] GC(38782) Survivor regions: 3- >>> 3(5) >> [64398.393s][info][gc,heap ] GC(38782) Old regions: 353->266 >> [64398.393s][info][gc,heap ] GC(38782) Humongous regions: 3->3 >> [64398.393s][info][gc,metaspace ] GC(38782) Metaspace: 70587K- >>> 70587K(83968K) >> [64398.393s][info][gc ] GC(38782) Pause Mixed (G1 >> Evacuation Pause) 12641M->8678M(65536M) (64394.973s, 64398.393s) >> 3420.666ms >> >> It looks like at average old regions in 10th Mixed GC were 31.5 times >> more expensive than in 9th and it took 39ms to collect just one >> region. Does it make sense? To what extent one old region may be more >> expensive than another? > Mostly remembered set operations. 
> >> I wish G1Ergonomics similar to "reason: predicted time is too high" >> but for order of magnitude jump cases worked here even when min old >> regions number has not been reached. We didn't spend all >> XX:G1MixedGCCountTarget=12 yet here. >> >> Log file: https://www.dropbox.com/s/ubpkosh0a8tomss/jdk9_135_tuned_11 >> _10_16.log.zip?dl=0 >> Sadly with no ergonomics. >> >> Next thing i'm going to try is adjusting >> XX:G1MixedGCLiveThresholdPercent. > I did not have time for a look at the logs yet, but you can try to > avoid this by either increasing MixedGCCountTarget further - as you > noticed this is a hint for G1 only anyway - or trying to get rid of > these expensive regions. One way is to decrease > G1MixedGCLiveThresholdPercent (default 85), as regions with lots of > occupancy also often have a large remembered set that is expensive to > reclaim. > > Another way to explore is looking at statistics for remembered set > sizes directly. There is -XX:G1SummarizeRSetStatsPeriod which takes a > number that tells G1 to collect and print these statistics every > G1SummarizeRSetStatsPeriod'th GC. Note that this is an expensive > operation, so you might only want to do this every 10th or so GC (needs > -XX:+UnlockDiagnosticVMOptions). > > Thanks, > Thomas > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jun.zhuang at hobsons.com Thu Oct 13 15:21:56 2016 From: jun.zhuang at hobsons.com (Jun Zhuang) Date: Thu, 13 Oct 2016 15:21:56 +0000 Subject: Questions regarding Java string literal pool Message-ID: Hi, I have a few questions related to the Java String pool, I wonder if I can get some clarification from the experts? 1. Location of the String pool Following are from some of the posts I read but with conflicting information: ? http://java-performance.info/string-intern-in-java-6-7-8/ ?In those good old days [before java 7] all interned strings were stored in the PermGen ? the fixed size part of heap mainly used for storing loaded classes and string pool.? ? ?in Java 7 ? the string pool was relocated to the heap. ... All strings are now located in the heap, as most of other ordinary objects? Above statement suggests that both the interned strings and the string pool are in the PermGen prior to java 7 but being relocated to the heap in 7. ? https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html ?Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.? This post suggests that string literals are created on the heap as other objects but did not tie that to any java version. ? http://www.javamadesoeasy.com/2015/05/string-pool-string-literal-pool-string.html ?From java 7 String pool is a storage area in java heap memory, where all the other objects are created. Prior to Java 7 String pool was created in permgen space of heap.? So prior to java 7 the string pool was in the PermGen; beginning with 7 it?s in the heap. Same as the 1st post. My questions are: 1. Where is the string pool located prior and after java 7 2. Are the string literals & interned strings objects created in the PermGen prior to java 7 then being created on the heap after? 2. Can string literals be garbage collected? 
The post @https://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html says ?Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.? But this one @http://java-performance.info/string-intern-in-java-6-7-8/ says ?Yes, all strings in the JVM string pool are eligible for garbage collection if there are no references to them from your program roots.? Are they both true under certain conditions? Appreciate your help, Jun Jun Zhuang Sr. Performance QA Engineer | Hobsons T: +1 513 746 2288 | jun.zhuang at hobsons.com 50 E-Business Way, Suite 300 | Cincinnati, OH 45241 | USA Upgraded by Hobsons - Subscribe Today -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image857000.png Type: image/png Size: 13602 bytes Desc: image857000.png URL: From hongkai.liu at ericsson.com Wed Oct 19 20:16:01 2016 From: hongkai.liu at ericsson.com (Hongkai Liu) Date: Wed, 19 Oct 2016 20:16:01 +0000 Subject: G1GC and finalizer queue Message-ID: Hi, our application (Gerrit) consumes more and more memory and Yourkit showed up with 18M Jdbc4PreparedStatement objects in "pending finalization" which uses up 21G of mem. The heap is taken immediately after two GCs. I wonder why those objects survived of GCs. According to Yourkit doc, the objects in "pending finalization" are from the class with an implemenation of finalize() method while Jdbc4PreparedStatement is without it. Is it about G1GC? Any hint is appreciated. BR, Hongkai ================================ Here are the screenshots of Yourkit and App info. [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2016-10-19 15:51:04.png Type: image/png Size: 54687 bytes Desc: Screenshot from 2016-10-19 15:51:04.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2016-10-19 15:53:51.png Type: image/png Size: 27127 bytes Desc: Screenshot from 2016-10-19 15:53:51.png URL: From ecki at zusammenkunft.net Wed Oct 19 20:39:55 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Wed, 19 Oct 2016 22:39:55 +0200 Subject: G1GC and finalizer queue In-Reply-To: References: Message-ID: <20161019223955.0000207d.ecki@zusammenkunft.net> Hello, the finalize() is in one of the parent classes. http://grepcode.com/file/repo1.maven.org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc2/AbstractJdbc2Statement.java#803 I am not sure if youkit shows unreferenced or referenced objects in "pending finalization". If it is referenced objects, the statements might hang around in a prepared statement cache. If they are unreferenced the finalizer thread might be slow or blocked. I would try to do an heapdump to investigate. When properly using datasource pools and tomcat facilities a leak is unlikely. If you have some hardcoded jdbc code, that might also be a possible explanation for the number. Gruss Bernd Am Wed, 19 Oct 2016 20:16:01 +0000 schrieb Hongkai Liu : > Hi, > > > our application (Gerrit) consumes more and more memory and Yourkit > showed up with 18M > Jdbc4PreparedStatement > objects in "pending finalization" which uses up 21G of mem. 
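A quick way to tell the two cases apart from the command line (<pid> is a placeholder for the JVM's process id):

    jmap -finalizerinfo <pid>     (number of objects waiting in the finalization queue)
    jstack <pid>                  (check what the "Finalizer" daemon thread is doing)

If the pending count keeps growing and the Finalizer thread is stuck inside a finalize() call (for example waiting on a lock or on I/O), the queue is backing up; if the count stays small, the instances are more likely still strongly referenced, e.g. from a statement cache. From inside the process, ManagementFactory.getMemoryMXBean().getObjectPendingFinalizationCount() reports the same approximate pending count.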
> > The heap is taken immediately after two GCs. > > > I wonder why those objects survived of GCs. > > According to Yourkit > doc, the > objects in "pending finalization" are from the class with an > implemenation of finalize() method while > Jdbc4PreparedStatement > is without it. > > Is it about G1GC? > > > Any hint is appreciated. > > > BR, > > Hongkai > > > ================================ > > > Here are the screenshots of Yourkit and App info. > > [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] > > [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] > > From vitalyd at gmail.com Wed Oct 19 21:52:09 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Wed, 19 Oct 2016 17:52:09 -0400 Subject: G1GC and finalizer queue In-Reply-To: <20161019223955.0000207d.ecki@zusammenkunft.net> References: <20161019223955.0000207d.ecki@zusammenkunft.net> Message-ID: On Wednesday, October 19, 2016, Bernd Eckenfels wrote: > Hello, > > the finalize() is in one of the parent classes. > > http://grepcode.com/file/repo1.maven.org/maven2/ > postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc2/ > AbstractJdbc2Statement.java#803 > > I am not sure if youkit shows unreferenced or referenced objects in > "pending finalization". If it is referenced objects, the statements > might hang around in a prepared statement cache. If they are > unreferenced the finalizer thread might be slow or blocked. YK docs indicate it's showing (strongly) unreachable objects that are sitting on the finalization queue. This implies (unless YK is broken) these instances have already been discovered by G1 to be unreachable, and thus got enqueued for finalization. I'd jstack/sigquit the Java process to get a thread dump and see what the Finalizer thread is up to. > > I would try to do an heapdump to investigate. > > When properly using datasource pools and tomcat facilities a leak is > unlikely. If you have some hardcoded jdbc code, that might also be a > possible explanation for the number. > > Gruss > Bernd > > Am Wed, 19 Oct 2016 20:16:01 +0000 > schrieb Hongkai Liu >: > > > Hi, > > > > > > our application (Gerrit) consumes more and more memory and Yourkit > > showed up with 18M > > Jdbc4PreparedStatement org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc4/ > Jdbc4PreparedStatement.java> > > objects in "pending finalization" which uses up 21G of mem. > > > > The heap is taken immediately after two GCs. > > > > > > I wonder why those objects survived of GCs. > > > > According to Yourkit > > doc, the > > objects in "pending finalization" are from the class with an > > implemenation of finalize() method while > > Jdbc4PreparedStatement org/maven2/postgresql/postgresql/9.1-901.jdbc4/org/postgresql/jdbc4/ > Jdbc4PreparedStatement.java> > > is without it. > > > > Is it about G1GC? > > > > > > Any hint is appreciated. > > > > > > BR, > > > > Hongkai > > > > > > ================================ > > > > > > Here are the screenshots of Yourkit and App info. > > > > [cid:82499a3c-9de5-4ae9-b0f3-c9f2e173aa40] > > > > [cid:5dd8cdb6-7736-4294-9bc5-698ad47ecf29] > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From brian.toal at gmail.com Fri Oct 21 23:55:39 2016
From: brian.toal at gmail.com (Brian Toal)
Date: Fri, 21 Oct 2016 16:55:39 -0700
Subject: metaspace proportion of fragmentation
Message-ID: 

Good evening. In an application that I'm responsible for, Metaspace is set to 1.1GB. Specifically the following flags are set:

-XX:MetaspaceSize=1152m
-XX:MaxMetaspaceSize=1152m
-XX:MinMetaspaceFreeRatio=0
-XX:MaxMetaspaceFreeRatio=100

However we are getting an OOME when metaspace size hits 80% of 1.1GB.

Doing a bit of research, it seems that Metaspace is known to fragment the memory: when a loader needs to acquire memory from the current chunk and the current chunk can't accommodate the request, the pointer is bumped to the next available chunk, meaning any free memory in the previous chunk's block is gone with the wind. More than happy if someone corrects my understanding here or can point me to a good reference that explains this in detail.

My question is, how do I monitor current usage + fragmentation so the proportion of free space can be monitored?

Also, is there any tuning that can take place to reduce the proportion of fragmentation?

Does the compressed class space acquire memory from the memory set aside via MaxMetaspaceSize?

Thanks in advance.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From prasanna.gopal at blackrock.com Tue Oct 25 08:41:53 2016
From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK)
Date: Tue, 25 Oct 2016 08:41:53 +0000
Subject: G1 GC Humongous Objects - Garbage collection
Message-ID: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>

Hi All

I have the following questions about garbage collection of humongous objects.

1) When will humongous objects get reclaimed?
2) Is there any behaviour difference between the JDK 7 and JDK 8 runtimes?
3) I understand that in pre-JDK 8 G1 GC, humongous objects get collected only through a Full GC. In my application, I couldn't see a Full GC happening for a long time (running on jdk_7u40_x64); does this mean the humongous objects stay in memory till we have a Full GC?

Appreciate your help in answering these questions.

Thanks and Regards
Prasanna

This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock's Privacy Policy. BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc.
All rights reserved.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From thomas.schatzl at oracle.com Tue Oct 25 09:24:20 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 25 Oct 2016 11:24:20 +0200
Subject: G1 GC Humongous Objects - Garbage collection
In-Reply-To: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>
References: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com>
Message-ID: <1477387460.2969.14.camel@oracle.com>

Hi,

On Tue, 2016-10-25 at 08:41 +0000, Gopal, Prasanna CWK wrote:
> Hi All
>
> I have the following questions about garbage collection of humongous
> objects.
>
> 1) When will humongous objects get reclaimed?
> 2) Is there any behaviour difference between the JDK 7 and JDK 8
> runtimes?
> 3) I understand that in pre-JDK 8 G1 GC, humongous objects get
> collected only through a Full GC. In my application, I couldn't see
> a Full GC happening for a long time (running on jdk_7u40_x64); does this
> mean the humongous objects stay in memory till we have a Full GC?

G1 can reclaim humongous objects...

* at the end of marking in the GC Cleanup pause.
* during full gc.
* JDK8u60+ can also reclaim particular types of humongous objects (arrays that do _not_ consist of references to objects) at every young GC.

See the release notes for 8u60 at http://www.oracle.com/technetwork/java/javase/8u60-relnotes-2620227.html under "New Features and Changes" for how to control this. (It works for any array of primitive type, not limited to the examples given there - just in case you wonder).

Thanks,
  Thomas

From thomas.stuefe at gmail.com Tue Oct 25 09:37:40 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 25 Oct 2016 11:37:40 +0200
Subject: metaspace proportion of fragmentation
In-Reply-To: 
References: 
Message-ID: 

Hi Brian,

On Sat, Oct 22, 2016 at 1:55 AM, Brian Toal wrote:

> Good evening. In an application that I'm responsible for, Metaspace is set
> to 1.1GB. Specifically the following flags are set:
>
> -XX:MetaspaceSize=1152m
> -XX:MaxMetaspaceSize=1152m
> -XX:MinMetaspaceFreeRatio=0
> -XX:MaxMetaspaceFreeRatio=100
>
> However we are getting an OOME when metaspace size hits 80% of 1.1GB.
>

Out of metaspace or out of compressed class space? If the latter, have you set CompressedClassSpaceSize?

> Doing a bit of research, it seems that Metaspace is known to fragment the
> memory: when a loader needs to acquire memory from the current chunk and
> the current chunk can't accommodate the request, the pointer is bumped to
> the next available chunk, meaning any free memory in the previous chunk's
> block is gone with the wind.
>

No. The remaining space is put into freelists (both on chunk and on block level) and used for follow-up requests, should the size fit. In our experience, we see very little wastage due to "half-eaten" blocks/chunks.

There are other possible waste scenarios:

1) you have a lot of class loaders living in parallel. Each one will take a chunk of memory (its current chunk) and satisfy memory requests from there. This means that the current chunk always contains a portion of still-unused memory which may be used by this class loader in the future but already counts against MaxMetaspaceSize. However, to make this hurt, you really have to have very many class loaders in parallel, as the maximum possible overhead for this scenario cannot exceed the size of a medium chunk per classloader (64k?)
2) The scenario described in this JEP here: https://bugs.openjdk.java.net/browse/JDK-8166690 . 3) real fragmentation (i.e. a mixture of in-use and free chunks). In my practice, I keep seeing (2). Hence the JEP, which will hopefully help. Kind Regards, Thomas > More than happy if someone corrects my understand here or can point me to > a good reference that explains this in detail. > > My question is, how to do I monitor current usage + fragmentation so the > proportion of free space can be monitored? > > Also is there any tuning that can take place to reduce the proportion of > fragmentation? > > Does compressed cache acquire memory from memory set aside from memory > allocated via MaxMetaspaceSize? > > Thanks in advance. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From prasanna.gopal at blackrock.com Tue Oct 25 09:53:34 2016 From: prasanna.gopal at blackrock.com (Gopal, Prasanna CWK) Date: Tue, 25 Oct 2016 09:53:34 +0000 Subject: G1 GC Humongous Objects - Garbage collection In-Reply-To: <1477387460.2969.14.camel@oracle.com> References: <3749e09396984048a2485a1324b86e6c@UKPMSEXD202N02.na.blkint.com> <1477387460.2969.14.camel@oracle.com> Message-ID: <17856615e3ec4df3a0559a7ab31122e8@UKPMSEXD202N02.na.blkint.com> ?Hi Thomas Thanks for your explanation. Appreciate your help. Thanks and Regards Prasanna -----Original Message----- From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com] Sent: 25 October 2016 10:24 To: Gopal, Prasanna CWK ; hotspot-gc-use at openjdk.java.net Subject: Re: G1 GC Humongous Objects - Garbage collection Hi, On Tue, 2016-10-25 at 08:41 +0000, Gopal, Prasanna CWK wrote: > Hi All > ? > I have the following question about Garbage collection of? Humongous > objects. > ? > 1)???? When will the humongous objects will get reclaimed ? > 2)???? Is there is any behaviour difference between Jdk 7 and Jdk 8 > run time ? > 3)???? I understand, in pre-jdk 8 G1 GC , the humongous objects gets > collected only through Full GC. In my application , I couldn?t see > Full GC happening for long time (running on jdk_7u40_x64) , does this > means the humongous objects stay in memory , till we have a full GC ? G1 can reclaim humongous objects... * at the end of marking in the GC Cleanup pause. * during full gc. * JDK8u60+ can also reclaim particular types of humongous objects (arrays that do _not_ consist of references to objects) at every young GC. See the release notes for 8u60 at?https://urldefense.proofpoint.com/v2/url?u=http-3A__www.oracle.com_technetwork&d=DQIFaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=zRhnqN6xuCQh8NZ-MtoiYBMlItU6r8UBO9AjZ3c3DEY&m=5pQkGSufUB_aL1XJUcW86zVuBn5xYh1XrUD5N2zcu1M&s=OKbYPqGNR3NGiLzOFh6tXk2cXLnbhFxrp8H4Svff20A&e= /java/javase/8u60-relnotes-2620227.html?under "New Features and Changes" for how to control this. (It works for any array of primitive type, not limited to the examples given there - just in case you wonder). Thanks, ? Thomas This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/en-us/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/en-us/compliance/privacy-policy for more information about BlackRock?s Privacy Policy. 
BlackRock Advisors (UK) Limited and BlackRock Investment Management (UK) Limited are authorised and regulated by the Financial Conduct Authority. Registered in England No. 796793 and No. 2020394 respectively. BlackRock Life Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Registered in England No. 2223202. Registered Offices: 12 Throgmorton Avenue, London EC2N 2DL. BlackRock International Limited is authorised and regulated by the Financial Conduct Authority and is a registered investment adviser with the Securities and Exchange Commission (SEC). Registered in Scotland No. SC160821. Registered Office: Exchange Place One, 1 Semple Street, Edinburgh EH3 8BL. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. © 2016 BlackRock, Inc. All rights reserved.

From brian.toal at gmail.com Tue Oct 25 17:05:02 2016
From: brian.toal at gmail.com (Brian Toal)
Date: Tue, 25 Oct 2016 10:05:02 -0700
Subject: metaspace proportion of fragmentation
In-Reply-To: 
References: 
Message-ID: 

Thanks for the reply, Thomas.

We are getting an OOME on metaspace and do not set CompressedClassSpaceSize. It seems our used compressed class space is ~100MB. If this isn't set, will it use as much of the specified -XX:MaxMetaspaceSize as needed?

We do have a lot of class loaders, after looking closely at the heap. Looks like ~11k loaders, where the majority of them are sun.reflect.DelegatingClassLoader's corresponding to reflective method instances that are being strongly referenced. I'm not sure if the chunk size is 64k, because that would lead to ~687MB of Metaspace going to the initial allocation for each loader; however, looking at the output of "jcmd GC.class_stats" and summing up the total for all DelegatingClassLoader's, it shows only ~46MB is used and the remaining classes account for ~840MB. Maybe the total accumulated chunk memory is not part of the output of "jcmd GC.class_stats"; do you know whether space allocated but unused by the loader is reported here? If not, is there any way to get this info on a production JVM?

Looking at the PDF referenced in (2), it seems like the DelegatingClassLoader's are consuming 4 small chunks of 512 words, which is 4 x 512 words x 8 bytes/word, so as a lower bound I would expect the cost of all those loaders to be ~171MB. The metaspace usage in the JVM is ~940MB and compressed class space is ~100MB, so adding the ~171MB seems to bring us to a total of ~1.18G, which is roughly the value of -XX:MaxMetaspaceSize.

I've found a few ways to limit the number of DelegatingClassLoader's, by either changing the inflation threshold or possibly just breaking the path back to the GC root of the class instance that is owned by the corresponding DelegatingClassLoader. It's a shame that the same DelegatingClassLoader isn't reused, but I suppose the finer granularity of method-to-classloader is to increase the chances that the loader can be unloaded when the method is no longer referenced.

On Tue, Oct 25, 2016 at 2:37 AM, Thomas Stüfe wrote:

> Hi Brian,
>
> On Sat, Oct 22, 2016 at 1:55 AM, Brian Toal wrote:
>
>> Good evening. In an application that I'm responsible for, Metaspace is
>> set to 1.1GB.
Specifically the following flags are set: >> >> -XX:MetaspaceSize=1152m >> -XX:MaxMetaspaceSize=1152m >> -XX:MinMetaspaceFreeRatio=0 >> -XX:MaxMetaspaceFreeRatio=100 >> >> However we are getting a OOME when metaspace size hits 80% of 1.1GB. >> > > Out of metaspace or out of compressed class space? If the latter, have you > set CompressedClassSpaceSize? > > >> Doing a bit or research it seems that Metaspace is known to fragement the >> memory when a loader needs to acquire memory from the current chunk, and >> the current chuck can't accomodate the request, the pointer is bumped to >> the next available chunk, meaning any free memory in the previous chunks >> block is gone with the wind. >> > > No. The remaining space is put into freelists (both on chunk and on block > level) and used for follow-up requests, should the size fit. In our > experience, we see very little wastage due to "half-eaten blocks/chunks. > > There are other possible waste scenarios: > > 1) you have a lot of class loaders living in parallel. Each one will take > a chunk of memory (its current chunk) and satisfy memory requests from > there. This means that the current chunk always contains a portion of > still-unused memory which may be used by this class loader in the future > but already counts against MaxMetaspaceSize. However, to make this hurt, > you really have to have very many class loaders in parallel, as the maximum > possible overhead for this scenario cannot exceed the size of a medium > chunk per classloader (64k?) > > 2) The scenario described in this JEP here: https://bugs.openjdk. > java.net/browse/JDK-8166690 . > > 3) real fragmentation (i.e. a mixture of in-use and free chunks). > > In my practice, I keep seeing (2). Hence the JEP, which will hopefully > help. > > Kind Regards, Thomas > > > >> More than happy if someone corrects my understand here or can point me to >> a good reference that explains this in detail. >> >> My question is, how to do I monitor current usage + fragmentation so the >> proportion of free space can be monitored? >> >> Also is there any tuning that can take place to reduce the proportion of >> fragmentation? >> >> Does compressed cache acquire memory from memory set aside from memory >> allocated via MaxMetaspaceSize? >> >> Thanks in advance. >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Fri Oct 28 22:43:01 2016 From: david.ely at unboundid.com (David Ely) Date: Fri, 28 Oct 2016 17:43:01 -0500 Subject: occasional ParNew times of 15+ seconds Message-ID: While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why. 
Looking at the full GC log file: 382250 ParNew GCs are < 1 second 9303 are 100ms to 1 second 1267 are 1 second to 2 seconds 99 are 2 seconds to 10 seconds 24 are > 10 seconds, 48 seconds being the max The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] 79504:2016-10-22T06:09:36.983+0000: 125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] 261329:2016-10-26T00:06:44.499+0000: 
448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] Context around a single instance is fairly normal: 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] 345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 
1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] Since the user times are high as well, I don't think this could be swapping. Here are the hard-earned set of JVM arguments that we're using: -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ -XX:+UseGCLogFileRotation \ -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log This is on Linux with Java 1.7.0_72. Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. Thanks for your help. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From Peter.B.Kessler at Oracle.COM Fri Oct 28 23:30:24 2016 From: Peter.B.Kessler at Oracle.COM (Peter B. 
Kessler)
Date: Fri, 28 Oct 2016 16:30:24 -0700
Subject: occasional ParNew times of 15+ seconds
In-Reply-To: 
References: 
Message-ID: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM>

Look at the promotion rates (sort of) in the "context around a single instance": subtract the "after" size of one line from the "after" size of the next line. I see

   Size                              Duration   PromotedK
   -------------------------------   ---------  ---------
   49545909K->47870050K(84724992K)    0.049020
   49547874K->47872545K(84724992K)    0.047341       2495
   49550369K->47876404K(84724992K)    0.049631       3859
   49554228K->47892320K(84724992K)    0.048118      15916
   49570144K->48422333K(84724992K)   38.098495     530013
   50100157K->48528020K(84724992K)    0.067286     105687
   50205844K->48541030K(84724992K)    0.069611      13010
   50218854K->48542651K(84724992K)    0.051600       1621
   50220475K->48545932K(84724992K)    0.067547       3281
   50223756K->48540797K(84724992K)    0.063965      -5135
   50218621K->48545033K(84724992K)    0.051678       4236

(I hope that survives mail reformatting, but you get the idea.)

The interval around the long collection is not like the rest of the context. It promotes 100x the amount of memory, with some ramp up in the collection before and some ramp down in the 2 collections after. What's up with that? Even 100x longer wouldn't be 38 seconds. And the collections before and after copy more data but don't take longer. So there's something that doesn't scale about the objects being copied, too. Chasing a long list? Copying a big array of references that then also have to be promoted? Those would both happen all in one collection, not smeared out over 4 collections. But it seems like application behavior is involved, not just a failure of the collector.

Do you know what the application is doing at that time? Is it doing that at the other times with long pauses? Do the contexts for the other long pauses look like that?

Or it could be something else. It often is. :-) I'm just looking under the streetlight.

    ... peter

On 10/28/16 03:43 PM, David Ely wrote:
> While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why.
Looking at the full GC log file: > > 382250 ParNew GCs are < 1 second > 9303 are 100ms to 1 second > 1267 are 1 second to 2 seconds > 99 are 2 seconds to 10 seconds > 24 are > 10 seconds, 48 seconds being the max > > The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: > > $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 > > 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] > 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] > 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] > 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] > 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] > 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] > 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] > 79504:2016-10-22T06:09:36.983+0000: 125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] > 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] > 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] > 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] > 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] > 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, 
real=13.17 secs] > 261329:2016-10-26T00:06:44.499+0000: 448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] > 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] > 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] > 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] > 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] > 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] > 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] > 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] > 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] > 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > > Context around a single instance is fairly normal: > > 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] > 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] > 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] > 
345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] > 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] > 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] > 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] > 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] > 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] > > Since the user times are high as well, I don't think this could be swapping. > > Here are the hard-earned set of JVM arguments that we're using: > > -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ > -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ > -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ > -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ > -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ > -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ > -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ > -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ > -XX:+UseGCLogFileRotation \ > -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ > -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log > > This is on Linux with Java 1.7.0_72. > > Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? > > One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. > > Thanks for your help. 
> > David > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From vitalyd at gmail.com Sat Oct 29 01:04:54 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 28 Oct 2016 21:04:54 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: On Friday, October 28, 2016, David Ely wrote: > While typical ParNew GC times are 50ms, our application is occasionally > hitting ParNew times that are over 15 seconds for one of our customers, and > we have no idea why. Looking at the full GC log file: > > 382250 ParNew GCs are < 1 second > 9303 are 100ms to 1 second > 1267 are 1 second to 2 seconds > 99 are 2 seconds to 10 seconds > 24 are > 10 seconds, 48 seconds being the max > > The long ones are somewhat bursty as you can see from looking at the line > numbers in the GC log: > > $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 > > 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: > 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] > 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 > sys=14.37, real=16.99 secs] > 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: > 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] > 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 > sys=11.05, real=12.75 secs] > 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: > 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] > 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 > sys=11.29, real=12.73 secs] > 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: > 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] > 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 > sys=17.45, real=18.66 secs] > 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: > 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] > 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 > sys=16.84, real=16.09 secs] > 78987:2016-10-22T05:49:25.623+0000: 123843.312: > [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: > 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), > 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] > 79104:2016-10-22T05:59:26.382+0000: 124444.071: > [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: > 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), > 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] > 79504:2016-10-22T06:09:36.983+0000: 125054.672: > [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: > 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), > 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] > 79772:2016-10-22T06:30:36.130+0000: 126313.819: > [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: > 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), > 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] > 80087:2016-10-22T06:37:07.202+0000: 126704.891: > [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: > 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), > 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] > 
89969:2016-10-22T13:54:27.978+0000: 152945.667: > [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: > 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), > 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] > 90200:2016-10-22T14:05:02.717+0000: 153580.407: > [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: > 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), > 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] > 90299:2016-10-22T14:14:30.521+0000: 154148.210: > [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: > 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), > 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] > 261329:2016-10-26T00:06:44.499+0000: 448882.189: > [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: > 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), > 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] > 261935:2016-10-26T00:13:34.277+0000: 449291.967: > [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: > 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), > 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] > 262143:2016-10-26T00:20:09.397+0000: 449687.087: > [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: > 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), > 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] > 262275:2016-10-26T00:27:02.196+0000: 450099.886: > [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: > 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), > 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] > 262282:2016-10-26T00:27:29.448+0000: 450127.138: > [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: > 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), > 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] > 262631:2016-10-26T00:34:17.632+0000: 450535.321: > [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: > 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), > 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] > 262844:2016-10-26T00:41:08.118+0000: 450945.808: > [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: > 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), > 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] > 345421:2016-10-27T04:17:59.617+0000: 550357.306: > [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: > 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), > 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] > 345510:2016-10-27T04:24:11.721+0000: 550729.411: > [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: > 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), > 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] > 345514:2016-10-27T04:24:36.695+0000: 550754.385: > [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: > 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), > 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: > [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: > 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), > 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] 
> > Context around a single instance is fairly normal: > > 345773-2016-10-27T04:31:28.032+0000: 551165.721: > [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: > 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), > 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] > 345774-2016-10-27T04:31:28.635+0000: 551166.324: > [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: > 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), > 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] > 345775-2016-10-27T04:31:29.205+0000: 551166.894: > [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: > 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), > 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] > 345776-2016-10-27T04:31:29.798+0000: 551167.487: > [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: > 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), > 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] > 345777:2016-10-27T04:31:30.102+0000: 551167.791: > [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: > 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), > 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] > 345778-2016-10-27T04:32:08.449+0000: 551206.139: > [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: > 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), > 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] > 345779-2016-10-27T04:32:09.090+0000: 551206.779: > [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: > 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), > 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] > 345780-2016-10-27T04:32:09.802+0000: 551207.491: > [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: > 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), > 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] > 345781-2016-10-27T04:32:10.536+0000: 551208.226: > [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: > 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), > 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] > 345782-2016-10-27T04:32:11.137+0000: 551208.826: > [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: > 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), > 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] > 345783-2016-10-27T04:32:11.642+0000: 551209.332: > [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: > 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), > 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] > > Since the user times are high as well, I don't think this could be > swapping. > Can you ask the customer if they're using transparent hugepages (THP)? 
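For reference, whether THP is active can be checked either from a shell on the box or from inside the JVM. Below is a small Java sketch; the class name is made up for illustration, the two sysfs paths are the usual locations on mainline kernels and on RHEL/CentOS 6 respectively, and the bracketed word in the file is the active mode:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ThpCheck {
        public static void main(String[] args) throws Exception {
            // Mainline kernels expose the setting here; RHEL/CentOS 6 uses the
            // redhat_transparent_hugepage directory instead.
            Path[] candidates = {
                    Paths.get("/sys/kernel/mm/transparent_hugepage/enabled"),
                    Paths.get("/sys/kernel/mm/redhat_transparent_hugepage/enabled")
            };
            for (Path p : candidates) {
                if (Files.exists(p)) {
                    // Typical content is "[always] madvise never"; the bracketed
                    // entry is the active mode, so "[always]" means THP is on.
                    System.out.println(p + ": " + new String(Files.readAllBytes(p)).trim());
                }
            }
        }
    }

If it reports [always], disabling THP (for example with transparent_hugepage=never on the kernel command line) and then watching whether the multi-second ParNew pauses disappear is a cheap way to confirm or rule this out.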
> > Here are the hard-earned set of JVM arguments that we're using: > > -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ > -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ > -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ > -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ > -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ > -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC > -XX:+UseBiasedLocking \ > -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ > -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ > -XX:+UseGCLogFileRotation \ > -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ > -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log > > This is on Linux with Java 1.7.0_72. > > Does this look familiar to anyone? Alternatively, are there some more JVM > options that we could include to get more information? > > One of the first things that we'll try is to move to a later JVM, but it > will be easier to get the customer to do that if we can point to a specific > issue that has been addressed. > > Thanks for your help. > > David > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Sat Oct 29 13:40:02 2016 From: david.ely at unboundid.com (David Ely) Date: Sat, 29 Oct 2016 08:40:02 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thank you for the response. Yes. meminfo (see full output below) shows ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full output below). Looking back through previous information that we have from this customer, transparent huge pages have been turned on for years. We've asked them for anything else that might have changed in this environment. Are there any other JVM options that we could enable that would shed light on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? David MemTotal: 264396572 kB MemFree: 2401576 kB Buffers: 381564 kB Cached: 172673120 kB SwapCached: 0 kB Active: 163439836 kB Inactive: 90737452 kB Active(anon): 76910848 kB Inactive(anon): 4212580 kB Active(file): 86528988 kB Inactive(file): 86524872 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 16236540 kB SwapFree: 16236540 kB Dirty: 14552 kB Writeback: 0 kB AnonPages: 81111768 kB Mapped: 31312 kB Shmem: 212 kB Slab: 6078732 kB SReclaimable: 5956052 kB SUnreclaim: 122680 kB KernelStack: 41296 kB PageTables: 171324 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 148434824 kB Committed_AS: 93124984 kB VmallocTotal: 34359738367 kB VmallocUsed: 686780 kB VmallocChunk: 34225639420 kB HardwareCorrupted: 0 kB *AnonHugePages: 80519168 kB* HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 5132 kB DirectMap2M: 1957888 kB DirectMap1G: 266338304 kB On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich wrote: > > > On Friday, October 28, 2016, David Ely wrote: > >> While typical ParNew GC times are 50ms, our application is occasionally >> hitting ParNew times that are over 15 seconds for one of our customers, and >> we have no idea why. 
Looking at the full GC log file: >> >> 382250 ParNew GCs are < 1 second >> 9303 are 100ms to 1 second >> 1267 are 1 second to 2 seconds >> 99 are 2 seconds to 10 seconds >> 24 are > 10 seconds, 48 seconds being the max >> >> The long ones are somewhat bursty as you can see from looking at the line >> numbers in the GC log: >> >> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >> >> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >> [GC2016-10-22T14:14:30.521+0000: 154148.211: 
[ParNew: >> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >> >> Context around a single instance is fairly normal: >> >> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >> 0.0473410 secs] [Times: user=1.41 sys=0.04, 
real=0.05 secs] >> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >> >> Since the user times are high as well, I don't think this could be >> swapping. >> > Can you ask the customer if they're using transparent hugepages (THP)? > >> >> Here are the hard-earned set of JVM arguments that we're using: >> >> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >> -XX:+UseBiasedLocking \ >> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >> -XX:+UseGCLogFileRotation \ >> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >> >> This is on Linux with Java 1.7.0_72. >> >> Does this look familiar to anyone? Alternatively, are there some more JVM >> options that we could include to get more information? 
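A note on additional diagnostics: on JDK 7, safepoint statistics can show whether a long stop is spent bringing threads to a safepoint or inside the ParNew itself. A minimal set of flags that could be added for this - they are standard HotSpot product flags, though the per-safepoint output goes to stdout rather than the GC log:

  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

The "sync" column of the safepoint output is effectively time-to-safepoint; if it stays small during the 15+ second events, the time is going into the collection itself rather than into reaching the safepoint.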
>> >> One of the first things that we'll try is to move to a later JVM, but it >> will be easier to get the customer to do that if we can point to a specific >> issue that has been addressed. >> >> Thanks for your help. >> >> David >> > > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From david.ely at unboundid.com Sat Oct 29 14:56:57 2016 From: david.ely at unboundid.com (David Ely) Date: Sat, 29 Oct 2016 09:56:57 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM> References: <3f2b0f43-d5ae-ce6b-f525-80d61022406e@Oracle.COM> Message-ID: Thanks for the response. It does seem to be related to the amount of data promoted, but that isn't the only factor at play. Here's a plot of the amount of data promoted per ParNew above the ParNew duration for a two hour window: [image: Inline image 2] As you can see, long ParNews imply a large promotion but not the reverse. What second factor might be involved? We're looking into what is different in the application at this time. The majority of the heap, and hence of the promoted data, is part of a Berkeley DB Java Edition database cache. The database cache holds all of the data and is otherwise stable. There are other activities like database checkpointing and cleaning that happen in the background, but those are going on all of the time. Are there any more JVM options that could shed light on what's happening during the ParNew collections? David -------------- next part -------------- An HTML attachment was scrubbed... URL:
From vitalyd at gmail.com Sat Oct 29 15:07:58 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sat, 29 Oct 2016 11:07:58 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: David, Ask them to turn off THP - it's a known source of large latency due to the kernel doing page defragmentation; your app takes a page fault, and boom - the kernel may start doing defragmentation to make a huge page available. You can search online for THP issues. The symptoms are similar to yours - very high sys time. If they turn it off and still get the same lengthy parnew pauses, then it's clearly something else, but at least we'll eliminate THP as the culprit. On Saturday, October 29, 2016, David Ely wrote: > Thank you for the response. Yes. meminfo (see full output below) shows > ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full > output below). Looking back through previous information that we have from > this customer, transparent huge pages have been turned on for years. > We've asked them for anything else that might have changed in this > environment. > > Are there any other JVM options that we could enable that would shed light > on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB > -XX:PrintFLSStatistics=1 show anything useful?
> > David > > > MemTotal: 264396572 kB > MemFree: 2401576 kB > Buffers: 381564 kB > Cached: 172673120 kB > SwapCached: 0 kB > Active: 163439836 kB > Inactive: 90737452 kB > Active(anon): 76910848 kB > Inactive(anon): 4212580 kB > Active(file): 86528988 kB > Inactive(file): 86524872 kB > Unevictable: 0 kB > Mlocked: 0 kB > SwapTotal: 16236540 kB > SwapFree: 16236540 kB > Dirty: 14552 kB > Writeback: 0 kB > AnonPages: 81111768 kB > Mapped: 31312 kB > Shmem: 212 kB > Slab: 6078732 kB > SReclaimable: 5956052 kB > SUnreclaim: 122680 kB > KernelStack: 41296 kB > PageTables: 171324 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 148434824 kB > Committed_AS: 93124984 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 686780 kB > VmallocChunk: 34225639420 kB > HardwareCorrupted: 0 kB > *AnonHugePages: 80519168 kB* > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > DirectMap4k: 5132 kB > DirectMap2M: 1957888 kB > DirectMap1G: 266338304 kB > > > On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich > wrote: > >> >> >> On Friday, October 28, 2016, David Ely > > wrote: >> >>> While typical ParNew GC times are 50ms, our application is occasionally >>> hitting ParNew times that are over 15 seconds for one of our customers, and >>> we have no idea why. Looking at the full GC log file: >>> >>> 382250 ParNew GCs are < 1 second >>> 9303 are 100ms to 1 second >>> 1267 are 1 second to 2 seconds >>> 99 are 2 seconds to 10 seconds >>> 24 are > 10 seconds, 48 seconds being the max >>> >>> The long ones are somewhat bursty as you can see from looking at the >>> line numbers in the GC log: >>> >>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>> >>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>> 
79504:2016-10-22T06:09:36.983+0000: 125054.672: >>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>> 1695052K->22991K(1887488K), 33.8707510 
secs] 46334738K->45187822K(84724992K), >>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>> >>> Context around a single instance is fairly normal: >>> >>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>> 
345783-2016-10-27T04:32:11.642+0000: 551209.332: >>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>> >>> Since the user times are high as well, I don't think this could be >>> swapping. >>> >> Can you ask the customer if they're using transparent hugepages (THP)? >> >>> >>> Here are the hard-earned set of JVM arguments that we're using: >>> >>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>> -XX:+UseBiasedLocking \ >>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >>> -XX:+UseGCLogFileRotation \ >>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>> >>> This is on Linux with Java 1.7.0_72. >>> >>> Does this look familiar to anyone? Alternatively, are there some more >>> JVM options that we could include to get more information? >>> >>> One of the first things that we'll try is to move to a later JVM, but it >>> will be easier to get the customer to do that if we can point to a specific >>> issue that has been addressed. >>> >>> Thanks for your help. >>> >>> David >>> >> >> >> -- >> Sent from my phone >> > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlie.hunt at oracle.com Sat Oct 29 23:15:37 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Sat, 29 Oct 2016 18:15:37 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: +1 on disabling THP Charlie > On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich wrote: > > David, > > Ask them to turn off THP - it's a known source of large latency due to the kernel doing page defragmentation; your app takes a page fault, and boom - the kernel may start doing defragmentation to make a huge page available. You can search online for THP issues. The symptoms are similar to yours - very high sys time. > > If they turn it off and still get same lengthy parnew pauses, then it's clearly something else but at least we'll eliminate THP as the culprit. > >> On Saturday, October 29, 2016, David Ely wrote: >> Thank you for the response. Yes. meminfo (see full output below) shows ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full output below). Looking back through previous information that we have from this customer, transparent huge pages have been turned on for years. We've asked them for anything else that might have changed in this environment. >> >> Are there any other JVM options that we could enable that would shed light on what's going on within the ParNew? Would -XX:+PrintTLAB -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? 
>> >> David >> >> >> MemTotal: 264396572 kB >> MemFree: 2401576 kB >> Buffers: 381564 kB >> Cached: 172673120 kB >> SwapCached: 0 kB >> Active: 163439836 kB >> Inactive: 90737452 kB >> Active(anon): 76910848 kB >> Inactive(anon): 4212580 kB >> Active(file): 86528988 kB >> Inactive(file): 86524872 kB >> Unevictable: 0 kB >> Mlocked: 0 kB >> SwapTotal: 16236540 kB >> SwapFree: 16236540 kB >> Dirty: 14552 kB >> Writeback: 0 kB >> AnonPages: 81111768 kB >> Mapped: 31312 kB >> Shmem: 212 kB >> Slab: 6078732 kB >> SReclaimable: 5956052 kB >> SUnreclaim: 122680 kB >> KernelStack: 41296 kB >> PageTables: 171324 kB >> NFS_Unstable: 0 kB >> Bounce: 0 kB >> WritebackTmp: 0 kB >> CommitLimit: 148434824 kB >> Committed_AS: 93124984 kB >> VmallocTotal: 34359738367 kB >> VmallocUsed: 686780 kB >> VmallocChunk: 34225639420 kB >> HardwareCorrupted: 0 kB >> AnonHugePages: 80519168 kB >> HugePages_Total: 0 >> HugePages_Free: 0 >> HugePages_Rsvd: 0 >> HugePages_Surp: 0 >> Hugepagesize: 2048 kB >> DirectMap4k: 5132 kB >> DirectMap2M: 1957888 kB >> DirectMap1G: 266338304 kB >> >> >>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich wrote: >>> >>> >>>> On Friday, October 28, 2016, David Ely wrote: >>>> While typical ParNew GC times are 50ms, our application is occasionally hitting ParNew times that are over 15 seconds for one of our customers, and we have no idea why. Looking at the full GC log file: >>>> >>>> 382250 ParNew GCs are < 1 second >>>> 9303 are 100ms to 1 second >>>> 1267 are 1 second to 2 seconds >>>> 99 are 2 seconds to 10 seconds >>>> 24 are > 10 seconds, 48 seconds being the max >>>> >>>> The long ones are somewhat bursty as you can see from looking at the line numbers in the GC log: >>>> >>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>> >>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>> 79504:2016-10-22T06:09:36.983+0000: 
125054.672: [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: [GC2016-10-27T04:24:11.722+0000: 
550729.411: [ParNew: 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> >>>> Context around a single instance is fairly normal: >>>> >>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>> >>>> Since the user times are high as well, I don't think this could be swapping. 
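To confirm that swapping really is off the table, the swap counters can be watched directly while one of the long pauses is reproduced; for example, on a reasonably recent Linux:

  vmstat 5                                  # si/so columns should stay at 0
  grep -E 'pswpin|pswpout' /proc/vmstat     # cumulative swap-in/swap-out page counts

With swapping ruled out, high sys time inside a ParNew usually points at the kernel doing work on the collector's behalf - page faults, page migration, or THP defragmentation - which is what the THP question below is getting at.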
>>> >>> Can you ask the customer if they're using transparent hugepages (THP)? >>>> >>>> Here are the hard-earned set of JVM arguments that we're using: >>>> >>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseBiasedLocking \ >>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages \ >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags \ >>>> -XX:+UseGCLogFileRotation \ >>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>> >>>> This is on Linux with Java 1.7.0_72. >>>> >>>> Does this look familiar to anyone? Alternatively, are there some more JVM options that we could include to get more information? >>>> >>>> One of the first things that we'll try is to move to a later JVM, but it will be easier to get the customer to do that if we can point to a specific issue that has been addressed. >>>> >>>> Thanks for your help. >>>> >>>> David >>> >>> >>> -- >>> Sent from my phone >> > > > -- > Sent from my phone > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Sun Oct 30 14:14:38 2016 From: david.ely at unboundid.com (David Ely) Date: Sun, 30 Oct 2016 09:14:38 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thank you Vitaly and Charlie. We will have them disable THP, move to a later version of the JVM, and add in some additional GC logging JVM options. Looking more at the GC log, it appears that the long ParNew pauses only occur when the old generation usage is at least half of the distance between the live size and when CMS is triggered via CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses stop. However, there are plenty of CMS cycles where we don't see any long pauses, and there are plenty of places where we promote the same amount of data associated with a long pause but don't experience a long pause. Is this behavior consistent with the THP diagnosis? David On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt wrote: > +1 on disabling THP > > Charlie > > On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich wrote: > > David, > > Ask them to turn off THP - it's a known source of large latency due to the > kernel doing page defragmentation; your app takes a page fault, and boom - > the kernel may start doing defragmentation to make a huge page available. > You can search online for THP issues. The symptoms are similar to yours - > very high sys time. > > If they turn it off and still get same lengthy parnew pauses, then it's > clearly something else but at least we'll eliminate THP as the culprit. > > On Saturday, October 29, 2016, David Ely wrote: > >> Thank you for the response. Yes. meminfo (see full output below) shows >> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >> output below). 
Looking back through previous information that we have from >> this customer, transparent huge pages have been turned on for years. >> We've asked them for anything else that might have changed in this >> environment. >> >> Are there any other JVM options that we could enable that would shed >> light on what's going on within the ParNew? Would -XX:+PrintTLAB >> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? >> >> David >> >> >> MemTotal: 264396572 kB >> MemFree: 2401576 kB >> Buffers: 381564 kB >> Cached: 172673120 kB >> SwapCached: 0 kB >> Active: 163439836 kB >> Inactive: 90737452 kB >> Active(anon): 76910848 kB >> Inactive(anon): 4212580 kB >> Active(file): 86528988 kB >> Inactive(file): 86524872 kB >> Unevictable: 0 kB >> Mlocked: 0 kB >> SwapTotal: 16236540 kB >> SwapFree: 16236540 kB >> Dirty: 14552 kB >> Writeback: 0 kB >> AnonPages: 81111768 kB >> Mapped: 31312 kB >> Shmem: 212 kB >> Slab: 6078732 kB >> SReclaimable: 5956052 kB >> SUnreclaim: 122680 kB >> KernelStack: 41296 kB >> PageTables: 171324 kB >> NFS_Unstable: 0 kB >> Bounce: 0 kB >> WritebackTmp: 0 kB >> CommitLimit: 148434824 kB >> Committed_AS: 93124984 kB >> VmallocTotal: 34359738367 kB >> VmallocUsed: 686780 kB >> VmallocChunk: 34225639420 kB >> HardwareCorrupted: 0 kB >> *AnonHugePages: 80519168 kB* >> HugePages_Total: 0 >> HugePages_Free: 0 >> HugePages_Rsvd: 0 >> HugePages_Surp: 0 >> Hugepagesize: 2048 kB >> DirectMap4k: 5132 kB >> DirectMap2M: 1957888 kB >> DirectMap1G: 266338304 kB >> >> >> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >> wrote: >> >>> >>> >>> On Friday, October 28, 2016, David Ely wrote: >>> >>>> While typical ParNew GC times are 50ms, our application is occasionally >>>> hitting ParNew times that are over 15 seconds for one of our customers, and >>>> we have no idea why. 
Looking at the full GC log file: >>>> >>>> 382250 ParNew GCs are < 1 second >>>> 9303 are 100ms to 1 second >>>> 1267 are 1 second to 2 seconds >>>> 99 are 2 seconds to 10 seconds >>>> 24 are > 10 seconds, 48 seconds being the max >>>> >>>> The long ones are somewhat bursty as you can see from looking at the >>>> line numbers in the GC log: >>>> >>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>> >>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>>> 17.3433490 secs] [Times: user=554.39 
sys=15.81, real=17.34 secs] >>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> >>>> Context around a single instance is fairly normal: >>>> >>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>> 
345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>> >>>> Since the user times are high as well, I don't think this could be >>>> swapping. >>>> >>> Can you ask the customer if they're using transparent hugepages (THP)? 
>>> >>>> >>>> Here are the hard-earned set of JVM arguments that we're using: >>>> >>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled \ >>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>> -XX:+UseBiasedLocking \ >>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar -XX:+UseLargePages >>>> \ >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintCommandLineFlags >>>> \ >>>> -XX:+UseGCLogFileRotation \ >>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>> >>>> This is on Linux with Java 1.7.0_72. >>>> >>>> Does this look familiar to anyone? Alternatively, are there some more >>>> JVM options that we could include to get more information? >>>> >>>> One of the first things that we'll try is to move to a later JVM, but >>>> it will be easier to get the customer to do that if we can point to a >>>> specific issue that has been addressed. >>>> >>>> Thanks for your help. >>>> >>>> David >>>> >>> >>> >>> -- >>> Sent from my phone >>> >> >> > > -- > Sent from my phone > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Sun Oct 30 18:56:24 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Sun, 30 Oct 2016 14:56:24 -0400 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: On Sunday, October 30, 2016, David Ely wrote: > Thank you Vitaly and Charlie. We will have them disable THP, move to a > later version of the JVM, and add in some additional GC logging JVM options. > > Looking more at the GC log, it appears that the long ParNew pauses only > occur when the old generation usage is at least half of the distance > between the live size and when CMS is triggered via > CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses > stop. However, there are plenty of CMS cycles where we don't see any long > pauses, and there are plenty of places where we promote the same amount of > data associated with a long pause but don't experience a long pause. > > Is this behavior consistent with the THP diagnosis? > The very high sys time is unusual for a parnew collection. THP defrag is one possible known cause of that. It's certainly possible there's something else going on, but turning off THP is a good start in troubleshooting; even if it's not the cause here, it may bite your customer later. A few more questions in the meantime: 1) are these parnew tails reproducible? 2) is this running on bare metal or VM? 3) what's the hardware spec? If you can have the customer disable THP without bumping the JVM version, it would help pinpoint the issue. But, I understand if you just want to fix the issue asap. 
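For that troubleshooting step, a minimal sketch of how THP can be checked and turned off at runtime on most distributions; the sysfs path is /sys/kernel/mm/redhat_transparent_hugepage/... on older RHEL 6 kernels, and <jvm-pid> below is a placeholder for the process id:

  # current THP allocation and defrag policy
  cat /sys/kernel/mm/transparent_hugepage/enabled
  cat /sys/kernel/mm/transparent_hugepage/defrag

  # how much of this JVM is currently backed by transparent huge pages
  # (on kernels that report AnonHugePages in smaps)
  grep AnonHugePages /proc/<jvm-pid>/smaps | awk '{ sum += $2 } END { print sum " kB" }'

  # disable as root; transparent_hugepage=never on the kernel command line
  # makes the setting persistent across reboots
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag

Disabling THP on a live system may not split huge pages that are already mapped, so restarting the JVM afterwards is the safest way to be sure the change has fully taken effect.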
> > David > > On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt > wrote: > >> +1 on disabling THP >> >> Charlie >> >> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich > > wrote: >> >> David, >> >> Ask them to turn off THP - it's a known source of large latency due to >> the kernel doing page defragmentation; your app takes a page fault, and >> boom - the kernel may start doing defragmentation to make a huge page >> available. You can search online for THP issues. The symptoms are similar >> to yours - very high sys time. >> >> If they turn it off and still get same lengthy parnew pauses, then it's >> clearly something else but at least we'll eliminate THP as the culprit. >> >> On Saturday, October 29, 2016, David Ely > > wrote: >> >>> Thank you for the response. Yes. meminfo (see full output below) shows >>> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >>> output below). Looking back through previous information that we have from >>> this customer, transparent huge pages have been turned on for years. >>> We've asked them for anything else that might have changed in this >>> environment. >>> >>> Are there any other JVM options that we could enable that would shed >>> light on what's going on within the ParNew? Would -XX:+PrintTLAB >>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? >>> >>> David >>> >>> >>> MemTotal: 264396572 kB >>> MemFree: 2401576 kB >>> Buffers: 381564 kB >>> Cached: 172673120 kB >>> SwapCached: 0 kB >>> Active: 163439836 kB >>> Inactive: 90737452 kB >>> Active(anon): 76910848 kB >>> Inactive(anon): 4212580 kB >>> Active(file): 86528988 kB >>> Inactive(file): 86524872 kB >>> Unevictable: 0 kB >>> Mlocked: 0 kB >>> SwapTotal: 16236540 kB >>> SwapFree: 16236540 kB >>> Dirty: 14552 kB >>> Writeback: 0 kB >>> AnonPages: 81111768 kB >>> Mapped: 31312 kB >>> Shmem: 212 kB >>> Slab: 6078732 kB >>> SReclaimable: 5956052 kB >>> SUnreclaim: 122680 kB >>> KernelStack: 41296 kB >>> PageTables: 171324 kB >>> NFS_Unstable: 0 kB >>> Bounce: 0 kB >>> WritebackTmp: 0 kB >>> CommitLimit: 148434824 kB >>> Committed_AS: 93124984 kB >>> VmallocTotal: 34359738367 kB >>> VmallocUsed: 686780 kB >>> VmallocChunk: 34225639420 kB >>> HardwareCorrupted: 0 kB >>> *AnonHugePages: 80519168 kB* >>> HugePages_Total: 0 >>> HugePages_Free: 0 >>> HugePages_Rsvd: 0 >>> HugePages_Surp: 0 >>> Hugepagesize: 2048 kB >>> DirectMap4k: 5132 kB >>> DirectMap2M: 1957888 kB >>> DirectMap1G: 266338304 kB >>> >>> >>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >>> wrote: >>> >>>> >>>> >>>> On Friday, October 28, 2016, David Ely wrote: >>>> >>>>> While typical ParNew GC times are 50ms, our application is >>>>> occasionally hitting ParNew times that are over 15 seconds for one of our >>>>> customers, and we have no idea why. 
Looking at the full GC log file: >>>>> >>>>> 382250 ParNew GCs are < 1 second >>>>> 9303 are 100ms to 1 second >>>>> 1267 are 1 second to 2 seconds >>>>> 99 are 2 seconds to 10 seconds >>>>> 24 are > 10 seconds, 48 seconds being the max >>>>> >>>>> The long ones are somewhat bursty as you can see from looking at the >>>>> line numbers in the GC log: >>>>> >>>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>>> >>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, real=10.99 secs] >>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 
50361539K->48947648K(84724992K), >>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, real=21.41 secs] >>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>> >>>>> Context around a single instance is fairly normal: >>>>> >>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>>> 
1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>>> 0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>>> >>>>> Since the user times are high as well, I don't think this could be >>>>> swapping. >>>>> >>>> Can you ask the customer if they're using transparent hugepages (THP)? 
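A quick way to answer the THP question on the affected hosts is to look at the sysfs knobs and at how much anonymous memory is currently backed by huge pages. This is only a sketch: the writes need root, and on older RHEL 6 kernels the directory is /sys/kernel/mm/redhat_transparent_hugepage instead.

# is THP on, and is synchronous defrag active? the bracketed value is the current setting
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# how much anonymous memory is huge-page backed right now
grep AnonHugePages /proc/meminfo
# THP and compaction activity counters, where the kernel exposes them
egrep 'thp|compact' /proc/vmstat
# disabling THP for a test run, which is what the thread converges on
# (run as root; add transparent_hugepage=never to the kernel boot line to make it permanent)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag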
>>>> >>>>> >>>>> Here are the hard-earned set of JVM arguments that we're using: >>>>> >>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled >>>>> \ >>>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>>> -XX:+UseBiasedLocking \ >>>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M \ >>>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar >>>>> -XX:+UseLargePages \ >>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>> -XX:+PrintCommandLineFlags \ >>>>> -XX:+UseGCLogFileRotation \ >>>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>>> >>>>> This is on Linux with Java 1.7.0_72. >>>>> >>>>> Does this look familiar to anyone? Alternatively, are there some more >>>>> JVM options that we could include to get more information? >>>>> >>>>> One of the first things that we'll try is to move to a later JVM, but >>>>> it will be easier to get the customer to do that if we can point to a >>>>> specific issue that has been addressed. >>>>> >>>>> Thanks for your help. >>>>> >>>>> David >>>>> >>>> >>>> >>>> -- >>>> Sent from my phone >>>> >>> >>> >> >> -- >> Sent from my phone >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.ely at unboundid.com Mon Oct 31 00:09:38 2016 From: david.ely at unboundid.com (David Ely) Date: Sun, 30 Oct 2016 19:09:38 -0500 Subject: occasional ParNew times of 15+ seconds In-Reply-To: References: Message-ID: Thanks again Vitaly. Responses inline. On Sun, Oct 30, 2016 at 1:56 PM, Vitaly Davidovich wrote: > > > On Sunday, October 30, 2016, David Ely wrote: > >> Thank you Vitaly and Charlie. We will have them disable THP, move to a >> later version of the JVM, and add in some additional GC logging JVM options. >> >> Looking more at the GC log, it appears that the long ParNew pauses only >> occur when the old generation usage is at least half of the distance >> between the live size and when CMS is triggered via >> CMSInitiatingOccupancyFraction. After a CMS collection, the long pauses >> stop. However, there are plenty of CMS cycles where we don't see any long >> pauses, and there are plenty of places where we promote the same amount of >> data associated with a long pause but don't experience a long pause. >> >> Is this behavior consistent with the THP diagnosis? >> > The very high sys time is unusual for a parnew collection. THP defrag is > one possible known cause of that. It's certainly possible there's > something else going on, but turning off THP is a good start in > troubleshooting; even if it's not the cause here, it may bite your customer > later. > The sys times are high, but they are not especially high relative to the user times. The ratio across all of the ParNew collections is about the same. > > A few more questions in the meantime: > > 1) are these parnew tails reproducible? > I believe so. They are seeing it on multiple systems. 
It seems to have gotten worse on the newer systems, which have 256GB of RAM compared to 96GB. > 2) is this running on bare metal or VM? > Bare metal. > 3) what's the hardware spec? > These specific pauses on hardware they acquired recently. Java sees 48 CPUs, and it has 256GB of RAM. > > If you can have the customer disable THP without bumping the JVM version, > it would help pinpoint the issue. But, I understand if you just want to > fix the issue asap. > Since they are seeing this on multiple systems, they should be able to have at least one where they only disable THP. They'll have to put these changes through their testing environment, so it might be a little while before I'll have an update. > > >> >> On Sat, Oct 29, 2016 at 6:15 PM, charlie hunt >> wrote: >> >>> +1 on disabling THP >>> >>> Charlie >>> >>> On Oct 29, 2016, at 10:07 AM, Vitaly Davidovich >>> wrote: >>> >>> David, >>> >>> Ask them to turn off THP - it's a known source of large latency due to >>> the kernel doing page defragmentation; your app takes a page fault, and >>> boom - the kernel may start doing defragmentation to make a huge page >>> available. You can search online for THP issues. The symptoms are similar >>> to yours - very high sys time. >>> >>> If they turn it off and still get same lengthy parnew pauses, then it's >>> clearly something else but at least we'll eliminate THP as the culprit. >>> >>> On Saturday, October 29, 2016, David Ely >>> wrote: >>> >>>> Thank you for the response. Yes. meminfo (see full output below) shows >>>> ~80GB of AnonHugePages, which is pretty close to the size of the JVM (full >>>> output below). Looking back through previous information that we have from >>>> this customer, transparent huge pages have been turned on for years. >>>> We've asked them for anything else that might have changed in this >>>> environment. >>>> >>>> Are there any other JVM options that we could enable that would shed >>>> light on what's going on within the ParNew? Would -XX:+PrintTLAB >>>> -XX:+PrintPLAB -XX:PrintFLSStatistics=1 show anything useful? 
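Beyond the TLAB/PLAB/free-list flags asked about here, the additions that usually help with pause-time outliers on a JDK 7 HotSpot are the stopped-time and safepoint statistics, which show whether the time goes into the collection itself or into reaching and leaving the safepoint. A sketch of what could be appended to the existing -XX:+PrintGCDetails set (flag spellings are the standard JDK 7 ones, not verified specifically against 7u72):

# extra pause diagnostics (the safepoint statistics go to stdout, not the -Xloggc file)
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
-XX:+PrintTLAB -XX:PrintFLSStatistics=1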
>>>> >>>> David >>>> >>>> >>>> MemTotal: 264396572 kB >>>> MemFree: 2401576 kB >>>> Buffers: 381564 kB >>>> Cached: 172673120 kB >>>> SwapCached: 0 kB >>>> Active: 163439836 kB >>>> Inactive: 90737452 kB >>>> Active(anon): 76910848 kB >>>> Inactive(anon): 4212580 kB >>>> Active(file): 86528988 kB >>>> Inactive(file): 86524872 kB >>>> Unevictable: 0 kB >>>> Mlocked: 0 kB >>>> SwapTotal: 16236540 kB >>>> SwapFree: 16236540 kB >>>> Dirty: 14552 kB >>>> Writeback: 0 kB >>>> AnonPages: 81111768 kB >>>> Mapped: 31312 kB >>>> Shmem: 212 kB >>>> Slab: 6078732 kB >>>> SReclaimable: 5956052 kB >>>> SUnreclaim: 122680 kB >>>> KernelStack: 41296 kB >>>> PageTables: 171324 kB >>>> NFS_Unstable: 0 kB >>>> Bounce: 0 kB >>>> WritebackTmp: 0 kB >>>> CommitLimit: 148434824 kB >>>> Committed_AS: 93124984 kB >>>> VmallocTotal: 34359738367 kB >>>> VmallocUsed: 686780 kB >>>> VmallocChunk: 34225639420 kB >>>> HardwareCorrupted: 0 kB >>>> *AnonHugePages: 80519168 kB* >>>> HugePages_Total: 0 >>>> HugePages_Free: 0 >>>> HugePages_Rsvd: 0 >>>> HugePages_Surp: 0 >>>> Hugepagesize: 2048 kB >>>> DirectMap4k: 5132 kB >>>> DirectMap2M: 1957888 kB >>>> DirectMap1G: 266338304 kB >>>> >>>> >>>> On Fri, Oct 28, 2016 at 8:04 PM, Vitaly Davidovich >>>> wrote: >>>> >>>>> >>>>> >>>>> On Friday, October 28, 2016, David Ely >>>>> wrote: >>>>> >>>>>> While typical ParNew GC times are 50ms, our application is >>>>>> occasionally hitting ParNew times that are over 15 seconds for one of our >>>>>> customers, and we have no idea why. Looking at the full GC log file: >>>>>> >>>>>> 382250 ParNew GCs are < 1 second >>>>>> 9303 are 100ms to 1 second >>>>>> 1267 are 1 second to 2 seconds >>>>>> 99 are 2 seconds to 10 seconds >>>>>> 24 are > 10 seconds, 48 seconds being the max >>>>>> >>>>>> The long ones are somewhat bursty as you can see from looking at the >>>>>> line numbers in the GC log: >>>>>> >>>>>> $ egrep -n '(ParNew.*real=[1-9][0-9]\)' gc.log.0 >>>>>> >>>>>> 12300:2016-10-21T01:03:20.380+0000: 20278.069: >>>>>> [GC2016-10-21T01:03:20.380+0000: 20278.070: [ParNew: >>>>>> 1697741K->10024K(1887488K), 16.9913450 secs] 33979542K->32817239K(84724992K), >>>>>> 16.9921050 secs] [Times: user=541.32 sys=14.37, real=16.99 secs] >>>>>> 43730:2016-10-21T14:12:25.050+0000: 67622.740: >>>>>> [GC2016-10-21T14:12:25.051+0000: 67622.740: [ParNew: >>>>>> 1728194K->33817K(1887488K), 12.7508470 secs] 49737924K->48320707K(84724992K), >>>>>> 12.7517840 secs] [Times: user=405.89 sys=11.05, real=12.75 secs] >>>>>> 44079:2016-10-21T14:18:55.172+0000: 68012.862: >>>>>> [GC2016-10-21T14:18:55.173+0000: 68012.862: [ParNew: >>>>>> 1698371K->26958K(1887488K), 12.7384460 secs] 50339815K->48930730K(84724992K), >>>>>> 12.7392360 secs] [Times: user=406.58 sys=11.29, real=12.73 secs] >>>>>> 50151:2016-10-21T17:10:14.471+0000: 78292.160: >>>>>> [GC2016-10-21T17:10:14.471+0000: 78292.161: [ParNew: >>>>>> 1713813K->40968K(1887488K), 18.6593320 secs] 49366906K->47959129K(84724992K), >>>>>> 18.6602550 secs] [Times: user=590.03 sys=17.45, real=18.66 secs] >>>>>> 56073:2016-10-21T19:59:36.847+0000: 88454.536: >>>>>> [GC2016-10-21T19:59:36.847+0000: 88454.537: [ParNew: >>>>>> 1685720K->20763K(1887488K), 16.0840200 secs] 50704025K->49302131K(84724992K), >>>>>> 16.0848810 secs] [Times: user=487.00 sys=16.84, real=16.09 secs] >>>>>> 78987:2016-10-22T05:49:25.623+0000: 123843.312: >>>>>> [GC2016-10-22T05:49:25.623+0000: 123843.313: [ParNew: >>>>>> 1709771K->22678K(1887488K), 10.9933380 secs] 43323834K->41914203K(84724992K), >>>>>> 10.9943060 secs] [Times: user=349.67 sys=9.84, 
real=10.99 secs] >>>>>> 79104:2016-10-22T05:59:26.382+0000: 124444.071: >>>>>> [GC2016-10-22T05:59:26.382+0000: 124444.072: [ParNew: >>>>>> 1697024K->22260K(1887488K), 11.5490390 secs] 44558499K->43145880K(84724992K), >>>>>> 11.5499650 secs] [Times: user=367.73 sys=10.01, real=11.55 secs] >>>>>> 79504:2016-10-22T06:09:36.983+0000: 125054.672: >>>>>> [GC2016-10-22T06:09:36.984+0000: 125054.673: [ParNew: >>>>>> 1688112K->4769K(1887488K), 14.1528810 secs] 46684947K->45263748K(84724992K), >>>>>> 14.1539860 secs] [Times: user=452.28 sys=12.71, real=14.15 secs] >>>>>> 79772:2016-10-22T06:30:36.130+0000: 126313.819: >>>>>> [GC2016-10-22T06:30:36.130+0000: 126313.820: [ParNew: >>>>>> 1725520K->35893K(1887488K), 14.4479670 secs] 48989739K->47563879K(84724992K), >>>>>> 14.4488810 secs] [Times: user=461.60 sys=13.04, real=14.45 secs] >>>>>> 80087:2016-10-22T06:37:07.202+0000: 126704.891: >>>>>> [GC2016-10-22T06:37:07.202+0000: 126704.892: [ParNew: >>>>>> 1698021K->23440K(1887488K), 15.7039920 secs] 50517163K->49105987K(84724992K), >>>>>> 15.7050040 secs] [Times: user=497.65 sys=14.75, real=15.70 secs] >>>>>> 89969:2016-10-22T13:54:27.978+0000: 152945.667: >>>>>> [GC2016-10-22T13:54:27.978+0000: 152945.668: [ParNew: >>>>>> 1834914K->15978K(1887488K), 11.5637150 secs] 48716340K->47307673K(84724992K), >>>>>> 11.5645440 secs] [Times: user=367.77 sys=10.01, real=11.57 secs] >>>>>> 90200:2016-10-22T14:05:02.717+0000: 153580.407: >>>>>> [GC2016-10-22T14:05:02.718+0000: 153580.407: [ParNew: >>>>>> 1684626K->7078K(1887488K), 17.3424650 secs] 50361539K->48947648K(84724992K), >>>>>> 17.3433490 secs] [Times: user=554.39 sys=15.81, real=17.34 secs] >>>>>> 90299:2016-10-22T14:14:30.521+0000: 154148.210: >>>>>> [GC2016-10-22T14:14:30.521+0000: 154148.211: [ParNew: >>>>>> 1690850K->6078K(1887488K), 13.1699350 secs] 51455784K->50033156K(84724992K), >>>>>> 13.1708900 secs] [Times: user=419.55 sys=11.54, real=13.17 secs] >>>>>> 261329:2016-10-26T00:06:44.499+0000: 448882.189: >>>>>> [GC2016-10-26T00:06:44.500+0000: 448882.189: [ParNew: >>>>>> 1705614K->22224K(1887488K), 17.5831730 secs] 40683698K->39525817K(84724992K), >>>>>> 17.5843270 secs] [Times: user=561.85 sys=14.79, real=17.58 secs] >>>>>> 261935:2016-10-26T00:13:34.277+0000: 449291.967: >>>>>> [GC2016-10-26T00:13:34.278+0000: 449291.967: [ParNew: >>>>>> 1690085K->26707K(1887488K), 13.9331790 secs] 43792178K->42655000K(84724992K), >>>>>> 13.9340780 secs] [Times: user=446.36 sys=11.45, real=13.93 secs] >>>>>> 262143:2016-10-26T00:20:09.397+0000: 449687.087: >>>>>> [GC2016-10-26T00:20:09.398+0000: 449687.087: [ParNew: >>>>>> 1696593K->27078K(1887488K), 40.3344500 secs] 45588644K->44444949K(84724992K), >>>>>> 40.3355430 secs] [Times: user=1248.15 sys=43.07, real=40.33 secs] >>>>>> 262275:2016-10-26T00:27:02.196+0000: 450099.886: >>>>>> [GC2016-10-26T00:27:02.197+0000: 450099.886: [ParNew: >>>>>> 1683406K->17853K(1887488K), 17.7472360 secs] 46908499K->45506131K(84724992K), >>>>>> 17.7482260 secs] [Times: user=567.03 sys=16.10, real=17.75 secs] >>>>>> 262282:2016-10-26T00:27:29.448+0000: 450127.138: >>>>>> [GC2016-10-26T00:27:29.449+0000: 450127.138: [ParNew: >>>>>> 1687737K->10499K(1887488K), 35.4934000 secs] 47195678K->46044477K(84724992K), >>>>>> 35.4943230 secs] [Times: user=1131.34 sys=31.87, real=35.49 secs] >>>>>> 262631:2016-10-26T00:34:17.632+0000: 450535.321: >>>>>> [GC2016-10-26T00:34:17.632+0000: 450535.321: [ParNew: >>>>>> 1687590K->10226K(1887488K), 21.4043600 secs] 49431427K->48018504K(84724992K), >>>>>> 21.4052230 secs] [Times: user=682.50 sys=19.46, 
real=21.41 secs] >>>>>> 262844:2016-10-26T00:41:08.118+0000: 450945.808: >>>>>> [GC2016-10-26T00:41:08.119+0000: 450945.808: [ParNew: >>>>>> 1692928K->11302K(1887488K), 48.2899260 secs] 51073216K->49915878K(84724992K), >>>>>> 48.2909550 secs] [Times: user=1493.17 sys=53.55, real=48.28 secs] >>>>>> 345421:2016-10-27T04:17:59.617+0000: 550357.306: >>>>>> [GC2016-10-27T04:17:59.618+0000: 550357.307: [ParNew: >>>>>> 1695052K->22991K(1887488K), 33.8707510 secs] 46334738K->45187822K(84724992K), >>>>>> 33.8718980 secs] [Times: user=1081.31 sys=30.59, real=33.86 secs] >>>>>> 345510:2016-10-27T04:24:11.721+0000: 550729.411: >>>>>> [GC2016-10-27T04:24:11.722+0000: 550729.411: [ParNew: >>>>>> 1705080K->20401K(1887488K), 18.9795540 secs] 47388073K->45965537K(84724992K), >>>>>> 18.9805410 secs] [Times: user=606.94 sys=17.25, real=18.98 secs] >>>>>> 345514:2016-10-27T04:24:36.695+0000: 550754.385: >>>>>> [GC2016-10-27T04:24:36.696+0000: 550754.385: [ParNew: >>>>>> 1707810K->32640K(1887488K), 30.9728200 secs] 47656489K->46506725K(84724992K), >>>>>> 30.9737300 secs] [Times: user=917.67 sys=33.07, real=30.97 secs] >>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>>> >>>>>> Context around a single instance is fairly normal: >>>>>> >>>>>> 345773-2016-10-27T04:31:28.032+0000: 551165.721: >>>>>> [GC2016-10-27T04:31:28.033+0000: 551165.722: [ParNew: >>>>>> 1685858K->8851K(1887488K), 0.0480250 secs] 49545909K->47870050K(84724992K), >>>>>> 0.0490200 secs] [Times: user=1.47 sys=0.02, real=0.05 secs] >>>>>> 345774-2016-10-27T04:31:28.635+0000: 551166.324: >>>>>> [GC2016-10-27T04:31:28.636+0000: 551166.325: [ParNew: >>>>>> 1686675K->10456K(1887488K), 0.0463570 secs] 49547874K->47872545K(84724992K), >>>>>> 0.0473410 secs] [Times: user=1.41 sys=0.04, real=0.05 secs] >>>>>> 345775-2016-10-27T04:31:29.205+0000: 551166.894: >>>>>> [GC2016-10-27T04:31:29.205+0000: 551166.894: [ParNew: >>>>>> 1688280K->12733K(1887488K), 0.0487100 secs] 49550369K->47876404K(84724992K), >>>>>> 0.0496310 secs] [Times: user=1.47 sys=0.04, real=0.05 secs] >>>>>> 345776-2016-10-27T04:31:29.798+0000: 551167.487: >>>>>> [GC2016-10-27T04:31:29.798+0000: 551167.488: [ParNew: >>>>>> 1690557K->26694K(1887488K), 0.0471170 secs] 49554228K->47892320K(84724992K), >>>>>> 0.0481180 secs] [Times: user=1.40 sys=0.02, real=0.05 secs] >>>>>> 345777:2016-10-27T04:31:30.102+0000: 551167.791: >>>>>> [GC2016-10-27T04:31:30.102+0000: 551167.791: [ParNew: >>>>>> 1704518K->30860K(1887488K), 38.0976720 secs] 49570144K->48422333K(84724992K), >>>>>> 38.0984950 secs] [Times: user=1215.89 sys=34.79, real=38.09 secs] >>>>>> 345778-2016-10-27T04:32:08.449+0000: 551206.139: >>>>>> [GC2016-10-27T04:32:08.450+0000: 551206.139: [ParNew: >>>>>> 1708684K->122033K(1887488K), 0.0664280 secs] 50100157K->48528020K(84724992K), >>>>>> 0.0672860 secs] [Times: user=1.60 sys=0.05, real=0.07 secs] >>>>>> 345779-2016-10-27T04:32:09.090+0000: 551206.779: >>>>>> [GC2016-10-27T04:32:09.091+0000: 551206.780: [ParNew: >>>>>> 1799857K->42169K(1887488K), 0.0688910 secs] 50205844K->48541030K(84724992K), >>>>>> 0.0696110 secs] [Times: user=1.70 sys=0.03, real=0.07 secs] >>>>>> 345780-2016-10-27T04:32:09.802+0000: 551207.491: >>>>>> [GC2016-10-27T04:32:09.802+0000: 551207.491: [ParNew: >>>>>> 1719993K->43790K(1887488K), 0.0508540 secs] 50218854K->48542651K(84724992K), >>>>>> 
0.0516000 secs] [Times: user=1.54 sys=0.03, real=0.05 secs] >>>>>> 345781-2016-10-27T04:32:10.536+0000: 551208.226: >>>>>> [GC2016-10-27T04:32:10.537+0000: 551208.226: [ParNew: >>>>>> 1721614K->30389K(1887488K), 0.0668100 secs] 50220475K->48545932K(84724992K), >>>>>> 0.0675470 secs] [Times: user=1.81 sys=0.03, real=0.06 secs] >>>>>> 345782-2016-10-27T04:32:11.137+0000: 551208.826: >>>>>> [GC2016-10-27T04:32:11.137+0000: 551208.826: [ParNew: >>>>>> 1708213K->18631K(1887488K), 0.0632570 secs] 50223756K->48540797K(84724992K), >>>>>> 0.0639650 secs] [Times: user=1.95 sys=0.01, real=0.06 secs] >>>>>> 345783-2016-10-27T04:32:11.642+0000: 551209.332: >>>>>> [GC2016-10-27T04:32:11.643+0000: 551209.332: [ParNew: >>>>>> 1696455K->19415K(1887488K), 0.0509260 secs] 50218621K->48545033K(84724992K), >>>>>> 0.0516780 secs] [Times: user=1.55 sys=0.03, real=0.05 secs] >>>>>> >>>>>> Since the user times are high as well, I don't think this could be >>>>>> swapping. >>>>>> >>>>> Can you ask the customer if they're using transparent hugepages (THP)? >>>>> >>>>>> >>>>>> Here are the hard-earned set of JVM arguments that we're using: >>>>>> >>>>>> -d64 -server -Xmx81g -Xms81g -XX:MaxNewSize=2g \ >>>>>> -XX:NewSize=2g -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled \ >>>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled >>>>>> \ >>>>>> -XX:+CMSScavengeBeforeRemark -XX:RefDiscoveryPolicy=1 \ >>>>>> -XX:ParallelCMSThreads=12 -XX:CMSMaxAbortablePrecleanTime=3600000 \ >>>>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC >>>>>> -XX:+UseBiasedLocking \ >>>>>> -XX:MaxTenuringThreshold=2 -XX:+UseCompressedOops -XX:PermSize=256M >>>>>> \ >>>>>> -XX:MaxPermSize=256M -XX:+HeapDumpOnOutOfMemoryError \ >>>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseMembar >>>>>> -XX:+UseLargePages \ >>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>>> -XX:+PrintCommandLineFlags \ >>>>>> -XX:+UseGCLogFileRotation \ >>>>>> -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=100m \ >>>>>> -Xloggc:${INSTANCE_ROOT}/logs/jvm/gc.log >>>>>> >>>>>> This is on Linux with Java 1.7.0_72. >>>>>> >>>>>> Does this look familiar to anyone? Alternatively, are there some more >>>>>> JVM options that we could include to get more information? >>>>>> >>>>>> One of the first things that we'll try is to move to a later JVM, but >>>>>> it will be easier to get the customer to do that if we can point to a >>>>>> specific issue that has been addressed. >>>>>> >>>>>> Thanks for your help. >>>>>> >>>>>> David >>>>>> >>>>> >>>>> >>>>> -- >>>>> Sent from my phone >>>>> >>>> >>>> >>> >>> -- >>> Sent from my phone >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> > > -- > Sent from my phone > -------------- next part -------------- An HTML attachment was scrubbed... URL: From graphhopper at gmx.de Mon Oct 31 18:07:38 2016 From: graphhopper at gmx.de (Peter) Date: Mon, 31 Oct 2016 19:07:38 +0100 Subject: Big speed difference for G1 vs. parallel GC Message-ID: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> Hi, I've stumbled today* over a big speed difference for code execution with G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you have interests to investigate this. 
You should be able to reproduce this via: # setup git clone https://github.com/graphhopper/graphhopper wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf cd graphhopper # run measurement export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" # the graphhopper.sh script just makes the installation of maven and bundling the jar a bit simpler # you can also execute the tests in the class Measurement.java ./graphhopper.sh clean ./graphhopper.sh measurement berlin-latest.osm.pbf # now a measurement-.properties is created: grep routing.mean measurement-XY.properties Now this should print a line where the value is in ms. E.g. I get ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase the Xmx for the G1 run to 1400m the results do NOT get closer to parallel GC! Let me know if you need more information! Regards Peter * https://github.com/graphhopper/graphhopper/issues/854 -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From ecki at zusammenkunft.net Mon Oct 31 19:52:06 2016 From: ecki at zusammenkunft.net (Bernd Eckenfels) Date: Mon, 31 Oct 2016 19:52:06 +0000 (UTC) Subject: Big speed difference for G1 vs. parallel GC In-Reply-To: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> Message-ID: <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> Hello, Since this is measuring a short workload after vom startup it might not be the best benchmark, but then again throughput GC is expected to be faster than G1. In the particular case however I guess you could tune G1 a bit to that workload. Did you check the verbose GC logs, and how many CPUs does Java see/use? Gruss Bernd -- http://bernd.eckenfels.net On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" wrote: Hi, I've stumbled today* over a big speed difference for code execution with G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you have interests to investigate this. You should be able to reproduce this via: # setup git clone https://github.com/graphhopper/graphhopper wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf cd graphhopper # run measurement export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" # the graphhopper.sh script just makes the installation of maven and bundling the jar a bit simpler # you can also execute the tests in the class Measurement.java ./graphhopper.sh clean ./graphhopper.sh measurement berlin-latest.osm.pbf # now a measurement-.properties is created: grep routing.mean measurement-XY.properties Now this should print a line where the value is in ms. E.g. I get ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase the Xmx for the G1 run to 1400m the results do NOT get closer to parallel GC! Let me know if you need more information! Regards Peter * https://github.com/graphhopper/graphhopper/issues/854 -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From graphhopper at gmx.de Mon Oct 31 20:47:35 2016 From: graphhopper at gmx.de (Peter) Date: Mon, 31 Oct 2016 21:47:35 +0100 Subject: Big speed difference for G1 vs. 
parallel GC In-Reply-To: <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> Message-ID: <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> Hi Bernd, why do you think it is measuring a short workload? 'short' in which terms? The overall test suite takes roughly 3 minutes but can be increased easily via increasing the number of road routing queries. BTW: with routing.mean we measure the latency of every road routing query, at least I think so ;) > Did you check the verbose GC logs, and how many CPUs does Java see/use? Nothing suspicious in the GC logs IMO, except that G1 produces much more output. Still this reminded me of another mistake I made recently (not disabling swapping) and so I went to my dev server (instead of laptop) where this is already done and have more RAM there (32g), still using just 1000m and the results are a bit better: 320ms vs. only 235ms, so G1 is only ~25% slower. What differences are expected here ... let's say 'maximum'? BTW: CPU usage on the server is roughly 200-240% for G1 and 100-120% for the parallel GC, so the speedup might be also related to the CPUs as the laptop only has 2 cores without hyperthreading. Regards Peter On 31.10.2016 20:52, Bernd Eckenfels wrote: > Hello, > > Since this is measuring a short workload after vom startup it might > not be the best benchmark, but then again throughput GC is expected to > be faster than G1. > > In the particular case however I guess you could tune G1 a bit to that > workload. Did you check the verbose GC logs, and how many CPUs does > Java see/use? > > Gruss > Bernd > -- > http://bernd.eckenfels.net > > > > > On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" > wrote: > > Hi, > > I've stumbled today* over a big speed difference for code > execution with G1 GC vs. parallel GC also in the latest JDK8 > (1.8.0_111-b14). Maybe you have interests to investigate this. You > should be able to reproduce this via: > > # setup > git clone https://github.com/graphhopper/graphhopper > wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf > cd graphhopper > > # run measurement > export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" > # the graphhopper.sh script just makes the installation of maven > and bundling the jar a bit simpler > # you can also execute the tests in the class Measurement.java > > ./graphhopper.sh clean > ./graphhopper.sh measurement berlin-latest.osm.pbf > # now a measurement-.properties is created: > grep routing.mean measurement-XY.properties > > Now this should print a line where the value is in ms. E.g. I get > ~450ms for the parallel GC and ~780ms for G1GC (on an old laptop). > When I increase the Xmx for the G1 run to 1400m the results do NOT > get closer to parallel GC! > > Let me know if you need more information! > > Regards > Peter > > * > https://github.com/graphhopper/graphhopper/issues/854 > > -- > GraphHopper.com - fast and flexible route planning > -- GraphHopper.com - fast and flexible route planning -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Mon Oct 31 23:04:16 2016 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 31 Oct 2016 19:04:16 -0400 Subject: Big speed difference for G1 vs. 
parallel GC In-Reply-To: <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> References: <0729fdcc-3597-2cf3-9806-21d7d85f3c7f@gmx.de> <57A008FAA9E6E1E3.C9141F80-A575-4BDE-9C6B-548B87AAEF9A@mail.outlook.com> <0968f2eb-606c-2a62-a3d5-afde77bac47a@gmx.de> Message-ID: G1 has more expensive GC write barriers - if you have a reference heavy heap with lots of mutation, it can add up. 20% more overhead for each barrier is a number I've heard before. On Monday, October 31, 2016, Peter wrote: > Hi Bernd, > > why do you think it is measuring a short workload? 'short' in which terms? > The overall test suite takes roughly 3 minutes but can be increased easily > via increasing the number of road routing queries. BTW: with routing.mean > we measure the latency of every road routing query, at least I think so ;) > > > Did you check the verbose GC logs, and how many CPUs does Java see/use? > > Nothing suspicious in the GC logs > IMO, > except that G1 produces much more output. Still this reminded me of another > mistake I made recently (not > disabling swapping) and so I went to my dev server (instead of laptop) > where this is already done and have more RAM there (32g), still using just > 1000m and the results are a bit better: 320ms vs. only 235ms, so G1 is only > ~25% slower. What differences are expected here ... let's say 'maximum'? > > BTW: CPU usage on the server is roughly 200-240% for G1 and 100-120% for > the parallel GC, so the speedup might be also related to the CPUs as the > laptop only has 2 cores without hyperthreading. > > Regards > Peter > > On 31.10.2016 20:52, Bernd Eckenfels wrote: > > Hello, > > Since this is measuring a short workload after vom startup it might not be > the best benchmark, but then again throughput GC is expected to be faster > than G1. > > In the particular case however I guess you could tune G1 a bit to that > workload. Did you check the verbose GC logs, and how many CPUs does Java > see/use? > > Gruss > Bernd > -- > http://bernd.eckenfels.net > > > > > On Mon, Oct 31, 2016 at 8:38 PM +0100, "Peter" > wrote: > > Hi, >> >> I've stumbled today* over a big speed difference for code execution with >> G1 GC vs. parallel GC also in the latest JDK8 (1.8.0_111-b14). Maybe you >> have interests to investigate this. You should be able to reproduce this >> via: >> >> # setup >> git clone https://github.com/graphhopper/graphhopper >> wget http://download.geofabrik.de/europe/germany/bayern-latest.osm.pbf >> cd graphhopper >> >> # run measurement >> export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m" >> # the graphhopper.sh script just makes the installation of maven and >> bundling the jar a bit simpler >> # you can also execute the tests in the class Measurement.java >> >> ./graphhopper.sh clean >> ./graphhopper.sh measurement berlin-latest.osm.pbf >> # now a measurement-.properties is created: >> grep routing.mean measurement-XY.properties >> >> Now this should print a line where the value is in ms. E.g. I get ~450ms >> for the parallel GC and ~780ms for G1GC (on an old laptop). When I increase >> the Xmx for the G1 run to 1400m the results do NOT get closer to parallel >> GC! >> >> Let me know if you need more information! >> >> Regards >> Peter >> >> * >> https://github.com/graphhopper/graphhopper/issues/854 >> >> -- >> GraphHopper.com - fast and flexible route planning >> >> > > -- > GraphHopper.com - fast and flexible route planning > > -- Sent from my phone -------------- next part -------------- An HTML attachment was scrubbed... URL:
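One way to put a number on the write-barrier point above for the GraphHopper case is an A/B run of the same measurement with GC logging enabled, so GC pause time can be separated from mutator slowdown. This is only a sketch: it assumes graphhopper.sh passes JAVA_OPTS through unchanged, as the commands earlier in the thread suggest; the log file names are arbitrary; and the .osm.pbf argument should be whichever extract was actually downloaded (the thread fetches bayern-latest.osm.pbf but measures berlin-latest.osm.pbf).

# throughput collector baseline
export JAVA_OPTS="-XX:+UseParallelGC -Xmx1000m -Xms1000m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-parallel.log"
./graphhopper.sh clean
./graphhopper.sh measurement berlin-latest.osm.pbf

# G1 run with the same heap
export JAVA_OPTS="-XX:+UseG1GC -Xmx1000m -Xms1000m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc-g1.log"
./graphhopper.sh clean
./graphhopper.sh measurement berlin-latest.osm.pbf

# compare mean query latency; if the pause totals in the two logs are similar,
# the remaining gap is mutator-side (barrier cost, concurrent GC threads competing for cores)
grep routing.mean measurement-*.properties

Keeping -Xms equal to -Xmx in both runs, as Peter already does, keeps heap resizing out of the comparison.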