From yanping.wang at intel.com Fri Apr 3 05:29:55 2015
From: yanping.wang at intel.com (Wang, Yanping)
Date: Fri, 3 Apr 2015 05:29:55 +0000
Subject: Mixed GC promotion issue
Message-ID: <222E9E27A7469F4FA2D137F0724FBD37926C393C@ORSMSX105.amr.corp.intel.com>

Hi,

I have a GC log from an 8-node cluster running a 90% write / 10% read YCSB workload to test HBase performance.

JVM version: Java HotSpot(TM) 64-Bit Server VM (25.40-b25) for linux-amd64 JRE (1.8.0_40-b25), built on Feb 10 2015 21:29:53 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

Command line flags: -XX:+UseG1GC -Xms80g -Xmx80g -XX:+AlwaysPreTouch -XX:G1HeapWastePercent=20 -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=48 -XX:ConcGCThreads=32 -XX:+ParallelRefProcEnabled

The run had 2 Full GCs with pauses of 27 seconds and 23 seconds, which is pretty bad for latency-sensitive HBase.

I looked at the log and found that before both Full GCs, a Mixed GC promoted objects even though heap usage was already over 90%. See one such case below: this Mixed GC pause took only 0.57 seconds, but the heap went from 75.8G(80.0G) to 78.5G(80.0G).

After that, the next Mixed GC hit "to-space exhausted" with a 19-second pause, followed by a Full GC that took 23 seconds.

I am wondering why G1 still decided to promote when the heap was at 75.8G of 80GB. Is it possible to decide not to promote if heap occupancy is already over 85%? Or 90%?
=======================================================
2015-03-12T21:12:51.326-0700: 17283.305: [GC pause (G1 Evacuation Pause) (mixed)
 17283.305: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 65985, predicted base time: 51.83 ms, remaining time: 48.17 ms, target pause time: 100.00 ms]
 17283.305: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 112 regions, survivors: 16 regions, predicted young region time: 60.86 ms]
 17283.307: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 0.97 ms, remaining time: 0.00 ms, old: 242 regions, min: 242 regions]
 17283.307: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 242 regions, expensive: 242 regions, min: 242 regions, remaining time: 0.00 ms]
 17283.307: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 112 regions, survivors: 16 regions, old: 242 regions, predicted pause time: 249.13 ms, target pause time: 100.00 ms]
 17283.500: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 8870968 bytes]
 17283.501: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8870968 bytes, attempted expansion amount: 33554432 bytes]
 17283.501: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
 17283.723: [SoftReference, 0 refs, 0.0029074 secs]17283.726: [WeakReference, 53 refs, 0.0013187 secs]17283.727: [FinalReference, 2758 refs, 0.0186603 secs]17283.746: [PhantomReference, 0 refs, 20 refs, 0.0054745 secs]17283.752: [JNI Weak Reference, 0.0000076 secs]
 17283.880: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 84087406592 bytes, allocation request: 0 bytes, threshold: 38654705655 bytes (45.00 %), source: end of GC]
 17283.881: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1692 regions, reclaimable: 21320699264 bytes (24.82 %), threshold: 20.00 %]
 (to-space exhausted), 0.5759340 secs]
   [Parallel Time: 414.1 ms, GC Workers: 48]
      [GC Worker Start (ms): Min: 17283307.7, Avg: 17283309.1, Max: 17283309.9, Diff: 2.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.4, Max: 1.7, Diff: 1.7, Sum: 18.4]
      [Update RS (ms): Min: 9.2, Avg: 9.9, Max: 11.7, Diff: 2.5, Sum: 474.6]
         [Processed Buffers: Min: 5, Avg: 8.1, Max: 15, Diff: 10, Sum: 391]
      [Scan RS (ms): Min: 62.5, Avg: 64.8, Max: 72.5, Diff: 9.9, Sum: 3111.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 328.9, Avg: 336.6, Max: 337.9, Diff: 9.0, Sum: 16156.0]
      [Termination (ms): Min: 0.0, Avg: 0.4, Max: 0.7, Diff: 0.7, Sum: 19.6]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 3.2]
      [GC Worker Total (ms): Min: 411.2, Avg: 412.1, Max: 413.5, Diff: 2.2, Sum: 19782.8]
      [GC Worker End (ms): Min: 17283721.2, Avg: 17283721.2, Max: 17283721.3, Diff: 0.2]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 5.6 ms]
   [Other: 156.1 ms]
      [Evacuation Failure: 108.7 ms]
      [Choose CSet: 2.5 ms]
      [Ref Proc: 29.6 ms]
      [Ref Enq: 2.2 ms]
      [Redirty Cards: 9.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 3.4 ms]
   [Eden: 3584.0M(3584.0M)->0.0B(3584.0M) Survivors: 512.0M->512.0M Heap: 75.8G(80.0G)->78.5G(80.0G)]
 [Times: user=14.41 sys=0.16, real=0.57 secs]
====================================================

Thanks
-yanping
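A quick way to spot collections like the one above in a long log is to scan for the `Heap:` summary lines and flag any pause whose post-GC occupancy exceeds a chosen fraction of the total heap. This is an illustrative sketch, not part of the original thread; the regex and the 90% threshold are assumptions:

```python
import re

# Matches G1 heap summaries such as "Heap: 75.8G(80.0G)->78.5G(80.0G)"
HEAP_RE = re.compile(r"Heap: ([\d.]+)G\(([\d.]+)G\)->([\d.]+)G\(([\d.]+)G\)")

def flag_high_occupancy(log_lines, threshold=0.90):
    """Return (before_gb, after_gb, total_gb) for pauses whose post-GC
    occupancy is above `threshold` of the total heap."""
    flagged = []
    for line in log_lines:
        m = HEAP_RE.search(line)
        if m:
            before, _, after, total = map(float, m.groups())
            if after / total > threshold:
                flagged.append((before, after, total))
    return flagged

# The mixed GC quoted above ends at 78.5G of 80G (~98%), so it is flagged:
print(flag_high_occupancy(["Heap: 75.8G(80.0G)->78.5G(80.0G)"]))
```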
From yu.zhang at oracle.com Fri Apr 3 15:49:13 2015
From: yu.zhang at oracle.com (Yu Zhang)
Date: Fri, 03 Apr 2015 08:49:13 -0700
Subject: Mixed GC promotion issue
In-Reply-To: <222E9E27A7469F4FA2D137F0724FBD37926C393C@ORSMSX105.amr.corp.intel.com>
References: <222E9E27A7469F4FA2D137F0724FBD37926C393C@ORSMSX105.amr.corp.intel.com>
Message-ID: <551EB679.5060509@oracle.com>

Yanping,

You can try 2 things:

-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=1
The default G1NewSizePercent is 5, which might be too high for a big heap.

Increase -XX:G1MixedGCCountTarget=<8> to reduce the number of expensive regions added to the CSet.

Thanks,
Jenny

On 4/2/2015 10:29 PM, Wang, Yanping wrote:
> Hi,
>
> I have a GC log from an 8-node cluster running a 90% write / 10% read
> YCSB workload to test HBase performance.
>
> JVM version: Java HotSpot(TM) 64-Bit Server VM (25.40-b25) for
> linux-amd64 JRE (1.8.0_40-b25), built on Feb 10 2015 21:29:53 by
> "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>
> Command line flags: -XX:+UseG1GC -Xms80g -Xmx80g -XX:+AlwaysPreTouch
> -XX:G1HeapWastePercent=20 -XX:MaxGCPauseMillis=100
> -XX:ParallelGCThreads=48 -XX:ConcGCThreads=32 -XX:+ParallelRefProcEnabled
>
> The run had 2 Full GCs with pauses of 27 seconds and 23 seconds, which
> is pretty bad for latency-sensitive HBase.
>
> I looked at the log and found that before both Full GCs, a Mixed
> GC promoted objects even though heap usage was already over 90%.
>
> See one such case below: this Mixed GC pause took only 0.57
> seconds, but the heap went 75.8G(80.0G)->78.5G(80.0G).
>
> After that, the next Mixed GC hit "to-space exhausted" with a 19-second
> pause, followed by a Full GC that took 23 seconds.
>
> I am wondering why G1 still decided to promote from 75.8G with
> an 80GB heap. Is it possible to decide not to promote if heap occupancy
> is already over 85%? Or 90%?
>
> [GC log snipped; quoted in full in the original message above]
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
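The effect of the two flags Jenny suggests can be sketched numerically. This is illustrative arithmetic only: `G1NewSizePercent` sets the minimum young generation as a percentage of the heap, and `G1MixedGCCountTarget` is the number of mixed collections over which G1 tries to spread the candidate old regions, which roughly bounds the minimum old regions per mixed CSet. The candidate count of 1934 below is a hypothetical input, not a figure from the logs:

```python
def min_young_gb(heap_gb, new_size_percent=5):
    """Minimum young gen size implied by G1NewSizePercent (default 5)."""
    return heap_gb * new_size_percent / 100.0

def min_old_regions_per_mixed_gc(candidate_old_regions, mixed_gc_count_target=8):
    """Approximate floor of old regions added to each mixed CSet:
    the candidates are spread over G1MixedGCCountTarget collections,
    so raising the target lowers this floor."""
    return candidate_old_regions // mixed_gc_count_target

# Default 5% of an 80G heap is a 4G young gen; G1NewSizePercent=1 shrinks it to 0.8G.
print(min_young_gb(80), min_young_gb(80, 1))

# Spreading ~1900 candidate regions over 16 instead of 8 mixed GCs
# roughly halves the per-pause floor of "expensive" old regions.
print(min_old_regions_per_mixed_gc(1934), min_old_regions_per_mixed_gc(1934, 16))
```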
From yanping.wang at intel.com Fri Apr 3 16:28:26 2015
From: yanping.wang at intel.com (Wang, Yanping)
Date: Fri, 3 Apr 2015 16:28:26 +0000
Subject: Mixed GC promotion issue
In-Reply-To: <551EB679.5060509@oracle.com>
References: <222E9E27A7469F4FA2D137F0724FBD37926C393C@ORSMSX105.amr.corp.intel.com> <551EB679.5060509@oracle.com>
Message-ID: <222E9E27A7469F4FA2D137F0724FBD37926C3AC4@ORSMSX105.amr.corp.intel.com>

Hi, Jenny

I can work around this issue by tuning G1 flags as you instructed. But I am wondering:
(1) How many engineers out there have the knowledge, and are willing, to hand-craft G1 with experimental G1 flags?
(2) Is this a G1 issue? If it is, in the long run, why not fix it in G1 itself?

Below is another example, from the same JVM version and the same parameters:

Command line flags: -XX:+UseG1GC -Xms80g -Xmx80g -XX:+AlwaysPreTouch -XX:G1HeapWastePercent=20 -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=48 -XX:ConcGCThreads=32 -XX:+ParallelRefProcEnabled

Look at these two Mixed GCs. The first took 3.21 seconds to complete, with Heap: 76.6G(80.0G)->79.1G(80.0G). That 76.6G is already 95.75% of the 80GB total heap. G1 has all the information it needs to decide, and should try to avoid a Full GC, so why does G1 still promote? Does G1 not know that 79.1GB is so close to the 80GB total heap size, and that a Full GC is guaranteed to follow?

The second Mixed GC took over 15 seconds to complete, followed by a Full GC that took over 27 seconds. In this Full GC we can see the heap: 80.0G(80.0G)->45.9G(80.0G). That means this workload's live data size is only around 46GB, and there is no humongous object allocation at all. Why did G1 still lose the race?
=====================================================
2015-03-12T21:08:01.385-0700: 16993.364: [GC pause (G1 Evacuation Pause) (mixed)
 16993.364: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 57024, predicted base time: 53.85 ms, remaining time: 46.15 ms, target pause time: 100.00 ms]
 16993.364: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 112 regions, survivors: 16 regions, predicted young region time: 59.86 ms]
 16993.370: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 1.50 ms, remaining time: 0.00 ms, old: 239 regions, min: 239 regions]
 16993.370: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 239 regions, expensive: 239 regions, min: 239 regions, remaining time: 0.00 ms]
 16993.370: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 112 regions, survivors: 16 regions, old: 239 regions, predicted pause time: 327.87 ms, target pause time: 100.00 ms]
 16993.614: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 4317952 bytes]
 16993.614: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 4317952 bytes, attempted expansion amount: 33554432 bytes]
 16993.614: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
 16996.219: [SoftReference, 0 refs, 0.0072010 secs]16996.227: [WeakReference, 82 refs, 0.0047163 secs]16996.231: [FinalReference, 4367 refs, 0.0427568 secs]16996.274: [PhantomReference, 0 refs, 21 refs, 0.0097792 secs]16996.284: [JNI Weak Reference, 0.0000225 secs]
 16996.578: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 84355842048 bytes, allocation request: 0 bytes, threshold: 38654705655 bytes (45.00 %), source: end of GC]
 16996.578: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1667 regions, reclaimable: 19753877136 bytes (23.00 %), threshold: 20.00 %]
 (to-space exhausted), 3.2139606 secs]
   [Parallel Time: 2846.1 ms, GC Workers: 48]
      [GC Worker Start (ms): Min: 16993370.6, Avg: 16993371.2, Max: 16993371.9, Diff: 1.3]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.6, Max: 1.4, Diff: 1.3, Sum: 31.1]
      [Update RS (ms): Min: 8.6, Avg: 9.2, Max: 10.5, Diff: 1.9, Sum: 441.6]
         [Processed Buffers: Min: 4, Avg: 7.5, Max: 13, Diff: 9, Sum: 361]
      [Scan RS (ms): Min: 121.2, Avg: 122.0, Max: 122.9, Diff: 1.8, Sum: 5857.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 2709.2, Avg: 2710.5, Max: 2712.3, Diff: 3.1, Sum: 130104.8]
      [Termination (ms): Min: 0.0, Avg: 1.4, Max: 2.1, Diff: 2.1, Sum: 69.1]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.9]
      [GC Worker Total (ms): Min: 2843.2, Avg: 2843.9, Max: 2844.5, Diff: 1.4, Sum: 136505.7]
      [GC Worker End (ms): Min: 16996215.1, Avg: 16996215.1, Max: 16996215.2, Diff: 0.1]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 7.1 ms]
   [Other: 360.4 ms]
      [Evacuation Failure: 265.4 ms]
      [Choose CSet: 6.1 ms]
      [Ref Proc: 66.8 ms]
      [Ref Enq: 3.2 ms]
      [Redirty Cards: 11.5 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 6.7 ms]
   [Eden: 3584.0M(3584.0M)->0.0B(3584.0M) Survivors: 512.0M->512.0M Heap: 76.6G(80.0G)->79.1G(80.0G)]
 [Times: user=21.01 sys=1.04, real=3.21 secs]
2015-03-12T21:08:06.071-0700: 16998.050: [GC pause (G1 Evacuation Pause) (mixed)
 16998.050: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 56280, predicted base time: 53.72 ms, remaining time: 46.28 ms, target pause time: 100.00 ms]
 16998.050: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 30 regions, survivors: 16 regions, predicted young region time: 28.94 ms]
 16998.060: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 154 regions, max: 256 regions, reclaimable: 17164740528 bytes (19.98 %), threshold: 20.00 %]
 16998.060: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 154 regions, expensive: 143 regions, min: 239 regions, remaining time: 0.00 ms]
 16998.060: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 30 regions, survivors: 16 regions, old: 154 regions, predicted pause time: 322.12 ms, target pause time: 100.00 ms]
 16998.061: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 16777208 bytes]
 16998.061: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 16777208 bytes, attempted expansion amount: 33554432 bytes]
 16998.061: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
 17012.741: [SoftReference, 0 refs, 0.0077200 secs]17012.749: [WeakReference, 0 refs, 0.0059451 secs]17012.755: [FinalReference, 0 refs, 0.0047808 secs]17012.760: [PhantomReference, 0 refs, 0 refs, 0.0117597 secs]17012.771: [JNI Weak Reference, 0.0000172 secs]
 17013.701: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 85899345920 bytes, allocation request: 0 bytes, threshold: 38654705655 bytes (45.00 %), source: end of GC]
 17013.701: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 1513 regions, reclaimable: 17164740528 bytes (19.98 %), threshold: 20.00 %]
 (to-space exhausted), 15.6508656 secs]
   [Parallel Time: 14676.7 ms, GC Workers: 48]
      [GC Worker Start (ms): Min: 16998061.0, Avg: 16998061.6, Max: 16998062.1, Diff: 1.1]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 0.5, Max: 1.3, Diff: 1.1, Sum: 23.6]
      [Update RS (ms): Min: 0.0, Avg: 6.4, Max: 12.7, Diff: 12.7, Sum: 306.5]
         [Processed Buffers: Min: 0, Avg: 7.9, Max: 25, Diff: 25, Sum: 377]
      [Scan RS (ms): Min: 112.8, Avg: 215.2, Max: 275.5, Diff: 162.8, Sum: 10331.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.7]
      [Object Copy (ms): Min: 14384.2, Avg: 14451.0, Max: 14560.6, Diff: 176.4, Sum: 693647.4]
      [Termination (ms): Min: 0.0, Avg: 1.0, Max: 1.8, Diff: 1.8, Sum: 47.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.9]
      [GC Worker Total (ms): Min: 14673.6, Avg: 14674.1, Max: 14674.7, Diff: 1.1, Sum: 704358.0]
      [GC Worker End (ms): Min: 17012735.7, Avg: 17012735.7, Max: 17012735.8, Diff: 0.1]
   [Code Root Fixup: 0.5 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 5.9 ms]
   [Other: 967.7 ms]
      [Evacuation Failure: 894.2 ms]
      [Choose CSet: 10.0 ms]
      [Ref Proc: 33.2 ms]
      [Ref Enq: 4.5 ms]
      [Redirty Cards: 15.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 9.7 ms]
   [Eden: 960.0M(3584.0M)->0.0B(4096.0M) Survivors: 512.0M->0.0B Heap: 80.0G(80.0G)->80.0G(80.0G)]
 [Times: user=32.91 sys=5.11, real=15.64 secs]

A few young GCs later, with the heap still at 80G, a Full GC happened:

2015-03-12T21:08:22.363-0700: 17014.342: [Full GC (Allocation Failure) 17022.628: [SoftReference, 776 refs, 0.0001476 secs]17022.628: [WeakReference, 9010 refs, 0.0016977 secs]17022.630: [FinalReference, 3058 refs, 0.0016888 secs]17022.631: [PhantomReference, 0 refs, 2095 refs, 0.0002522 secs]17022.632: [JNI Weak Reference, 0.0000101 secs] 80G->45G(80G), 27.6925446 secs]
   [Eden: 0.0B(4096.0M)->0.0B(4096.0M) Survivors: 0.0B->0.0B Heap: 80.0G(80.0G)->45.9G(80.0G)], [Metaspace: 49643K->49563K(51200K)]
 [Times: user=58.11 sys=0.30, real=27.69 secs]
2015-03-12T21:08:50.057-0700: 17042.036: [GC concurrent-mark-abort]
===================================================

Thanks
-yanping

From: Yu Zhang [mailto:yu.zhang at oracle.com]
Sent: Friday, April 03, 2015 8:49 AM
To: Wang, Yanping; 'hotspot-gc-use at openjdk.java.net'
Subject: Re: Mixed GC promotion issue

Yanping,
you can try 2 things:
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=1
The default
NewSizePercent is 5, which might be too high for a big heap.

Increase -XX:G1MixedGCCountTarget=<8> to reduce the number of expensive regions added to the CSet.

Thanks,
Jenny

On 4/2/2015 10:29 PM, Wang, Yanping wrote:
> [original message and GC log snipped; quoted in full above]
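The stop/continue decisions in the two logs quoted in this thread follow directly from -XX:G1HeapWastePercent=20: G1 keeps doing mixed collections only while the reclaimable bytes in the candidate old regions exceed that fraction of the heap. The percentages printed by the ergonomics lines can be reproduced from the byte counts they quote (a sketch, using the 80G heap from the command line):

```python
def reclaimable_percent(reclaimable_bytes, heap_bytes):
    """Reclaimable space as a percentage of total heap, as G1's
    ergonomics log reports it."""
    return reclaimable_bytes / heap_bytes * 100

HEAP = 80 * 1024**3  # 85899345920 bytes, matching the occupancy in the log

# First log: 21320699264 reclaimable bytes -> 24.82% > 20%, so mixed GCs continue.
print(round(reclaimable_percent(21320699264, HEAP), 2))

# Second log: 17164740528 bytes -> 19.98% < 20%, so G1 stops mixed GCs,
# even though the heap is nearly full.
print(round(reclaimable_percent(17164740528, HEAP), 2))
```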
From charlie.hunt at oracle.com Fri Apr 3 13:46:09 2015
From: charlie.hunt at oracle.com (charlie hunt)
Date: Fri, 3 Apr 2015 08:46:09 -0500
Subject: G1 root cause and tuning
In-Reply-To: <1999336357.5062179.1428048717509.JavaMail.yahoo@mail.yahoo.com>
References: <3A50F3D7-6C40-4D3E-B8D9-4822F111DB8A@oracle.com> <1999336357.5062179.1428048717509.JavaMail.yahoo@mail.yahoo.com>
Message-ID:

Currently out of the office. Perhaps Jenny and Thomas can chime in with some additional analysis.

To answer your specific questions: PrintHeapAtGCExtended is printing what we expect to see. IIRC, "no shared space configured" may be related to the AppCDS feature. We don't need to worry about that for your issues.

Eden space size: no, we should not increase it. It looks like the major issue is frequent humongous object allocations. Doing tests with an increased region size would be the appropriate next step.

Some additional observations from the log snippets...

Pause times in non-full GCs look to be dominated by Update RS. Ideally we would want the largest amount of time to be in copy time. Reference processing times are kind of high too. Is your application using a large number of reference objects?

One thing that's puzzling in the Full GC line is that the amount of usr CPU time is much less than the real time. This seems odd! Could your system be paging to virtual memory? Or, if you are on Linux, do you have transparent huge pages enabled (they should be disabled!)? Also, seeing more than 5 seconds of sys time is eye-catching and could be related to the above issues.

All the above said, I think the major issue that needs to be addressed is the frequent large object allocations. If increasing the region size doesn't address this, then you are likely looking at making application changes to remove the frequent large object allocations. Once that is addressed, we can focus on some of the other issues.
Charlie

Sent from my iPhone

> On Apr 3, 2015, at 3:11 AM, Medan Gavril wrote:
>
> Hi Charlie,
>
> We had today a Full GC of 280 seconds. At least it printed the Full GC and it did not end with a JVM hang. We will apply -XX:G1HeapRegionSize=8M.
> "-XX:+PrintHeapAtGCExtended" - did it print the expected output?
> "No shared space configured" - what does it mean?
> You said eden space is 1 GB. Can/should we increase it?
>
> 495364.337: [GC pause (young) 495364.337: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 258.26 ms, remaining time: 2241.74 ms, target pause time: 2500.00 ms]
> 495364.337: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 239 regions, survivors: 18 regions, predicted young region time: 63.39 ms]
> 495364.337: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 239 regions, survivors: 18 regions, old: 0 regions, predicted pause time: 321.65 ms, target pause time: 2500.00 ms]
> , 0.25714812 secs]
>    [Parallel Time: 181.0 ms]
>       [GC Worker Start (ms): 495364337.4 495364337.4 495364337.4 495364337.4 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5
>        Avg: 495364337.5, Min: 495364337.4, Max: 495364337.5, Diff: 0.1]
>       [Ext Root Scanning (ms): 18.5 18.0 18.8 25.0 17.9 17.9 18.3 19.2 17.8 25.1 18.9 26.4 18.3
>        Avg: 20.0, Min: 17.8, Max: 26.4, Diff: 8.6]
>       [SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 1.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>        Avg: 0.1, Min: 0.0, Max: 1.3, Diff: 1.3]
>       [Update RS (ms): 129.9 130.5 131.0 124.7 130.0 129.1 129.2 129.0 139.2 122.5 129.2 121.2 126.1
>        Avg: 128.6, Min: 121.2, Max: 139.2, Diff: 18.0]
>          [Processed Buffers : 44 57 70 66 67 50 70 54 71 47 59 46 34
>           Sum: 735, Avg: 56, Min: 34, Max: 71, Diff: 37]
>       [Scan RS (ms): 0.2 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.1 0.1 0.2 0.1
>        Avg: 0.1, Min: 0.0, Max: 0.2, Diff: 0.2]
>       [Object Copy (ms): 26.9 27.0 25.6 25.7 27.3 27.1 27.9 27.2 18.4 27.7 27.3 27.6 30.8
>        Avg: 26.7, Min: 18.4, Max: 30.8, Diff: 12.4]
>       [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>        Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>          [Termination Attempts : 2 3 3 1 3 3 3 3 3 2 3 1 2
>           Sum: 32, Avg: 2, Min: 1, Max: 3, Diff: 2]
>       [GC Worker End (ms): 495364512.9 495364512.9 495364512.9 495364512.9 495364512.9 495364513.0 495364513.0 495364512.9 495364512.9 495364512.9 495364512.9 495364512.9 495364513.0
>        Avg: 495364512.9, Min: 495364512.9, Max: 495364513.0, Diff: 0.1]
>       [GC Worker (ms): 175.5 175.5 175.5 175.5 175.5 175.5 175.5 175.4 175.4 175.4 175.4 175.4 175.4
>        Avg: 175.5, Min: 175.4, Max: 175.5, Diff: 0.1]
>       [GC Worker Other (ms): 5.5 5.5 5.5 5.5 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6
>        Avg: 5.6, Min: 5.5, Max: 5.6, Diff: 0.1]
>    [Complete CSet Marking: 0.0 ms]
>    [Clear CT: 0.3 ms]
>    [Other: 75.9 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 72.8 ms]
>       [Ref Enq: 0.4 ms]
>       [Free CSet: 1.3 ms]
>    [Eden: 956M(952M)->0B(972M) Survivors: 72M->52M Heap: 10420M(22480M)->9469M(22480M)]
>  [Times: user=2.92 sys=0.00, real=0.27 secs]
> Total time for which application threads were stopped: 0.2605235 seconds
> Total time for which application threads were stopped: 0.0024983 seconds
> Total time for which application threads were stopped: 0.0046328 seconds
> 495368.172: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
> 495368.172: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
> 495368.172: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 495368.176: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
> 495368.176: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
> 495368.176: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 495368.176: [GC pause (young) 495368.176: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 667.90 ms, remaining time: 1832.10 ms, target pause time: 2500.00 ms]
> 495368.176: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 243 regions, survivors: 13 regions, predicted young region time: 46.78 ms]
> 495368.176: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 243 regions, survivors: 13 regions, old: 0 regions, predicted pause time: 714.67 ms, target pause time: 2500.00 ms]
> , 0.36436765 secs]
>    [Parallel Time: 273.1 ms]
>       [GC Worker Start (ms): 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.1 495368176.1 495368176.1
>        Avg: 495368176.0, Min: 495368176.0, Max: 495368176.1, Diff: 0.1]
>       [Ext Root Scanning (ms): 23.1 19.7 25.1 21.5 21.6 25.3 18.7 19.3 25.5 20.5 18.4 20.6 20.1
>        Avg: 21.5, Min: 18.4, Max: 25.5, Diff: 7.2]
>       [SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.3 0.0 0.0 0.0 0.0 0.0
>        Avg: 0.1, Min: 0.0, Max: 1.3, Diff: 1.3]
>       [Update RS (ms): 222.1 225.3 220.2 224.8 223.8 222.8 222.6 224.9 220.0 225.4 232.7 231.0 225.2
>        Avg: 224.7, Min: 220.0, Max: 232.7, Diff: 12.7]
>          [Processed Buffers : 139 200 242 206 182 154 151 211 149 162 189 222 123
>           Sum: 2330, Avg: 179, Min: 123, Max: 242, Diff: 119]
>       [Scan RS (ms): 0.0 0.3 0.0 0.0 0.0 0.0 0.2 0.2 0.0 0.0 0.0 0.0 0.0
>        Avg: 0.1, Min: 0.0, Max: 0.3, Diff: 0.3]
>       [Object Copy (ms): 22.4 22.3 22.3 21.3 22.2 19.5 26.0 21.9 22.0 21.6 16.4 15.9 22.2
>        Avg: 21.2, Min: 15.9, Max: 26.0, Diff: 10.1]
>       [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>        Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>          [Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 1
>           Sum: 13, Avg: 1, Min: 1, Max: 1, Diff: 0]
>       [GC Worker End (ms): 495368443.7 495368443.6 495368443.6 495368443.7 495368443.7 495368443.6 495368443.7 495368443.6 495368443.6 495368443.7 495368443.7 495368443.6 495368443.7
>        Avg: 495368443.7, Min: 495368443.6, Max: 495368443.7, Diff: 0.1]
>       [GC Worker (ms): 267.7 267.6 267.6 267.6 267.7 267.6 267.6 267.6 267.6 267.6 267.6 267.6 267.6
>        Avg: 267.6, Min: 267.6, Max: 267.7, Diff: 0.1]
>       [GC Worker Other (ms): 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.6 5.6 5.6
>        Avg: 5.5, Min: 5.5, Max: 5.6, Diff: 0.1]
>    [Complete CSet Marking: 0.0 ms]
>    [Clear CT: 0.2 ms]
>    [Other: 91.0 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 89.0 ms]
>       [Ref Enq: 0.0 ms]
>       [Free CSet: 1.4 ms]
>    [Eden: 972M(972M)->0B(972M) Survivors: 52M->52M Heap: 10695M(22480M)->9727M(22480M)]
>  [Times: user=3.74 sys=0.00, real=0.37 secs]
> 495368.541: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
> 495368.541: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
> 495368.541: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> Total time for which application threads were stopped: 0.3686524 seconds
> 495368.556: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
> 495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
> 495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 495368.557: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 134217744 bytes]
> 495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 134217744 bytes, attempted expansion amount: 138412032 bytes]
> 495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
>
495368.557: [Full GC > 9731M->6801M(22480M), 283.4340824 secs] > [Times: user=78.98 sys=5.43, real=283.44 secs] > Total time for which application threads were stopped: 283.4511972 seconds > Total time for which application threads were stopped: 0.4082000 seconds > 495652.564: [GC concurrent-mark-abort] > Total time for which application threads were stopped: 0.0152932 seconds > Total time for which application threads were stopped: 0.0434217 seconds > Total time for which application threads were stopped: 0.3351899 seconds > Total time for which application threads were stopped: 0.0395479 seconds > Total time for which application threads were stopped: 0.1017198 seconds > Total time for which application threads were stopped: 0.0664623 seconds > Total time for which application threads were stopped: 0.0095970 seconds > Total time for which application threads were stopped: 0.0200670 seconds > Total time for which application threads were stopped: 0.0033661 seconds > Total time for which application threads were stopped: 0.0029946 seconds > Total time for which application threads were stopped: 0.0726447 seconds > Total time for which application threads were stopped: 0.0023253 seconds > Total time for which application threads were stopped: 0.0050572 seconds > Total time for which application threads were stopped: 0.0861142 seconds > Total time for which application threads were stopped: 0.0025415 seconds > Total time for which application threads were stopped: 0.0028936 seconds > Total time for which application threads were stopped: 0.0096976 seconds > Total time for which application threads were stopped: 0.0025880 seconds > Total time for which application threads were stopped: 0.0116163 seconds > Total time for which application threads were stopped: 0.0026818 seconds > Total time for which application threads were stopped: 0.0032141 seconds > Total time for which application threads were stopped: 0.0078230 seconds > Total time for which application threads 
were stopped: 0.0025152 seconds > Total time for which application threads were stopped: 0.0026890 seconds > Total time for which application threads were stopped: 0.0030924 seconds > Total time for which application threads were stopped: 0.0027699 seconds > Total time for which application threads were stopped: 0.0029259 seconds > Total time for which application threads were stopped: 0.0032893 seconds > Total time for which application threads were stopped: 0.0037969 seconds > Total time for which application threads were stopped: 0.0130588 seconds > Total time for which application threads were stopped: 0.1230765 seconds > Total time for which application threads were stopped: 0.0022368 seconds > Total time for which application threads were stopped: 0.0022032 seconds > Total time for which application threads were stopped: 0.0505220 seconds > Total time for which application threads were stopped: 0.0351524 seconds > Total time for which application threads were stopped: 0.0022816 seconds > Total time for which application threads were stopped: 0.0292614 seconds > Total time for which application threads were stopped: 0.0117505 seconds > Total time for which application threads were stopped: 0.0035653 seconds > Total time for which application threads were stopped: 0.0388754 seconds > Total time for which application threads were stopped: 0.0028810 seconds > Total time for which application threads were stopped: 0.0025892 seconds > Total time for which application threads were stopped: 0.0028770 seconds > Total time for which application threads were stopped: 0.0024385 seconds > Total time for which application threads were stopped: 0.0025007 seconds > Total time for which application threads were stopped: 0.0024033 seconds > Total time for which application threads were stopped: 0.0024441 seconds > Total time for which application threads were stopped: 0.0028249 seconds > Total time for which application threads were stopped: 0.0026523 seconds > Total time for 
which application threads were stopped: 0.0025650 seconds > Total time for which application threads were stopped: 0.0025241 seconds > Total time for which application threads were stopped: 0.0026102 seconds > Total time for which application threads were stopped: 0.0026741 seconds > Total time for which application threads were stopped: 0.0025973 seconds > Total time for which application threads were stopped: 0.0027808 seconds > Total time for which application threads were stopped: 0.0112295 seconds > Total time for which application threads were stopped: 0.0028798 seconds > Total time for which application threads were stopped: 0.0026709 seconds > Total time for which application threads were stopped: 0.0038697 seconds > Total time for which application threads were stopped: 0.0026591 seconds > Total time for which application threads were stopped: 0.0025443 seconds > Total time for which application threads were stopped: 0.0040415 seconds > Total time for which application threads were stopped: 0.0025476 seconds > Total time for which application threads were stopped: 0.0025682 seconds > Total time for which application threads were stopped: 0.0032743 seconds > Total time for which application threads were stopped: 0.0034255 seconds > Total time for which application threads were stopped: 0.0034190 seconds > Total time for which application threads were stopped: 0.0025161 seconds > Total time for which application threads were stopped: 0.0036486 seconds > Total time for which application threads were stopped: 0.0034647 seconds > Total time for which application threads were stopped: 0.0032177 seconds > Total time for which application threads were stopped: 0.0027416 seconds > Total time for which application threads were stopped: 0.0034942 seconds > Total time for which application threads were stopped: 0.0026935 seconds > Total time for which application threads were stopped: 0.0029138 seconds > Total time for which application threads were stopped: 0.0026070 
seconds > Total time for which application threads were stopped: 0.0025387 seconds > Total time for which application threads were stopped: 0.0145426 seconds > Total time for which application threads were stopped: 0.0031826 seconds > Total time for which application threads were stopped: 0.0023891 seconds > Total time for which application threads were stopped: 0.0027578 seconds > Total time for which application threads were stopped: 0.0742630 seconds > Total time for which application threads were stopped: 0.0024304 seconds > Total time for which application threads were stopped: 0.0637979 seconds > Total time for which application threads were stopped: 0.0198993 seconds > Total time for which application threads were stopped: 0.0491918 seconds > Total time for which application threads were stopped: 0.0024178 seconds > Total time for which application threads were stopped: 0.0098973 seconds > Total time for which application threads were stopped: 0.0302448 seconds > Total time for which application threads were stopped: 0.0121276 seconds > Total time for which application threads were stopped: 0.0025787 seconds > Total time for which application threads were stopped: 0.0138264 seconds > Total time for which application threads were stopped: 0.0153000 seconds > Total time for which application threads were stopped: 0.0027456 seconds > Total time for which application threads were stopped: 0.0038677 seconds > Total time for which application threads were stopped: 0.0029987 seconds > Total time for which application threads were stopped: 0.0054323 seconds > Total time for which application threads were stopped: 0.0026236 seconds > Total time for which application threads were stopped: 0.0034756 seconds > Total time for which application threads were stopped: 0.0028228 seconds > Total time for which application threads were stopped: 0.0096895 seconds > Total time for which application threads were stopped: 0.0028705 seconds > Total time for which application 
threads were stopped: 0.0038539 seconds > Total time for which application threads were stopped: 0.0027335 seconds > Total time for which application threads were stopped: 0.0039392 seconds > Total time for which application threads were stopped: 0.0030043 seconds > Total time for which application threads were stopped: 0.0025136 seconds > Total time for which application threads were stopped: 0.0041170 seconds > Total time for which application threads were stopped: 0.0134016 seconds > Total time for which application threads were stopped: 0.0028604 seconds > Total time for which application threads were stopped: 0.0030666 seconds > Total time for which application threads were stopped: 0.0092635 seconds > Total time for which application threads were stopped: 0.0035233 seconds > Total time for which application threads were stopped: 0.0027291 seconds > Total time for which application threads were stopped: 0.0026195 seconds > Total time for which application threads were stopped: 0.0025989 seconds > Total time for which application threads were stopped: 0.0027949 seconds > Total time for which application threads were stopped: 0.0035386 seconds > Total time for which application threads were stopped: 0.0034117 seconds > Total time for which application threads were stopped: 0.0033919 seconds > Total time for which application threads were stopped: 0.0024518 seconds > Total time for which application threads were stopped: 0.0031862 seconds > Total time for which application threads were stopped: 0.0032185 seconds > Total time for which application threads were stopped: 0.0028277 seconds > Total time for which application threads were stopped: 0.0032185 seconds > Total time for which application threads were stopped: 0.0034893 seconds > Total time for which application threads were stopped: 0.0027493 seconds > Total time for which application threads were stopped: 0.0031793 seconds > Total time for which application threads were stopped: 0.0041045 seconds > Total 
time for which application threads were stopped: 0.0025815 seconds > Total time for which application threads were stopped: 0.0035859 seconds > Total time for which application threads were stopped: 0.0038022 seconds > Total time for which application threads were stopped: 0.0029696 seconds > Total time for which application threads were stopped: 0.0037133 seconds > Total time for which application threads were stopped: 0.0032796 seconds > Total time for which application threads were stopped: 0.0030532 seconds > Total time for which application threads were stopped: 0.0026405 seconds > Total time for which application threads were stopped: 0.0033176 seconds > Total time for which application threads were stopped: 0.0032808 seconds > Total time for which application threads were stopped: 0.0200569 seconds > Total time for which application threads were stopped: 0.0027396 seconds > Total time for which application threads were stopped: 0.0037072 seconds > Total time for which application threads were stopped: 0.0024882 seconds > Total time for which application threads were stopped: 0.0030096 seconds > Total time for which application threads were stopped: 0.0024801 seconds > Total time for which application threads were stopped: 0.0025565 seconds > Total time for which application threads were stopped: 0.0038931 seconds > Total time for which application threads were stopped: 0.0030067 seconds > Total time for which application threads were stopped: 0.0030302 seconds > Total time for which application threads were stopped: 0.0271912 seconds > Total time for which application threads were stopped: 0.0026147 seconds > Heap > garbage-first heap total 23019520K, used 7803799K [0x0000000243000000, 0x00000007c0000000, 0x00000007c0000000) > region size 4096K, 146 young (598016K), 0 survivors (0K) > compacting perm gen total 258048K, used 255345K [0x00000007c0000000, 0x00000007cfc00000, 0x0000000800000000) > the space 258048K, 98% used [0x00000007c0000000, 
0x00000007cf95c610, 0x00000007cf95c800, 0x00000007cfc00000) > No shared spaces configured. > on_exit trigger matched. Restarting the JVM. (Exit code: -1) > > > > > On Tuesday, March 31, 2015 3:52 PM, charlie hunt wrote: > > > Just as a clarification, the -XX:+ParallelRefProcEnabled will help reduce the time spent in reference processing. It will not help address the issue of seeing Full GCs as a result of frequent humongous object allocations, or a humongous allocation where there are not sufficient contiguous regions available to satisfy the humongous allocation request. > > Thomas's suggestion to increase the region size may help with the Full GCs as a result of humongous object allocations. > > thanks, > > charlie > >> On Mar 31, 2015, at 7:42 AM, Medan Gavril wrote: >> >> Hi Charlie, >> >> Currently we can only go to Java 7 update 7x (latest). >> >> We will try the following changes: >> 1. -XX:G1HeapRegionSize=8M (then increase) >> 2. -XX:+ParallelRefProcEnabled >> >> Please let me know if you have any other suggestions. >> >> Best Regards, >> Gabi Medan >> >> >> On Tuesday, March 31, 2015 3:35 PM, charlie hunt wrote: >> >> >> To add to Thomas's good suggestions, I suppose one other alternative is to make application changes to break up the 300+ MB allocation into smaller allocations. This would offer a better opportunity for that humongous allocation to be satisfied. >> >> hths, >> >> charlie >> >>> On Mar 31, 2015, at 6:30 AM, Thomas Schatzl wrote: >>> >>> Hi all, >>> >>>> On Mon, 2015-03-30 at 20:41 -0500, charlie hunt wrote: >>>> Hi Jenny, >>>> >>>> One possibility is that there are not enough available contiguous >>>> regions to satisfy a 300+ MB humongous allocation. >>>> >>>> If we assume a 22 GB Java heap (a little larger than the 22480M shown >>>> in the log), with 2048 G1 regions (the default, as you know), the region >>>> size would be about 11 MB. That implies there need to be about 30 >>>> contiguous G1 regions available to satisfy the humongous allocation >>>> request. >>>> >>>> An unrelated question: do other GCs have a similar pattern of a >>>> rather large percentage of time in Ref Proc relative to the overall >>>> pause time, i.e. 24.7 ms / 120 ms ~ 20% of the pause time? If that's >>>> the case, then if -XX:+ParallelRefProcEnabled is not already set, >>>> there may be some low-hanging tuning fruit. But it is not going to >>>> address the frequent humongous allocation problem. It is also >>>> interesting that the pause time goal is 2500 ms, yet the actual >>>> pause time is 120 ms, and eden is being sized at less than 1 GB out of >>>> a 22 GB Java heap. Are the frequent humongous allocations messing >>>> with the heap sizing heuristics? >>> >>> While I have no solution for the problem, we are aware of these issues: >>> >>> - https://bugs.openjdk.java.net/browse/JDK-7068229 for dynamically >>> enabling MT reference processing >>> >>> - https://bugs.openjdk.java.net/browse/JDK-8038487 to use mixed GC >>> instead of Full GC to clear out space for failing humongous object >>> allocations. >>> >>> I am not sure what jdk release "JRE 1.17 update 17" actually is. >>> From the given strings in the PrintGCDetails output, it seems to be >>> something quite old, I would guess jdk6? >>> >>> In that case, if possible I would recommend trying a newer version that >>> improves humongous object handling significantly (e.g. 8u40 is the latest >>> official). >>> >>> Another option that works in all versions I am aware of is increasing >>> the heap region size with -XX:G1HeapRegionSize=XM, where X is 8, 16 or 32; >>> it seems that a 4M region size has been chosen by ergonomics. >>> Start with the smaller of the suggested values. >>> >>> Thanks, >>> Thomas >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
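[Archive editor's note: the region arithmetic charlie and Thomas describe above can be sketched as follows. This is a minimal illustration, not HotSpot source; the class and method names are hypothetical. The 134217744-byte request and the 4M region size are taken from the logs in this thread. In G1, an allocation of at least half a region is treated as humongous and must be placed in contiguous regions.]

```java
// Hypothetical helper (not HotSpot code) illustrating G1's
// humongous-allocation arithmetic with the numbers from the log.
public class HumongousMath {

    // G1 treats an allocation of at least half a region as humongous.
    static boolean isHumongous(long bytes, long regionSize) {
        return bytes >= regionSize / 2;
    }

    // A humongous object needs this many *contiguous* free regions
    // (ceiling division of the request size by the region size).
    static long regionsNeeded(long bytes, long regionSize) {
        return (bytes + regionSize - 1) / regionSize;
    }

    public static void main(String[] args) {
        long request  = 134_217_744L;  // failing request in the log: 128 MB + 16 bytes
        long region4M  = 4L << 20;     // region size chosen by ergonomics in the log
        long region32M = 32L << 20;    // e.g. after -XX:G1HeapRegionSize=32M

        System.out.println(isHumongous(request, region4M));    // true
        System.out.println(regionsNeeded(request, region4M));  // 33 contiguous regions
        System.out.println(regionsNeeded(request, region32M)); // 5 contiguous regions
    }
}
```

[The 33-region figure matches the "attempted expansion amount: 138412032 bytes" in the log (33 x 4 MB); with 32 MB regions the same request spans only 5 regions, which is why raising -XX:G1HeapRegionSize makes it easier to find enough contiguous free space.]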
URL:
From gabi_io at yahoo.com Fri Apr 3 08:11:57 2015 From: gabi_io at yahoo.com (Medan Gavril) Date: Fri, 3 Apr 2015 08:11:57 +0000 (UTC) Subject: G1 root cause and tuning In-Reply-To: <3A50F3D7-6C40-4D3E-B8D9-4822F111DB8A@oracle.com> References: <3A50F3D7-6C40-4D3E-B8D9-4822F111DB8A@oracle.com> Message-ID: <1999336357.5062179.1428048717509.JavaMail.yahoo@mail.yahoo.com>
Hi Charlie,
We had a Full GC of 280 seconds today. At least it printed the Full GC and it did not end with a hung JVM. We will apply -XX:G1HeapRegionSize=8M. Did "-XX:+PrintHeapAtGCExtended" print the expected output? "No shared spaces configured": what does it mean? You said eden space is 1 GB. Can/should we increase it?
495364.337: [GC pause (young) 495364.337: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 258.26 ms, remaining time: 2241.74 ms, target pause time: 2500.00 ms]
 495364.337: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 239 regions, survivors: 18 regions, predicted young region time: 63.39 ms]
 495364.337: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 239 regions, survivors: 18 regions, old: 0 regions, predicted pause time: 321.65 ms, target pause time: 2500.00 ms], 0.25714812 secs]
   [Parallel Time: 181.0 ms]
      [GC Worker Start (ms): 495364337.4 495364337.4 495364337.4 495364337.4 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5 495364337.5
       Avg: 495364337.5, Min: 495364337.4, Max: 495364337.5, Diff: 0.1]
      [Ext Root Scanning (ms): 18.5 18.0 18.8 25.0 17.9 17.9 18.3 19.2 17.8 25.1 18.9 26.4 18.3
       Avg: 20.0, Min: 17.8, Max: 26.4, Diff: 8.6]
      [SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 1.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
       Avg: 0.1, Min: 0.0, Max: 1.3, Diff: 1.3]
      [Update RS (ms): 129.9 130.5 131.0 124.7 130.0 129.1 129.2 129.0 139.2 122.5 129.2 121.2 126.1
       Avg: 128.6, Min: 121.2, Max: 139.2, Diff: 18.0]
         [Processed Buffers : 44 57 70 66 67 50 70 54 71 47 59 46 34
          Sum: 735, Avg: 56, Min: 34, Max: 71, Diff: 37]
      [Scan RS (ms): 0.2 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.1 0.1 0.2 0.1
       Avg: 0.1, Min: 0.0, Max: 0.2, Diff: 0.2]
      [Object Copy (ms): 26.9 27.0 25.6 25.7 27.3 27.1 27.9 27.2 18.4 27.7 27.3 27.6 30.8
       Avg: 26.7, Min: 18.4, Max: 30.8, Diff: 12.4]
      [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 2 3 3 1 3 3 3 3 3 2 3 1 2
          Sum: 32, Avg: 2, Min: 1, Max: 3, Diff: 2]
      [GC Worker End (ms): 495364512.9 495364512.9 495364512.9 495364512.9 495364512.9 495364513.0 495364513.0 495364512.9 495364512.9 495364512.9 495364512.9 495364512.9 495364513.0
       Avg: 495364512.9, Min: 495364512.9, Max: 495364513.0, Diff: 0.1]
      [GC Worker (ms): 175.5 175.5 175.5 175.5 175.5 175.5 175.5 175.4 175.4 175.4 175.4 175.4 175.4
       Avg: 175.5, Min: 175.4, Max: 175.5, Diff: 0.1]
      [GC Worker Other (ms): 5.5 5.5 5.5 5.5 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6 5.6
       Avg: 5.6, Min: 5.5, Max: 5.6, Diff: 0.1]
   [Complete CSet Marking: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 75.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 72.8 ms]
      [Ref Enq: 0.4 ms]
      [Free CSet: 1.3 ms]
   [Eden: 956M(952M)->0B(972M) Survivors: 72M->52M Heap: 10420M(22480M)->9469M(22480M)]
 [Times: user=2.92 sys=0.00, real=0.27 secs]
Total time for which application threads were stopped: 0.2605235 seconds
Total time for which application threads were stopped: 0.0024983 seconds
Total time for which application threads were stopped: 0.0046328 seconds
495368.172: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
495368.172: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
495368.172: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.176: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
495368.176: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
495368.176: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.176: [GC pause (young) 495368.176: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 667.90 ms, remaining time: 1832.10 ms, target pause time: 2500.00 ms]
 495368.176: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 243 regions, survivors: 13 regions, predicted young region time: 46.78 ms]
 495368.176: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 243 regions, survivors: 13 regions, old: 0 regions, predicted pause time: 714.67 ms, target pause time: 2500.00 ms], 0.36436765 secs]
   [Parallel Time: 273.1 ms]
      [GC Worker Start (ms): 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.0 495368176.1 495368176.1 495368176.1
       Avg: 495368176.0, Min: 495368176.0, Max: 495368176.1, Diff: 0.1]
      [Ext Root Scanning (ms): 23.1 19.7 25.1 21.5 21.6 25.3 18.7 19.3 25.5 20.5 18.4 20.6 20.1
       Avg: 21.5, Min: 18.4, Max: 25.5, Diff: 7.2]
      [SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.3 0.0 0.0 0.0 0.0 0.0
       Avg: 0.1, Min: 0.0, Max: 1.3, Diff: 1.3]
      [Update RS (ms): 222.1 225.3 220.2 224.8 223.8 222.8 222.6 224.9 220.0 225.4 232.7 231.0 225.2
       Avg: 224.7, Min: 220.0, Max: 232.7, Diff: 12.7]
         [Processed Buffers : 139 200 242 206 182 154 151 211 149 162 189 222 123
          Sum: 2330, Avg: 179, Min: 123, Max: 242, Diff: 119]
      [Scan RS (ms): 0.0 0.3 0.0 0.0 0.0 0.0 0.2 0.2 0.0 0.0 0.0 0.0 0.0
       Avg: 0.1, Min: 0.0, Max: 0.3, Diff: 0.3]
      [Object Copy (ms): 22.4 22.3 22.3 21.3 22.2 19.5 26.0 21.9 22.0 21.6 16.4 15.9 22.2
       Avg: 21.2, Min: 15.9, Max: 26.0, Diff: 10.1]
      [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 1
          Sum: 13, Avg: 1, Min: 1, Max: 1, Diff: 0]
      [GC Worker End (ms): 495368443.7 495368443.6 495368443.6 495368443.7 495368443.7 495368443.6 495368443.7 495368443.6 495368443.6 495368443.7 495368443.7 495368443.6 495368443.7
       Avg: 495368443.7, Min: 495368443.6, Max: 495368443.7, Diff: 0.1]
      [GC Worker (ms): 267.7 267.6 267.6 267.6 267.7 267.6 267.6 267.6 267.6 267.6 267.6 267.6 267.6
       Avg: 267.6, Min: 267.6, Max: 267.7, Diff: 0.1]
      [GC Worker Other (ms): 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.6 5.6 5.6
       Avg: 5.5, Min: 5.5, Max: 5.6, Diff: 0.1]
   [Complete CSet Marking: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 91.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 89.0 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 1.4 ms]
   [Eden: 972M(972M)->0B(972M) Survivors: 52M->52M Heap: 10695M(22480M)->9727M(22480M)]
 [Times: user=3.74 sys=0.00, real=0.37 secs]
495368.541: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
495368.541: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
495368.541: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
Total time for which application threads were stopped: 0.3686524 seconds
495368.556: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.557: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 134217744 bytes]
495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 134217744 bytes, attempted expansion amount: 138412032 bytes]
495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.557: [Full GC 9731M->6801M(22480M), 283.4340824 secs]
 [Times: user=78.98 sys=5.43, real=283.44 secs]
Total time for which application threads were stopped: 283.4511972 seconds
Total time for which application threads were stopped: 0.4082000 seconds
495652.564: [GC concurrent-mark-abort]
Total time for which application threads were stopped: 0.0152932 seconds
Total time for which application threads were stopped: 0.0434217 seconds
Total time for which application threads were stopped: 0.3351899 seconds
Total time for which application threads were stopped: 0.0395479 seconds
Total time for which application threads were stopped: 0.1017198 seconds
Total time for which application threads were stopped: 0.0664623 seconds
Total time for which application threads were stopped: 0.0095970 seconds
Total time for which application threads were stopped: 0.0200670 seconds
Total time for which application threads were stopped: 0.0033661 seconds
Total time for which application threads were stopped: 0.0029946 seconds
Total time for which application threads were stopped: 0.0726447 seconds
Total time for which application threads were stopped: 0.0023253 seconds
Total time for which application threads were stopped: 0.0050572 seconds
Total time for which application threads were stopped: 0.0861142 seconds
Total time for which application threads were stopped: 0.0025415 seconds
Total time for which application threads were stopped: 0.0028936 seconds
Total time for which application threads were stopped: 0.0096976 seconds
Total time for which application threads were stopped: 0.0025880 seconds
Total time for which application threads were stopped: 0.0116163 seconds
Total time for which application threads were stopped: 0.0026818 seconds
Total time for which application threads were stopped: 0.0032141 seconds
Total time for which application threads were stopped: 0.0078230 seconds
Total time for which application threads were stopped: 0.0025152 seconds
Total time for which application threads were stopped: 0.0026890 seconds
Total time for which application threads were stopped: 0.0030924 seconds
Total time for which application threads were stopped: 0.0027699 seconds
Total time for which application threads were stopped: 0.0029259 seconds
Total time for which application threads were stopped: 0.0032893 seconds
Total time for which application threads were stopped: 0.0037969 seconds
Total time for which application threads were stopped: 0.0130588 seconds
Total time for which application threads were stopped: 0.1230765 seconds
Total time for which application threads were stopped: 0.0022368 seconds
Total time for which application threads were stopped: 0.0022032 seconds
Total time for which application threads were stopped: 0.0505220 seconds
Total time for which application threads were stopped: 0.0351524 seconds
Total time for which application threads were stopped: 0.0022816 seconds
Total time for which application threads were stopped: 0.0292614 seconds
Total time for which application threads were stopped: 0.0117505 seconds
Total time for which application threads were stopped: 0.0035653 seconds
Total time for which application threads were stopped: 0.0388754 seconds
Total time for which application threads were stopped: 0.0028810 seconds
Total time for which application threads were stopped: 0.0025892 seconds
Total time for which application threads were stopped: 0.0028770 seconds
Total time for which application threads were stopped: 0.0024385 seconds
Total time for which application threads were stopped: 0.0025007 seconds
Total time for which application threads were stopped: 0.0024033 seconds
Total time for which application threads were stopped: 0.0024441 seconds
Total time for which application threads were stopped: 0.0028249 seconds
Total time for which application threads were stopped: 0.0026523 seconds
Total time for which application threads were stopped: 0.0025650 seconds
Total time for which application threads were stopped: 0.0025241 seconds
Total time for which application threads were stopped: 0.0026102 seconds
Total time for which application threads were stopped: 0.0026741 seconds
Total time for which application threads were stopped: 0.0025973 seconds
Total time for which application threads were stopped: 0.0027808 seconds
Total time for which application threads were stopped: 0.0112295 seconds
Total time for which application threads were stopped: 0.0028798 seconds
Total time for which application threads were stopped: 0.0026709 seconds
Total time for which application threads were stopped: 0.0038697 seconds
Total time for which application threads were stopped: 0.0026591 seconds
Total time for which application threads were stopped: 0.0025443 seconds
Total time for which application threads were stopped: 0.0040415 seconds
Total time for which application threads were stopped: 0.0025476 seconds
Total time for which application threads were stopped: 0.0025682 seconds
Total time for which application threads were stopped: 0.0032743 seconds
Total time for which application threads were stopped: 0.0034255 seconds
Total time for which application threads were stopped: 0.0034190 seconds
Total time for which application threads were stopped: 0.0025161 seconds
Total time for which application threads were stopped: 0.0036486 seconds
Total time for which application threads were stopped: 0.0034647 seconds
Total time for which application threads were stopped: 0.0032177 seconds
Total time for which application threads were stopped: 0.0027416 seconds
Total time for which application threads were stopped: 0.0034942 seconds
Total time for which application threads were stopped: 0.0026935 seconds
Total time for which application threads were stopped: 0.0029138 seconds
Total time for which application threads were stopped: 0.0026070 seconds
Total time for which application threads were stopped: 0.0025387 seconds
Total time for which application threads were stopped: 0.0145426 seconds
Total time for which application threads were stopped: 0.0031826 seconds
Total time for which application threads were stopped: 0.0023891 seconds
Total time for which application threads were stopped: 0.0027578 seconds
Total time for which application threads were stopped: 0.0742630 seconds
Total time for which application threads were stopped: 0.0024304 seconds
Total time for which application threads were stopped: 0.0637979 seconds
Total time for which application threads were stopped: 0.0198993 seconds
Total time for which application threads were stopped: 0.0491918 seconds
Total time for which application threads were stopped: 0.0024178 seconds
Total time for which application threads were stopped: 0.0098973 seconds
Total time for which application threads were stopped: 0.0302448 seconds
Total time for which application threads were stopped: 0.0121276 seconds
Total time for which application threads were stopped: 0.0025787 seconds
Total time for which application threads were stopped: 0.0138264 seconds
Total time for which application threads were stopped: 0.0153000 seconds
Total time for which application threads were stopped: 0.0027456 seconds
Total time for which application threads were stopped: 0.0038677 seconds
Total time for which application threads were stopped: 0.0029987 seconds
Total time for which application threads were stopped: 0.0054323 seconds
Total time for which application threads were stopped: 0.0026236 seconds
Total time for which application threads were stopped: 0.0034756 seconds
Total time for which application threads were stopped: 0.0028228 seconds
Total time for which application threads were stopped: 0.0096895 seconds
Total time for which application threads were stopped: 0.0028705 seconds
Total time for which application threads were stopped: 0.0038539 seconds
Total time for which application threads were stopped: 0.0027335 seconds
Total time for which application threads were stopped: 0.0039392 seconds
Total time for which application threads were stopped: 0.0030043 seconds
Total time for which application threads were stopped: 0.0025136 seconds
Total time for which application threads were stopped: 0.0041170 seconds
Total time for which application threads were stopped: 0.0134016 seconds
Total time for which application threads were stopped: 0.0028604 seconds
Total time for which application threads were stopped: 0.0030666 seconds
Total time for which application threads were stopped: 0.0092635 seconds
Total time for which application threads were stopped: 0.0035233 seconds
Total time for which application threads were stopped: 0.0027291 seconds
Total time for which application threads were stopped: 0.0026195 seconds
Total time for which application threads were stopped: 0.0025989 seconds
Total time for which application threads were stopped: 0.0027949 seconds
Total time for which application threads were stopped: 0.0035386 seconds
Total time for which application threads were stopped: 0.0034117 seconds
Total time for which application threads were stopped: 0.0033919 seconds
Total time for which application threads were stopped: 0.0024518 seconds
Total time for which application threads were stopped: 0.0031862 seconds
Total time for which application threads were stopped: 0.0032185 seconds
Total time for which application threads were stopped: 0.0028277 seconds
Total time for which application threads were stopped: 0.0032185 seconds
Total time for which application threads were stopped: 0.0034893 seconds
Total time for which application threads were stopped: 0.0027493 seconds
Total time for which application threads were stopped: 0.0031793 seconds
Total time for which application threads were stopped: 0.0041045 seconds
Total time for which application threads were stopped: 0.0025815 seconds
Total time for which application threads were stopped: 0.0035859 seconds
Total time for which application threads were stopped: 0.0038022 seconds
Total time for which application threads were stopped: 0.0029696 seconds
Total time for which application threads were stopped: 0.0037133 seconds
Total time for which application threads were stopped: 0.0032796 seconds
Total time for which application threads were stopped: 0.0030532 seconds
Total time for which application threads were stopped: 0.0026405 seconds
Total time for which application threads were stopped: 0.0033176 seconds
Total time for which application threads were stopped: 0.0032808 seconds
Total time for which application threads were stopped: 0.0200569 seconds
Total time for which application threads were stopped: 0.0027396 seconds
Total time for which application threads were stopped: 0.0037072 seconds
Total time for which application threads were
stopped: 0.0024882 secondsTotal time for which application threads were stopped: 0.0030096 secondsTotal time for which application threads were stopped: 0.0024801 secondsTotal time for which application threads were stopped: 0.0025565 secondsTotal time for which application threads were stopped: 0.0038931 secondsTotal time for which application threads were stopped: 0.0030067 secondsTotal time for which application threads were stopped: 0.0030302 secondsTotal time for which application threads were stopped: 0.0271912 secondsTotal time for which application threads were stopped: 0.0026147 secondsHeap?garbage-first heap ? total 23019520K, used 7803799K [0x0000000243000000, 0x00000007c0000000, 0x00000007c0000000)? region size 4096K, 146 young (598016K), 0 survivors (0K)?compacting perm gen ?total 258048K, used 255345K [0x00000007c0000000, 0x00000007cfc00000, 0x0000000800000000)? ?the space 258048K, ?98% used [0x00000007c0000000, 0x00000007cf95c610, 0x00000007cf95c800, 0x00000007cfc00000)No shared spaces configured.on_exit trigger matched. ?Restarting the JVM. ?(Exit code: -1) On Tuesday, March 31, 2015 3:52 PM, charlie hunt wrote: Just as a clarification, the -XX:+ParallelRefProcEnabled will help reduce the time spent in reference processing. It will not help address the issue of seeing Full GCs as a result of frequent humongous object allocations, or a humongous allocations where there is not sufficient contiguous regions available to satisfy the humongous allocation request. Thomas?s suggestion to increase the region size may help with the Full GCs as a result of humongous object allocations. thanks, charlie On Mar 31, 2015, at 7:42 AM, Medan Gavril wrote: HI Charlie, Currenltly we can only go to java 7 update 7x(latest). We will try the following changes:? ?1.?-XX:G1HeapRegionSize=8 (then increase)? ? 2.?-XX:+ParallelRefProcEnabled Please let me know if you have any other suggestion. 
Best Regards,
Gabi Medan

On Tuesday, March 31, 2015 3:35 PM, charlie hunt wrote: To add to Thomas's good suggestions, I suppose one other alternative is to make application changes to break up the 300+ MB allocation into smaller MB allocations. This would offer a better opportunity for that humongous allocation to be satisfied. hths, charlie

On Mar 31, 2015, at 6:30 AM, Thomas Schatzl wrote: Hi all,

On Mon, 2015-03-30 at 20:41 -0500, charlie hunt wrote: Hi Jenny, One possibility is that there are not enough contiguous regions available to satisfy a 300+ MB humongous allocation. If we assume a 22 GB Java heap (a little larger than the 22480M shown in the log) with 2048 G1 regions (the default, as you know), the region size would be about 11 MB. That implies there need to be about 30 contiguous G1 regions available to satisfy the humongous allocation request. An unrelated question: do other GCs have a similar pattern of a rather large percentage of time in Ref Proc relative to the overall pause time, i.e. 24.7 ms / 120 ms ~ 20% of the pause time? If that's the case, and -XX:+ParallelRefProcEnabled is not already set, there may be some low-hanging tuning fruit. But it is not going to address the frequent humongous allocation problem. It is also interesting that the pause time goal is 2500 ms, yet the actual pause time is 120 ms, and eden is being sized at less than 1 GB out of a 22 GB Java heap. Are the frequent humongous allocations messing with the heap sizing heuristics?

While I have no solution for the problem, we are aware of these issues:
- https://bugs.openjdk.java.net/browse/JDK-7068229 for dynamically enabling MT reference processing
- https://bugs.openjdk.java.net/browse/JDK-8038487 to use mixed GC instead of Full GC to clear out space for failing humongous object allocations

I am not sure what JDK release "JRE 1.17 update 17" actually is.
From the given strings in the PrintGCDetails output, it seems to be something quite old; I would guess jdk6? In that case, if possible I would recommend trying a newer version that improves humongous object handling significantly (e.g. 8u40 is the latest official release). Another option that works in all versions I am aware of is increasing the heap region size with -XX:G1HeapRegionSize=XM, where X is 8, 16 or 32; it seems that the 4M region size has been chosen by ergonomics. Start with the smaller of the suggested values.

Thanks,
 Thomas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrapperParsed.zip
Type: application/octet-stream
Size: 499490 bytes
Desc: not available
URL:

From gabi_io at yahoo.com Mon Apr 6 09:26:16 2015
From: gabi_io at yahoo.com (Medan Gavril)
Date: Mon, 6 Apr 2015 09:26:16 +0000 (UTC)
Subject: G1 root cause and tuning
In-Reply-To: References: Message-ID: <1067305432.102640.1428312376370.JavaMail.yahoo@mail.yahoo.com>

Hi Charlie, Our goal is to reduce the number of Full GCs as well as make them shorter by tuning G1. Of course we will check the application code in parallel. We will start with -XX:G1HeapRegionSize=8M, then -XX:+UnlockExperimentalVMOptions, -XX:InitiatingHeapOccupancyPercent=0 and -XX:G1HeapWastePercent.

Best Regards,
Gabi Medan

On Friday, April 3, 2015 4:46 PM, charlie hunt wrote: Currently out of the office. Perhaps Jenny and Thomas can chime in with some additional analysis. To answer your specific questions: PrintHeapAtGCExtended is printing what we expect to see. IIRC, "no shared spaces configured" may be related to the AppCDS feature; we don't need to worry about that for your issues. Eden space size: no, we should not increase it. It looks like the major issue is frequent humongous object allocations. Doing tests with an increased region size would be the appropriate next step.
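The region-size arithmetic discussed earlier in the thread (charlie's ~11 MB regions for a 22 GB heap, and Thomas's suggestion to raise -XX:G1HeapRegionSize) can be sketched as follows. This is an illustrative back-of-the-envelope calculation only, not HotSpot source; it assumes G1's documented rule that an allocation of at least half a region is humongous and must be placed in contiguous regions, and it reuses charlie's approximate 11 MB region size (real G1 region sizes are powers of two):

```java
// Illustrative sketch (not HotSpot code) of why a 300+ MB allocation is
// hard to satisfy: it is humongous and needs many contiguous regions.
public class HumongousMath {
    static final long MB = 1024L * 1024L;

    // G1's documented rule: allocations of half a region or more are humongous.
    static boolean isHumongous(long bytes, long regionSize) {
        return bytes >= regionSize / 2;
    }

    // Contiguous regions needed to hold the allocation (rounded up, since
    // a humongous object occupies whole regions).
    static long regionsNeeded(long bytes, long regionSize) {
        return (bytes + regionSize - 1) / regionSize;
    }

    public static void main(String[] args) {
        long alloc = 300 * MB;   // the 300+ MB allocation from the log
        long region = 11 * MB;   // ~22 GB heap / 2048 default regions
        System.out.println(isHumongous(alloc, region));    // true
        System.out.println(regionsNeeded(alloc, region));  // 28 contiguous regions
        // A larger -XX:G1HeapRegionSize means far fewer contiguous regions
        // are needed, so the request is easier to satisfy:
        System.out.println(regionsNeeded(alloc, 32 * MB)); // 10
    }
}
```

With 32 MB regions the same request needs only 10 contiguous regions instead of roughly 28, which is why increasing the region size is the first thing to try before restructuring the application's allocations.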
Some additional observations from the log snippets: most of the pause time in the non-full GCs looks to be in Update RS; ideally we would want the largest amount of time to be in copy time. Reference processing times are kinda high too. Is your application using a large number of reference objects? One thing that's puzzling in the Full GC line is that the amount of usr CPU time is much less than the real time. This seems odd! Could your system be paging to virtual memory? Or, if you are on Linux, do you have transparent huge pages enabled (they should be disabled!)? Also, seeing more than 5 seconds in sys time is eye-catching and could be related to the above issues. All the above said, I think the major issue that needs to be addressed is the frequent large object allocations. If increasing the region size doesn't address this, then you are likely looking at making application changes to remove the frequent large object allocations. Once that is addressed, we can focus on some of the other issues. Charlie

Sent from my iPhone

On Apr 3, 2015, at 3:11 AM, Medan Gavril wrote: Hi Charlie, We had a Full GC of 280 seconds today. At least it printed the Full GC and did not end with a JVM hang. We will apply -XX:G1HeapRegionSize=8M. Did -XX:+PrintHeapAtGCExtended print the expected output? "No shared spaces configured" - what does it mean? You said eden space is 1 GB. Can/should we increase it?

495364.337: [GC pause (young) 495364.337: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 258.26 ms, remaining time: 2241.74 ms, target pause time: 2500.00 ms]
 495364.337: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 239 regions, survivors: 18 regions, predicted young region time: 63.39 ms]
 495364.337: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 239 regions, survivors: 18 regions, old: 0 regions, predicted pause time: 321.65 ms, target pause time: 2500.00 ms], 0.25714812 secs]
   [Parallel Time: 181.0 ms]
      [GC Worker Start (ms):  495364337.4  495364337.4  495364337.4  495364337.4  495364337.5  495364337.5  495364337.5  495364337.5  495364337.5  495364337.5  495364337.5  495364337.5  495364337.5
       Avg: 495364337.5, Min: 495364337.4, Max: 495364337.5, Diff:   0.1]
      [Ext Root Scanning (ms):  18.5  18.0  18.8  25.0  17.9  17.9  18.3  19.2  17.8  25.1  18.9  26.4  18.3
       Avg:  20.0, Min:  17.8, Max:  26.4, Diff:   8.6]
      [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  1.3  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Avg:   0.1, Min:   0.0, Max:   1.3, Diff:   1.3]
      [Update RS (ms):  129.9  130.5  131.0  124.7  130.0  129.1  129.2  129.0  139.2  122.5  129.2  121.2  126.1
       Avg: 128.6, Min: 121.2, Max: 139.2, Diff:  18.0]
         [Processed Buffers : 44 57 70 66 67 50 70 54 71 47 59 46 34
          Sum: 735, Avg: 56, Min: 34, Max: 71, Diff: 37]
      [Scan RS (ms):  0.2  0.0  0.0  0.0  0.2  0.0  0.0  0.0  0.0  0.1  0.1  0.2  0.1
       Avg:   0.1, Min:   0.0, Max:   0.2, Diff:   0.2]
      [Object Copy (ms):  26.9  27.0  25.6  25.7  27.3  27.1  27.9  27.2  18.4  27.7  27.3  27.6  30.8
       Avg:  26.7, Min:  18.4, Max:  30.8, Diff:  12.4]
      [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Avg:   0.0, Min:   0.0, Max:   0.0, Diff:   0.0]
         [Termination Attempts : 2 3 3 1 3 3 3 3 3 2 3 1 2
          Sum: 32, Avg: 2, Min: 1, Max: 3, Diff: 2]
      [GC Worker End (ms):  495364512.9  495364512.9  495364512.9  495364512.9  495364512.9  495364513.0  495364513.0  495364512.9  495364512.9  495364512.9  495364512.9  495364512.9  495364513.0
       Avg: 495364512.9, Min: 495364512.9, Max: 495364513.0, Diff:   0.1]
      [GC Worker (ms):  175.5  175.5  175.5  175.5  175.5  175.5  175.5  175.4  175.4  175.4  175.4  175.4  175.4
       Avg: 175.5, Min: 175.4, Max: 175.5, Diff:   0.1]
      [GC Worker Other (ms):  5.5  5.5  5.5  5.5  5.6  5.6  5.6  5.6  5.6  5.6  5.6  5.6  5.6
       Avg:   5.6, Min:   5.5, Max:   5.6, Diff:   0.1]
   [Complete CSet Marking:   0.0 ms]
   [Clear CT:   0.3 ms]
   [Other:  75.9 ms]
      [Choose CSet:   0.0 ms]
      [Ref Proc:  72.8 ms]
      [Ref Enq:   0.4 ms]
      [Free CSet:   1.3 ms]
   [Eden: 956M(952M)->0B(972M) Survivors: 72M->52M Heap: 10420M(22480M)->9469M(22480M)]
 [Times: user=2.92 sys=0.00, real=0.27 secs]
Total time for which application threads were stopped: 0.2605235 seconds
Total time for which application threads were stopped: 0.0024983 seconds
Total time for which application threads were stopped: 0.0046328 seconds
 495368.172: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
 495368.172: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
 495368.172: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
 495368.176: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
 495368.176: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
 495368.176: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.176: [GC pause (young) 495368.176: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 667.90 ms, remaining time: 1832.10 ms, target pause time: 2500.00 ms]
 495368.176: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 243 regions, survivors: 13 regions, predicted young region time: 46.78 ms]
 495368.176: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 243 regions, survivors: 13 regions, old: 0 regions, predicted pause time: 714.67 ms, target pause time: 2500.00 ms], 0.36436765 secs]
   [Parallel Time: 273.1 ms]
      [GC Worker Start (ms):  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.0  495368176.1  495368176.1  495368176.1
       Avg: 495368176.0, Min: 495368176.0, Max: 495368176.1, Diff:   0.1]
      [Ext Root Scanning (ms):  23.1  19.7  25.1  21.5  21.6  25.3  18.7  19.3  25.5  20.5  18.4  20.6  20.1
       Avg:  21.5, Min:  18.4, Max:  25.5, Diff:   7.2]
      [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.3  0.0  0.0  0.0  0.0  0.0
       Avg:   0.1, Min:   0.0, Max:   1.3, Diff:   1.3]
      [Update RS (ms):  222.1  225.3  220.2  224.8  223.8  222.8  222.6  224.9  220.0  225.4  232.7  231.0  225.2
       Avg: 224.7, Min: 220.0, Max: 232.7, Diff:  12.7]
         [Processed Buffers : 139 200 242 206 182 154 151 211 149 162 189 222 123
          Sum: 2330, Avg: 179, Min: 123, Max: 242, Diff: 119]
      [Scan RS (ms):  0.0  0.3  0.0  0.0  0.0  0.0  0.2  0.2  0.0  0.0  0.0  0.0  0.0
       Avg:   0.1, Min:   0.0, Max:   0.3, Diff:   0.3]
      [Object Copy (ms):  22.4  22.3  22.3  21.3  22.2  19.5  26.0  21.9  22.0  21.6  16.4  15.9  22.2
       Avg:  21.2, Min:  15.9, Max:  26.0, Diff:  10.1]
      [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Avg:   0.0, Min:   0.0, Max:   0.0, Diff:   0.0]
         [Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 1
          Sum: 13, Avg: 1, Min: 1, Max: 1, Diff: 0]
      [GC Worker End (ms):  495368443.7  495368443.6  495368443.6  495368443.7  495368443.7  495368443.6  495368443.7  495368443.6  495368443.6  495368443.7  495368443.7  495368443.6  495368443.7
       Avg: 495368443.7, Min: 495368443.6, Max: 495368443.7, Diff:   0.1]
      [GC Worker (ms):  267.7  267.6  267.6  267.6  267.7  267.6  267.6  267.6  267.6  267.6  267.6  267.6  267.6
       Avg: 267.6, Min: 267.6, Max: 267.7, Diff:   0.1]
      [GC Worker Other (ms):  5.5  5.5  5.5  5.5  5.5  5.5  5.5  5.5  5.5  5.5  5.6  5.6  5.6
       Avg:   5.5, Min:   5.5, Max:   5.6, Diff:   0.1]
   [Complete CSet Marking:   0.0 ms]
   [Clear CT:   0.2 ms]
   [Other:  91.0 ms]
      [Choose CSet:   0.0 ms]
      [Ref Proc:  89.0 ms]
      [Ref Enq:   0.0 ms]
      [Free CSet:   1.4 ms]
   [Eden: 972M(972M)->0B(972M) Survivors: 52M->52M Heap: 10695M(22480M)->9727M(22480M)]
 [Times: user=3.74 sys=0.00, real=0.37 secs]
 495368.541: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
 495368.541: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
 495368.541: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
Total time for which application threads were stopped: 0.3686524 seconds
 495368.556: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 134217744 bytes]
 495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 125829120 bytes, attempted expansion amount: 125829120 bytes]
 495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
 495368.557: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 134217744 bytes]
 495368.557: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 134217744 bytes, attempted expansion amount: 138412032 bytes]
 495368.557: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
495368.557: [Full GC 9731M->6801M(22480M), 283.4340824 secs]
 [Times: user=78.98 sys=5.43, real=283.44 secs]
Total time for which application threads were stopped: 283.4511972 seconds
Total time for which application threads were stopped: 0.4082000 seconds
495652.564: [GC concurrent-mark-abort]
Total time for which application threads were stopped: 0.0152932 seconds
Total time for which
application threads were stopped: 0.0434217 secondsTotal time for which application threads were stopped: 0.3351899 secondsTotal time for which application threads were stopped: 0.0395479 secondsTotal time for which application threads were stopped: 0.1017198 seconds Total time for which application threads were stopped: 0.0664623 secondsTotal time for which application threads were stopped: 0.0095970 secondsTotal time for which application threads were stopped: 0.0200670 secondsTotal time for which application threads were stopped: 0.0033661 secondsTotal time for which application threads were stopped: 0.0029946 secondsTotal time for which application threads were stopped: 0.0726447 secondsTotal time for which application threads were stopped: 0.0023253 secondsTotal time for which application threads were stopped: 0.0050572 secondsTotal time for which application threads were stopped: 0.0861142 secondsTotal time for which application threads were stopped: 0.0025415 secondsTotal time for which application threads were stopped: 0.0028936 secondsTotal time for which application threads were stopped: 0.0096976 secondsTotal time for which application threads were stopped: 0.0025880 secondsTotal time for which application threads were stopped: 0.0116163 secondsTotal time for which application threads were stopped: 0.0026818 secondsTotal time for which application threads were stopped: 0.0032141 secondsTotal time for which application threads were stopped: 0.0078230 secondsTotal time for which application threads were stopped: 0.0025152 secondsTotal time for which application threads were stopped: 0.0026890 secondsTotal time for which application threads were stopped: 0.0030924 secondsTotal time for which application threads were stopped: 0.0027699 secondsTotal time for which application threads were stopped: 0.0029259 secondsTotal time for which application threads were stopped: 0.0032893 secondsTotal time for which application threads were stopped: 0.0037969 
secondsTotal time for which application threads were stopped: 0.0130588 secondsTotal time for which application threads were stopped: 0.1230765 secondsTotal time for which application threads were stopped: 0.0022368 secondsTotal time for which application threads were stopped: 0.0022032 secondsTotal time for which application threads were stopped: 0.0505220 secondsTotal time for which application threads were stopped: 0.0351524 secondsTotal time for which application threads were stopped: 0.0022816 secondsTotal time for which application threads were stopped: 0.0292614 secondsTotal time for which application threads were stopped: 0.0117505 secondsTotal time for which application threads were stopped: 0.0035653 secondsTotal time for which application threads were stopped: 0.0388754 secondsTotal time for which application threads were stopped: 0.0028810 secondsTotal time for which application threads were stopped: 0.0025892 secondsTotal time for which application threads were stopped: 0.0028770 secondsTotal time for which application threads were stopped: 0.0024385 secondsTotal time for which application threads were stopped: 0.0025007 secondsTotal time for which application threads were stopped: 0.0024033 secondsTotal time for which application threads were stopped: 0.0024441 secondsTotal time for which application threads were stopped: 0.0028249 secondsTotal time for which application threads were stopped: 0.0026523 secondsTotal time for which application threads were stopped: 0.0025650 secondsTotal time for which application threads were stopped: 0.0025241 secondsTotal time for which application threads were stopped: 0.0026102 secondsTotal time for which application threads were stopped: 0.0026741 secondsTotal time for which application threads were stopped: 0.0025973 secondsTotal time for which application threads were stopped: 0.0027808 secondsTotal time for which application threads were stopped: 0.0112295 secondsTotal time for which application threads were 
stopped: 0.0028798 secondsTotal time for which application threads were stopped: 0.0026709 secondsTotal time for which application threads were stopped: 0.0038697 secondsTotal time for which application threads were stopped: 0.0026591 secondsTotal time for which application threads were stopped: 0.0025443 secondsTotal time for which application threads were stopped: 0.0040415 secondsTotal time for which application threads were stopped: 0.0025476 secondsTotal time for which application threads were stopped: 0.0025682 secondsTotal time for which application threads were stopped: 0.0032743 secondsTotal time for which application threads were stopped: 0.0034255 secondsTotal time for which application threads were stopped: 0.0034190 secondsTotal time for which application threads were stopped: 0.0025161 secondsTotal time for which application threads were stopped: 0.0036486 secondsTotal time for which application threads were stopped: 0.0034647 secondsTotal time for which application threads were stopped: 0.0032177 secondsTotal time for which application threads were stopped: 0.0027416 secondsTotal time for which application threads were stopped: 0.0034942 secondsTotal time for which application threads were stopped: 0.0026935 secondsTotal time for which application threads were stopped: 0.0029138 secondsTotal time for which application threads were stopped: 0.0026070 secondsTotal time for which application threads were stopped: 0.0025387 secondsTotal time for which application threads were stopped: 0.0145426 secondsTotal time for which application threads were stopped: 0.0031826 secondsTotal time for which application threads were stopped: 0.0023891 secondsTotal time for which application threads were stopped: 0.0027578 secondsTotal time for which application threads were stopped: 0.0742630 secondsTotal time for which application threads were stopped: 0.0024304 secondsTotal time for which application threads were stopped: 0.0637979 secondsTotal time for which 
application threads were stopped: 0.0198993 secondsTotal time for which application threads were stopped: 0.0491918 secondsTotal time for which application threads were stopped: 0.0024178 secondsTotal time for which application threads were stopped: 0.0098973 secondsTotal time for which application threads were stopped: 0.0302448 secondsTotal time for which application threads were stopped: 0.0121276 secondsTotal time for which application threads were stopped: 0.0025787 secondsTotal time for which application threads were stopped: 0.0138264 secondsTotal time for which application threads were stopped: 0.0153000 secondsTotal time for which application threads were stopped: 0.0027456 secondsTotal time for which application threads were stopped: 0.0038677 secondsTotal time for which application threads were stopped: 0.0029987 secondsTotal time for which application threads were stopped: 0.0054323 secondsTotal time for which application threads were stopped: 0.0026236 secondsTotal time for which application threads were stopped: 0.0034756 secondsTotal time for which application threads were stopped: 0.0028228 secondsTotal time for which application threads were stopped: 0.0096895 secondsTotal time for which application threads were stopped: 0.0028705 secondsTotal time for which application threads were stopped: 0.0038539 secondsTotal time for which application threads were stopped: 0.0027335 secondsTotal time for which application threads were stopped: 0.0039392 secondsTotal time for which application threads were stopped: 0.0030043 secondsTotal time for which application threads were stopped: 0.0025136 secondsTotal time for which application threads were stopped: 0.0041170 secondsTotal time for which application threads were stopped: 0.0134016 secondsTotal time for which application threads were stopped: 0.0028604 secondsTotal time for which application threads were stopped: 0.0030666 secondsTotal time for which application threads were stopped: 0.0092635 
secondsTotal time for which application threads were stopped: 0.0035233 secondsTotal time for which application threads were stopped: 0.0027291 secondsTotal time for which application threads were stopped: 0.0026195 secondsTotal time for which application threads were stopped: 0.0025989 secondsTotal time for which application threads were stopped: 0.0027949 secondsTotal time for which application threads were stopped: 0.0035386 secondsTotal time for which application threads were stopped: 0.0034117 secondsTotal time for which application threads were stopped: 0.0033919 secondsTotal time for which application threads were stopped: 0.0024518 secondsTotal time for which application threads were stopped: 0.0031862 secondsTotal time for which application threads were stopped: 0.0032185 secondsTotal time for which application threads were stopped: 0.0028277 secondsTotal time for which application threads were stopped: 0.0032185 secondsTotal time for which application threads were stopped: 0.0034893 secondsTotal time for which application threads were stopped: 0.0027493 secondsTotal time for which application threads were stopped: 0.0031793 secondsTotal time for which application threads were stopped: 0.0041045 secondsTotal time for which application threads were stopped: 0.0025815 secondsTotal time for which application threads were stopped: 0.0035859 secondsTotal time for which application threads were stopped: 0.0038022 secondsTotal time for which application threads were stopped: 0.0029696 secondsTotal time for which application threads were stopped: 0.0037133 secondsTotal time for which application threads were stopped: 0.0032796 secondsTotal time for which application threads were stopped: 0.0030532 secondsTotal time for which application threads were stopped: 0.0026405 secondsTotal time for which application threads were stopped: 0.0033176 secondsTotal time for which application threads were stopped: 0.0032808 secondsTotal time for which application threads were 
stopped: 0.0200569 secondsTotal time for which application threads were stopped: 0.0027396 secondsTotal time for which application threads were stopped: 0.0037072 secondsTotal time for which application threads were stopped: 0.0024882 secondsTotal time for which application threads were stopped: 0.0030096 secondsTotal time for which application threads were stopped: 0.0024801 secondsTotal time for which application threads were stopped: 0.0025565 secondsTotal time for which application threads were stopped: 0.0038931 secondsTotal time for which application threads were stopped: 0.0030067 secondsTotal time for which application threads were stopped: 0.0030302 secondsTotal time for which application threads were stopped: 0.0271912 secondsTotal time for which application threads were stopped: 0.0026147 secondsHeap?garbage-first heap ? total 23019520K, used 7803799K [0x0000000243000000, 0x00000007c0000000, 0x00000007c0000000)? region size 4096K, 146 young (598016K), 0 survivors (0K)?compacting perm gen ?total 258048K, used 255345K [0x00000007c0000000, 0x00000007cfc00000, 0x0000000800000000)? ?the space 258048K, ?98% used [0x00000007c0000000, 0x00000007cf95c610, 0x00000007cf95c800, 0x00000007cfc00000)No shared spaces configured.on_exit trigger matched. ?Restarting the JVM. ?(Exit code: -1) On Tuesday, March 31, 2015 3:52 PM, charlie hunt wrote: Just as a clarification, the -XX:+ParallelRefProcEnabled will help reduce the time spent in reference processing. It will not help address the issue of seeing Full GCs as a result of frequent humongous object allocations, or a humongous allocations where there is not sufficient contiguous regions available to satisfy the humongous allocation request. Thomas?s suggestion to increase the region size may help with the Full GCs as a result of humongous object allocations. thanks, charlie On Mar 31, 2015, at 7:42 AM, Medan Gavril wrote: HI Charlie, Currenltly we can only go to java 7 update 7x(latest). 
We will try the following changes:? ?1.?-XX:G1HeapRegionSize=8 (then increase)? ? 2.?-XX:+ParallelRefProcEnabled Please let me know if you have any other suggestion. Best Regards,Gabi Medan On Tuesday, March 31, 2015 3:35 PM, charlie hunt wrote: To add to Thomas?s good suggestions, I suppose one other alternative is to make application changes to break up the 300+ MB allocation into smaller MB allocations. ?This would offer a better opportunity for that humongous allocation to be satisfied. hths, charlie On Mar 31, 2015, at 6:30 AM, Thomas Schatzl wrote: Hi all, On Mon, 2015-03-30 at 20:41 -0500, charlie hunt wrote: Hi Jenny, One possibility is that there is not enough available contiguous regions to satisfy a 300+ MB humongous allocation. If we assume a 22 GB Java heap, (a little larger than the 22480M shown in the log), with 2048 G1 regions (default as you know), the region size would be about 11 MB. That implies there needs to be about 30 contiguous G1 regions available to satisfy the humongous allocation request. An unrelated question ? do other GCs have a similar pattern of a rather large percentage of time in Ref Proc relative to the overall pause time, i.e. 24.7 ms / 120 ms ~ 20% of the pause time. ?If that?s the case, then if -XX:+ParallelRefProcEnabled is not already set, there may be some low hanging tuning fruit. But, it is not going to address the frequent humongous allocation problem. ?It is also interesting in that the pause time goal is 2500 ms, yet the actual pause time is 120 ms, and eden is being sized at less than 1 GB out of a 22 GB Java heap. ?Are the frequent humongous allocations messing with the heap sizing heuristics? While I have no solution for the problem we are aware of these problems: -?https://bugs.openjdk.java.net/browse/JDK-7068229?for dynamically enabling MT reference processing -?https://bugs.openjdk.java.net/browse/JDK-8038487?to use mixed GC instead of Full GC to clear out space for failing humoungous object allocations. 
I am not sure what jdk release "JRE 1.17 update 17" actually is. From the given strings in the PrintGCDetails output, it seems to be something quite old; I would guess jdk6? In that case, if possible I would recommend trying a newer version that improves humongous object handling significantly (e.g. 8u40 is the latest official). Another option that works in all versions I am aware of is increasing the heap region size with -XX:G1HeapRegionSize=XM, where X is 8, 16 or 32; it seems that a 4M region size has been chosen by ergonomics. Start with the smaller of the suggested values. Thanks, Thomas

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From Rohit.Chaubey at broadridge.com Tue Apr 7 17:10:16 2015 From: Rohit.Chaubey at broadridge.com (Chaubey, Rohit) Date: Tue, 7 Apr 2015 13:10:16 -0400 Subject: G1 GC tuning Message-ID:

Hello We are trying to fine-tune our high-impact application and would appreciate a little help in doing so using G1GC. Following are the application requirements:

Application Requirements
* Expected load: 40 concurrent users with 1 Txn/user per second.
* 99th percentile: 250 ms. Desired all response times to be less than 1 second.
* 15 GB RAM per JVM for one set (set1) and 12 GB RAM for another set (set2). Both JVMs contain different kinds of data and the application interacts with both to fetch the desired output. Both JVMs have the same GC settings. But the kind of data in set2 is more operation intensive and thus gets more activity than the set1 JVMs.
* 2x load capacity. Hopefully we can get to 2X.

We implemented G1GC replacing CMS because:
1) We have large heaps, ranging from 8 GB to 25 GB. The usual problem comes in with the set2 JVMs. Set1 generally does not crash.
2) CMS was crashing the JVM frequently. A full GC cycle would crash the JVM. It used to happen during the nightly batch run for the application.
3) We had Xmn up to 50% of the heap.

Once we switched from CMS to G1GC, the set2 JVMs stopped crashing frequently.
However, we did have 2 incidents where the crash did happen. The G1GC parameters being used are as follows:

JAVA_ARGS="$JAVA_ARGS --J=-Xss${STACK_SIZE} \
--J=-XX:+UseG1GC \
--J=-XX:MaxGCPauseMillis=200 \
--J=-XX:ParallelGCThreads=20 \
--J=-XX:InitiatingHeapOccupancyPercent=60 \
--J=-XX:SurvivorRatio=2 \
--J=-XX:ConcGCThreads=5 \
--J=-Xmx$SERVER_HEAP \
--J=-Xms$SERVER_HEAP \
--J=-DDistributionManager.DISCONNECT_WAIT=$DISCONNECT_WAIT_TIME \
--J=-XX:+HeapDumpOnOutOfMemoryError \
--J=-XX:HeapDumpPath=${GF_LOG}/jvmdumps \
--J=-DgemfireSecurityPropertyFile=$DIR/$cluster_name/runtime/servers/$SERVER_NAME/gfsecurity.properties --J=-verbose:gc \
--J=-Xloggc:${GF_LOG}/logs/$SERVER_NAME/gc.log \
--J=-XX:+UseGCLogFileRotation --J=-XX:NumberOfGCLogFiles=10 --J=-XX:GCLogFileSize=1m \
--J=-XX:+PrintGCDateStamps --J=-XX:+PrintGCDetails --J=-XX:+PrintTenuringDistribution --J=-XX:+PrintAdaptiveSizePolicy"

We tried starting from scratch and worked our way up to the above config with a series of load tests and other activities. Following are the observations that we have seen so far:
* When the SurvivorRatio element was removed, we observed degraded performance. Thus we kept it at 2 instead of the default 8.
* There are a handful of humongous allocations (maybe 3-4 over a day) that are requested during the batch run. The number of those is not that high, as you can see from the attached logs. Should I be changing the region size for them?

Please let me know what can be changed and tested so that we do not encounter the JVM crash during batch times and also maintain the current SLA. I have attached all the GC log files for the cluster. Thanks and Regards, Rohit Chaubey Email: rohit.chaubey at broadridge.com Work: 201-714-3379, BB: 201-618-9230 This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential.
If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.

-------------- next part -------------- An HTML attachment was scrubbed... URL:
-------------- next part -------------- A non-text attachment was scrubbed... Name: log files.zip Type: application/x-zip-compressed Size: 2660835 bytes Desc: log files.zip URL:

From yanping.wang at intel.com Tue Apr 7 17:53:12 2015 From: yanping.wang at intel.com (Wang, Yanping) Date: Tue, 7 Apr 2015 17:53:12 +0000 Subject: hotspot-gc-use Digest, Vol 85, Issue 6 In-Reply-To: References: Message-ID: <222E9E27A7469F4FA2D137F0724FBD3798C3BE3D@ORSMSX104.amr.corp.intel.com>

Hi Rohit, Which version of JDK are you using? It would be nice if you could attach GC logs to help us diagnose what was going on during runtime. Thanks -yanping

-----Original Message----- From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of hotspot-gc-use-request at openjdk.java.net Sent: Tuesday, April 07, 2015 10:44 AM To: hotspot-gc-use at openjdk.java.net Subject: hotspot-gc-use Digest, Vol 85, Issue 6

Today's Topics: 1.
G1 GC tuning (Chaubey, Rohit)
----------------------------------------------------------------------
Message: 1 Date: Tue, 7 Apr 2015 13:10:16 -0400 From: "Chaubey, Rohit" To: "hotspot-gc-use at openjdk.java.net" Subject: G1 GC tuning Message-ID: Content-Type: text/plain; charset="us-ascii"

[...]

------------------------------ Subject: Digest Footer _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use ------------------------------ End of hotspot-gc-use Digest, Vol 85, Issue 6 *********************************************

From jon.masamitsu at oracle.com Tue Apr 7 18:07:14 2015 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 07 Apr 2015 11:07:14 -0700 Subject: G1 GC tuning In-Reply-To: References: Message-ID: <55241CD2.8060401@oracle.com>

Rohit, Which release of the JDK are you using? Jon

On 04/07/2015 10:10 AM, Chaubey, Rohit wrote: > [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From jon.masamitsu at oracle.com Tue Apr 7 20:39:32 2015 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 07 Apr 2015 13:39:32 -0700 Subject: G1 GC tuning In-Reply-To: References: Message-ID: <55244084.5010904@oracle.com>

I got some clarification on this from Rohit. The "crashes" are actually the JVM being shut down because of non-responsiveness due to full GCs. Jon

On 04/07/2015 10:10 AM, Chaubey, Rohit wrote: > [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From jon.masamitsu at oracle.com Tue Apr 7 22:30:51 2015 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 07 Apr 2015 15:30:51 -0700 Subject: G1 GC tuning In-Reply-To: References: Message-ID: <55245A9B.4030400@oracle.com>

Rohit, Is there any type of change of behavior of the applications around the 45k second mark? Around where the full GC's sometimes happen. Assuming nothing special is happening there, when you ran with CMS you were using half the heap for the young gen. Did you try G1 with that sized young gen? If you have not tried that and have a test setup where you can try it, I'd be interested in the results. I understand that specifying a size for young gen limits how G1 controls pauses. G1 is perhaps making some wrong decisions so I'd like to see how it behaves in a more constrained environment. Thanks. Jon

On 04/07/2015 10:10 AM, Chaubey, Rohit wrote: > [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From yu.zhang at oracle.com Tue Apr 7 22:59:54 2015 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 07 Apr 2015 15:59:54 -0700 Subject: G1 GC tuning In-Reply-To: References: Message-ID: <5524616A.8020602@oracle.com>

Rohit, Thanks for the logs.
I took a look at 2 logs: edppblaspga02-gc/edppblaspga02-pr-cluster-2-platinum-server-2/gc.log.0 for humongous objects, and edppblaspga01-gc/edppblaspga01-pr-cluster-2-platinum-server-2/gc.log.0 for Full GCs. For both logs, around timestamp 45,000, the allocation pattern changed. The age-1 objects jump from ~2M to ~300M. Because G1 cannot meet the 200ms pause time goal, it tries to push the young gen size down from 3.5G to 2G or even 500M. On the other hand, most of the objects do not die within age 3. You mentioned "When the SurvivorRatio element was removed, then we observed a degraded performance. Thus we kept it at 2 instead of default 8." Does that mean you observed more full GCs, or that the young/mixed GC pause times went up? My guess is that with a much smaller young gen, but objects that do not die until age 3, a bigger survivor space is better. About the Full GC: since the heap after full GC is ~9G, about 75% (9G/12G) of the heap size, you can try increasing -XX:InitiatingHeapOccupancyPercent to 75. Before the allocation pattern change, the marking cycles do not lead to any heap usage reduction. During the allocation spike, there is 1 mixed GC per marking cycle. So increasing it should save some marking cycles without hurting the mixed GCs. The mixed GCs reduce about 1G of heap, but not all of that is old gen. Can you try -XX:G1HeapWastePercent=5? In the logs, the value is 10, so G1 stops mixed GCs early. The default is 5 in later builds, but was 10 before 8u40, so I am guessing you are using a pre-8u40 build. Another thing you can try is -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85. The default is 85 in later builds, but was 65 before 8u40. Can we start with those 3 and see if it is better? The humongous objects are around 4M. Yes, we might waste some heap since the region size is 4M, but for now I do not see it leading to full GCs, so let's keep it this way. If you see it is a problem after those 3 parameters, we can try to increase the region size.
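The InitiatingHeapOccupancyPercent suggestion above follows from simple occupancy arithmetic; a small illustrative sketch using the figures quoted from the logs (~9G live after full GC out of a 12G heap):

```python
def suggest_ihop(live_bytes, heap_bytes, headroom_pct=0):
    """Pick an InitiatingHeapOccupancyPercent at (or just above) the
    post-full-GC live occupancy, so a marking cycle only starts once
    there is actually garbage in the old gen worth reclaiming."""
    occupancy_pct = 100 * live_bytes / heap_bytes
    return round(occupancy_pct) + headroom_pct

live = 9 * 1024**3   # heap usage after full GC, per the log
heap = 12 * 1024**3  # set2 heap size
print(suggest_ihop(live, heap))  # prints: 75
```

With the configured IHOP of 60, marking starts while occupancy is still below the live set, which matches the observation that pre-spike marking cycles reclaimed nothing.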
Thanks, Jenny

On 4/7/2015 10:10 AM, Chaubey, Rohit wrote: > [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From brian.williams at mayalane.com Thu Apr 9 22:07:28 2015 From: brian.williams at mayalane.com (Brian Williams) Date: Thu, 9 Apr 2015 18:07:28 -0400 Subject: potential causes of long remark pause? Message-ID: <20D7D0A2-5571-4E42-B411-A75A6ED9178A@mayalane.com>

I'm looking for ideas on what would lead to a 5 second CMS Remark pause. Looking through previous GC logs, the largest remark pause I could find was 0.58 seconds, with the average being 0.15. The snippet of the GC log that covers the full CMS cycle is below. We're using Java 1.7.0_55 running on "Linux 2.6.32-431.29.2.el6.x86_64 amd64" with 24 virtual CPUs.
Here are the JVM options as reported by -XX:+PrintCommandLineFlags:

-XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:CMSMaxAbortablePrecleanTime=3600000 -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:ConcGCThreads=6 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=94489280512 -XX:MaxHeapSize=94489280512 -XX:MaxNewSize=2147483648 -XX:MaxTenuringThreshold=6 -XX:NewSize=2147483648 -XX:OldPLABSize=16 -XX:PermSize=67108864 -XX:+PrintCommandLineFlags -XX:PrintFLSStatistics=1 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:RefDiscoveryPolicy=1 -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

And the details of the CMS collection with the long remark pause:

2015-04-09T12:36:10.393-0700: [GC [1 CMS-initial-mark: 45088837K(90177536K)] 45111889K(92065024K), 0.0296420 secs] [Times: user=0.01 sys=0.00, real=0.03 secs]
2015-04-09T12:36:24.055-0700: [CMS-concurrent-mark: 13.633/13.633 secs] [Times: user=75.95 sys=2.86, real=13.63 secs]
2015-04-09T12:36:24.371-0700: [CMS-concurrent-preclean: 0.315/0.315 secs] [Times: user=0.44 sys=0.02, real=0.32 secs]
2015-04-09T12:36:29.659-0700: [GCBefore GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1476244390
Max Chunk Size: 1476244390
Number of Blocks: 1
Av. Block Size: 1476244390
Tree Height: 1
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
2015-04-09T12:36:29.660-0700: [ParNew: 1680929K->4322K(1887488K), 0.0794930 secs] 46769766K->45093205K(92065024K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1476237004
Max Chunk Size: 1476237004
Number of Blocks: 1
Av. Block Size: 1476237004
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
, 0.0797770 secs] [Times: user=1.40 sys=0.00, real=0.08 secs]
2015-04-09T12:36:38.230-0700: [CMS-concurrent-abortable-preclean: 13.774/13.859 secs] [Times: user=18.14 sys=0.50, real=13.86 secs]
2015-04-09T12:36:38.232-0700: [GC[YG occupancy: 845557 K (1887488 K)]2015-04-09T12:36:38.232-0700: [GCBefore GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1476237004
Max Chunk Size: 1476237004
Number of Blocks: 1
Av. Block Size: 1476237004
Tree Height: 1
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
2015-04-09T12:36:38.232-0700: [ParNew: 845557K->4086K(1887488K), 0.0805340 secs] 45934440K->45093032K(92065024K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1476225574
Max Chunk Size: 1476225574
Number of Blocks: 1
Av. Block Size: 1476225574
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
, 0.0808020 secs] [Times: user=1.42 sys=0.00, real=0.08 secs]
2015-04-09T12:36:38.313-0700: [Rescan (parallel) , 0.0218780 secs]2015-04-09T12:36:38.334-0700: [weak refs processing, 5.0944360 secs]2015-04-09T12:36:43.429-0700: [scrub string table, 0.0013140 secs] [1 CMS-remark: 45088945K(90177536K)] 45093032K(92065024K), 5.3492750 secs] [Times: user=6.98 sys=0.00, real=5.35 secs]
CMS: Large Block: 0x00007f8e42f4aed0; Proximity: 0x0000000000000000 -> 0x00007f8e276f5b08
CMS: Large block 0x00007f8e42f4aed0
2015-04-09T12:36:55.505-0700: [CMS-concurrent-sweep: 11.913/11.924 secs] [Times: user=16.93 sys=0.26, real=11.93 secs]
2015-04-09T12:36:55.703-0700: [CMS-concurrent-reset: 0.187/0.187 secs] [Times: user=0.21 sys=0.00, real=0.19 secs]

From jon.masamitsu at oracle.com Fri Apr 10 17:24:58 2015
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 10 Apr 2015 10:24:58 -0700
Subject: potential causes of long remark pause?
In-Reply-To: <5528071C.2040907@oracle.com>
References: <20D7D0A2-5571-4E42-B411-A75A6ED9178A@mayalane.com> <5528071C.2040907@oracle.com>
Message-ID: <5528076A.4000404@oracle.com>

Including the hotspot-gc-use list.

On 4/10/2015 10:23 AM, Jon Masamitsu wrote:
> Brian,
>
> It looks like the weak reference processing.
>
> 2015-04-09T12:36:38.313-0700: [Rescan (parallel) , 0.0218780 secs]2015-04-09T12:36:38.334-0700: [weak refs processing, 5.0944360 secs]
>
> If you turn on PrintReferenceGC, you might get more information.
> If you usually see long reference processing times, try
> turning on ParallelRefProcEnabled. But if you only see
> very intermittent spikes in reference processing times,
> it might not be worth it (the parallel overhead will slow
> down the remark times).
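[Editorial note: Jon's trade-off — ParallelRefProcEnabled helps only when reference processing is routinely expensive — is easier to see with a small illustrative sketch. This is not code from the thread; the class name and sizes are invented for illustration. The "weak refs processing" sub-phase walks every java.lang.ref.Reference the collector has discovered, so its duration grows with the size of that population, even when the application never constructs a WeakReference by name.]

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class ManyWeakRefs {
    // Builds a large population of WeakReferences. Each one must be
    // discovered and examined by the collector during the
    // weak-refs-processing sub-phase of the CMS remark pause, so that
    // pause grows with this count.
    static List<WeakReference<byte[]>> build(int n) {
        List<WeakReference<byte[]>> refs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            refs.add(new WeakReference<>(new byte[16]));
        }
        return refs;
    }

    public static void main(String[] args) {
        List<WeakReference<byte[]>> refs = build(100_000);
        System.gc(); // the referents are otherwise unreachable, so HotSpot
                     // will normally clear these references during this GC
        long cleared = refs.stream().filter(r -> r.get() == null).count();
        System.out.println("references tracked: " + refs.size());
        System.out.println("cleared after GC:   " + cleared);
    }
}
```

A population like this can accumulate invisibly through caches, finalizable objects, or class-library internals, which is why -XX:+PrintReferenceGC is the right first diagnostic before deciding on ParallelRefProcEnabled.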
> Jon
>
> On 4/9/2015 3:07 PM, Brian Williams wrote:
>> I'm looking for ideas on what would lead to a 5 second CMS Remark pause. Looking through previous GC logs, the largest remark pause I could find was .58 seconds, with the average being .15. The snippet of the GC log that covers the full CMS cycle is below.
>>
>> [...]

From brian.williams at mayalane.com Fri Apr 10 23:04:33 2015
From: brian.williams at mayalane.com (Brian Williams)
Date: Fri, 10 Apr 2015 19:04:33 -0400
Subject: potential causes of long remark pause?
In-Reply-To: <5528071C.2040907@oracle.com>
References: <20D7D0A2-5571-4E42-B411-A75A6ED9178A@mayalane.com> <5528071C.2040907@oracle.com>
Message-ID: <68362647-E7C0-4573-AAEF-3576BF6F1A48@mayalane.com>

Thanks Jon. We noticed that as well. We do not create any WeakReferences within our application directly, and we couldn't find any in the libraries that we use. Perhaps they are coming from something within the JVM libraries. To our knowledge nothing has changed in the application load, so we were surprised to have such a large outlier.
I guess we'll just see if it happens again and look for a pattern.

Brian

> On Apr 10, 2015, at 1:23 PM, Jon Masamitsu wrote:
>
> Brian,
>
> It looks like the weak reference processing.
>
> 2015-04-09T12:36:38.313-0700: [Rescan (parallel) , 0.0218780 secs]2015-04-09T12:36:38.334-0700: [weak refs processing, 5.0944360 secs]
>
> If you turn on PrintReferenceGC, you might get more information.
> If you usually see long reference processing times, try
> turning on ParallelRefProcEnabled. But if you only see
> very intermittent spikes in reference processing times,
> it might not be worth it (the parallel overhead will slow
> down the remark times).
>
> Jon
>
> On 4/9/2015 3:07 PM, Brian Williams wrote:
>> I'm looking for ideas on what would lead to a 5 second CMS Remark pause. [...]

From ysr1729 at gmail.com Sat Apr 11 02:16:15 2015
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 10 Apr 2015 19:16:15 -0700
Subject: potential causes of long remark pause?
In-Reply-To: <68362647-E7C0-4573-AAEF-3576BF6F1A48@mayalane.com>
References: <20D7D0A2-5571-4E42-B411-A75A6ED9178A@mayalane.com> <5528071C.2040907@oracle.com> <68362647-E7C0-4573-AAEF-3576BF6F1A48@mayalane.com>
Message-ID:

Hi Brian --

Note that the JVM terminology "weak refs processing" involves not just Java's WeakReferences, but indeed all subtypes of java.lang.Reference together. Sorry if that may have caused confusion. In particular, therefore, it includes the time spent by the garbage collector in dealing with SoftReferences, WeakReferences, FinalReferences, PhantomReferences and JNI weak global references.
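[Editorial note: Ramki's taxonomy can be made concrete with a short, self-contained sketch; this is illustrative code, not code from the thread. Two of the kinds he lists cannot even be constructed from Java source — FinalReference is a package-private, VM-internal subtype backing finalizers, and JNI weak globals are created in native code — which is also why an application that never writes `new WeakReference(...)` can still give the collector plenty of Reference objects to process.]

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class RefKinds {
    // Every one of these is a java.lang.ref.Reference subtype, so all of
    // them are handled in the single "weak refs processing" remark phase.
    static Reference<?>[] oneOfEach(Object referent, ReferenceQueue<Object> q) {
        return new Reference<?>[] {
            new SoftReference<>(referent),       // cleared under memory pressure
            new WeakReference<>(referent),       // cleared once the referent is
                                                 // only weakly reachable
            new PhantomReference<>(referent, q), // enqueued after finalization;
                                                 // get() always returns null
        };
    }

    public static void main(String[] args) {
        Object referent = new Object();
        ReferenceQueue<Object> q = new ReferenceQueue<>();
        for (Reference<?> r : oneOfEach(referent, q)) {
            System.out.println(r.getClass().getSimpleName());
        }
    }
}
```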
I'd go with Jon's suggestion of turning on -XX:+PrintReferenceGC to get at the details of what kind of references caused the issue in your case, and thence whether -XX:+ParallelRefProcEnabled would help.

Relatedly, it might make sense for HotSpot's concurrent collectors to consider whether some form of the Ugawa-Jones-Ritson technique could be gainfully applied to reduce the pause time necessary for Reference object processing by the GC.

-- ramki

On Fri, Apr 10, 2015 at 4:04 PM, Brian Williams wrote:
> Thanks Jon. We noticed that as well. We do not create any WeakReferences
> within our application directly, and we couldn't find any in the libraries
> that we use. Perhaps they are coming from something within the JVM
> libraries. To our knowledge nothing has changed in the application load, so
> we were surprised to have such a large outlier. I guess we'll just see if
> it happens again and look for a pattern.
>
> Brian
>
> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From yu.zhang at oracle.com Fri Apr 17 20:39:49 2015
From: yu.zhang at oracle.com (Yu Zhang)
Date: Fri, 17 Apr 2015 13:39:49 -0700
Subject: longer ParNew collection in JDK8?
In-Reply-To: <1905701844.2549870.1422556039324.JavaMail.yahoo@mail.yahoo.com>
References: <1905701844.2549870.1422556039324.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <55316F95.5050508@oracle.com>

Joy,

I saw this blog post; it might be helpful to you.
https://blogs.oracle.com/poonam/entry/longer_young_collections_with_jdk7

Thanks,
Jenny

On 1/29/2015 10:27 AM, Joy Xiong wrote:
> Hi,
>
> We recently moved our services from JDK6 to JDK8, but observed longer
> ParNew GC pauses in JDK8. For example, the pause time increased from
> 27ms to 43ms in one service. The service has the JVM parameters below:
> -Xms32684m -Xmx32684m -XX:NewSize=2048m -XX:MaxNewSize=2048m
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2
> -XX:+AlwaysPreTouch -XX:+UseCompressedOops
> -XX:+PrintTenuringDistribution -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -Xloggc:logs/gc.log
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
>
> Is this longer ParNew pause expected? Any suggestions to mitigate the
> longer pause?
>
> thanks,
> -Joy

From nicola at infielddesign.com Mon Apr 20 17:49:45 2015
From: nicola at infielddesign.com (Nicola Abello)
Date: Mon, 20 Apr 2015 10:49:45 -0700
Subject: Garbage Collection tuning advice ?
Message-ID:

Hello everyone, I am currently using the Adobe Experience Manager for a client's site (Java language). It uses OpenJDK:

# java -version

java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

It is running on Rackspace with the following:

vCPU: 4
Memory: 16GB
Guest OS: Red Hat Enterprise Linux 6 (64-bit)

Since it has been in production I have been experiencing very slow performance from the application. It goes like this: I launch the app and everything is smooth; then 3 to 4 days later the CPU usage spikes to 400% (~4000 users/day hit the site).
I got a few OOM exceptions (1 or 2), but mostly the site was exceptionally slow and never actually threw an OOM exception. Since I am a novice at Java memory management I started reading about how it works and found tools like jstat. When the system was overwhelmed the second time around, I ran:

# top

Got the PID of the java process, then pressed shift+H and noted the PIDs of the threads with high CPU percentage. Then I ran

# sudo -uaem jstat

Got a thread dump and converted the thread PIDs I wrote down previously and searched for their hex value in the dump. After all that, I finally found that it was, not surprisingly, the Garbage Collector that is flipping out for some reason.

I started reading a lot about Java GC tuning and came up with the following java options. So I restarted the application with the following options:

java -Dcom.day.crx.persistence.tar.IndexMergeDelay=0
     -Djackrabbit.maxQueuedEvents=1000000
     -Djava.io.tmpdir=/srv/aem/tmp/
     -XX:+HeapDumpOnOutOfMemoryError
     -XX:HeapDumpPath=/srv/aem/tmp/
     -Xms8192m -Xmx8192m
     -XX:PermSize=256m
     -XX:MaxPermSize=1024m
     -XX:+UseParallelGC
     -XX:+UseParallelOldGC
     -XX:ParallelGCThreads=4
     -XX:NewRatio=1
     -Djava.awt.headless=true
     -server
     -Dsling.run.modes=publish
     -jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start
     -c crx-quickstart -i launchpad -p 4503
     -Dsling.properties=conf/sling.properties

And it looks like it is performing much better, but I think it probably needs more GC tuning.

When I run:

# sudo -uaem jstat -gcutil

I get this:

  S0    S1      E       O      P     YGC    YGCT   FGC    FGCT      GCT
 0.00  0.00  55.97  100.00  45.09   4725  521.233  505  4179.584  4700.817

after 4 days since I restarted it.

When I run:

# sudo -uaem jstat -gccapacity

I get this:

    NGCMN      NGCMX        NGC       S0C       S1C         EC
 4194304.0  4194304.0  4194304.0  272896.0  279040.0  3636224.0

    OGCMN      OGCMX        OGC        OC      PGCMN      PGCMX
 4194304.0  4194304.0  4194304.0  4194304.0  262144.0  1048576.0

      PGC         PC    YGC   FGC
 262144.0   262144.0   4725   509

after 4 days since I restarted it.
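[Editorial note: a back-of-the-envelope reading of the jstat -gcutil figures above, as illustrative arithmetic rather than part of the thread. The O column at 100.00 means the old generation is completely full, and dividing the cumulative GC times by the collection counts shows where the time is going.]

```java
public class GcStats {
    // Figures copied from the jstat -gcutil sample above.
    static final int YGC = 4725;            // young collections
    static final double YGCT = 521.233;     // cumulative young GC time, seconds
    static final int FGC = 505;             // full collections
    static final double FGCT = 4179.584;    // cumulative full GC time, seconds

    static double avgYoungPauseMs() { return YGCT / YGC * 1000.0; }
    static double avgFullPauseMs()  { return FGCT / FGC * 1000.0; }

    public static void main(String[] args) {
        System.out.printf("avg young GC: %.0f ms%n", avgYoungPauseMs()); // ~110 ms
        System.out.printf("avg full GC:  %.1f s%n", avgFullPauseMs() / 1000.0); // ~8.3 s
    }
}
```

At roughly 8.3 seconds apiece, the 505 full collections account for about 4180 of the 4700 total GC seconds, so the question worth chasing is why the old generation stays full (heap too small for the live set, or a leak) rather than the young-generation ratios.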
These result are much better than when I started but I think it can get even better. I'm not really sure what to do next as I'm no GC pro so I was wondering if you guys would have any tips or advice for me on how I could get better app/GC performance and if anything is obvious like ratio's and sizes of youngGen and oldGen ? How should I set the survivors and eden sizes/ratios ? Should I change GC type like use CMS GC or G1 ? How should I proceed ? Any advice would be helpful. Best, Nicola -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmohta.coder at gmail.com Mon Apr 20 18:04:52 2015 From: rmohta.coder at gmail.com (Rohit Mohta) Date: Mon, 20 Apr 2015 19:04:52 +0100 Subject: Garbage Collection tuning advice ? In-Reply-To: References: Message-ID: Hi, We are facing similar issue for our high performance application. We noticed, JVM ergonomic took over and adjusted the values to meet its goal (adaptive policy). This worked in most scenario, but caused problems in some case. My recommendations will be a) Turn on GC logging (if not yet done) either for default setting or the one you have mentioned b) Check if you have a regular premature movement to old generation. If yes, you might want to set a specific new size and/or survivor size. c) You have 16GB ram but have set Max heap as 8GB. If you are getting OOM, increase Xmx. On 20 Apr 2015 18:50, "Nicola Abello" wrote: > Hello everyone, I am currently using the Adobe Experience Manager for a > Client's site (Java language). It uses openJDK: > > #*java -version* > > *java version "1.7.0_65"* > *OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)* > *OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)* > > It is running on Rackspace with the following: > > *vCPU:* *4* > *Memory:* *16GB* > *Guest OS:* *Red Hat Enterprise Linux 6 (64-bit)* > > Since it has been in production I have been experiencing very slow > performance on the part of the application. 
It goes like this I launch the > app, everything is smooth then 3 to 4 days later the CPU usage spikes to > 400% (~4000 users/day hit the site). I got a few OOM exceptions (1 or 2) > but mostly the site was exceptionally slow and never becomes an OOM > exception. Since I am a novice at Java Memory management I started reading > about how it works and found tools like jstat. When the system was > overwhelmed the second time around, I ran: > > #*top* > > Got the PID of the java process and then pressed shift+H and noted the > PIDs of the threads with high CPU percentage. Then I ran > > #*sudo -uaem jstat * > > Got a thread dump and converted the thread PIDs I wrote down previously > and searched for their hex value in the dump. After all that, I finally > found that it was not surprisingly the Garbage Collector that is flipping > out for some reason. > > I started reading a lot about Java GC tuning and came up with the > following java options. > So restarted the application with the following options: > > *java* -Dcom.day.crx.persistence.tar.IndexMergeDelay=0 > -Djackrabbit.maxQueuedEvents=1000000 > -Djava.io.tmpdir=/srv/aem/tmp/ > > * -XX:+HeapDumpOnOutOfMemoryError * > * -XX:HeapDumpPath=/srv/aem/tmp/ * > * -Xms8192m -Xmx8192m * > * -XX:PermSize=256m * > * -XX:MaxPermSize=1024m * > * -XX:+UseParallelGC * > * -XX:+UseParallelOldGC * > * -XX:ParallelGCThreads=4 * > * -XX:NewRatio=1* > > -Djava.awt.headless=true > -server > -Dsling.run.modes=publish > -jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start > -c crx-quickstart -i launchpad -p 4503 > -Dsling.properties=conf/sling.properties > > And it looks like it is performing much better but I think that it > probably needs more GC tuning. > > When I run: > > #*sudo -uaem jstat -gcutils* > > I get this: > > S0 S1 E O P YGC YGCT FGC FGCT GCT > > 0.00 0.00 55.97 100.00 45.09 4725 521.233 505 4179.584 > 4700.817 > > after 4 days that I restarted it. 
> > When I run: > > #*sudo -uaem jstat -gccapacity* > > I get this: > > NGCMN NGCMX NGC S0C S1C EC > 4194304.0 4194304.0 4194304.0 272896.0 279040.0 3636224.0 > > OGCMN OGCMX OGC OC PGCMN PGCMX > 4194304.0 4194304.0 4194304.0 4194304.0 262144.0 1048576.0 > > PGC PC YGC FGC > 262144.0 262144.0 4725 509 > > after 4 days that I restarted it. > > These result are much better than when I started but I think it can get > even better. I'm not really sure what to do next as I'm no GC pro so I was > wondering if you guys would have any tips or advice for me on how I could > get better app/GC performance and if anything is obvious like ratio's and > sizes of youngGen and oldGen ? > > How should I set the survivors and eden sizes/ratios ? > Should I change GC type like use CMS GC or G1 ? > How should I proceed ? > > Any advice would be helpful. > > Best, > Nicola > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon.masamitsu at oracle.com Mon Apr 20 18:36:00 2015 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 20 Apr 2015 11:36:00 -0700 Subject: Garbage Collection tuning advice ? In-Reply-To: References: Message-ID: <55354710.2040208@oracle.com> Nicola, Have you looked at http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/ And as Rohit suggest, you should turn on GC logging (-XX:+PrintGCDetails -XX:+PrintGCTimeStamps). Jon On 04/20/2015 10:49 AM, Nicola Abello wrote: > Hello everyone, I am currently using the Adobe Experience Manager for > a Client's site (Java language). 
It uses openJDK: > > #/java -version/ > > *java version "1.7.0_65"* > *OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)* > *OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)* > > It is running on Rackspace with the following: > > /_vCPU_:/ *4* > /_Memory_:/ *16GB* > /_Guest OS_:/ *Red Hat Enterprise Linux 6 (64-bit)* > Since it has been in production I have been experiencing very slow > performance on the part of the application. It goes like this I launch > the app, everything is smooth then 3 to 4 days later the CPU usage > spikes to 400% (~4000 users/day hit the site). I got a few OOM > exceptions (1 or 2) but mostly the site was exceptionally slow and > never becomes an OOM exception. Since I am a novice at Java Memory > management I started reading about how it works and found tools like > jstat. When the system was overwhelmed the second time around, I ran: > > #/top/ > > Got the PID of the java process and then pressed shift+H and noted the > PIDs of the threads with high CPU percentage. Then I ran > > #/sudo -uaem jstat / > > Got a thread dump and converted the thread PIDs I wrote down > previously and searched for their hex value in the dump. After all > that, I finally found that it was not surprisingly the Garbage > Collector that is flipping out for some reason. > > I started reading a lot about Java GC tuning and came up with the > following java options. 
> So restarted the application with the following options: > > /java/ -Dcom.day.crx.persistence.tar.IndexMergeDelay=0 > -Djackrabbit.maxQueuedEvents=1000000 > -Djava.io.tmpdir=/srv/aem/tmp/ > /*-XX:+HeapDumpOnOutOfMemoryError */ > /*-XX:HeapDumpPath=/srv/aem/tmp/ */ > /*-Xms8192m -Xmx8192m */ > /*-XX:PermSize=256m */ > /*-XX:MaxPermSize=1024m */ > /*-XX:+UseParallelGC */ > /*-XX:+UseParallelOldGC */ > /*-XX:ParallelGCThreads=4 */ > /*-XX:NewRatio=1*/ > -Djava.awt.headless=true > -server > -Dsling.run.modes=publish > -jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start > -c crx-quickstart -i launchpad -p 4503 > -Dsling.properties=conf/sling.properties > And it looks like it is performing much better but I think that it > probably needs more GC tuning. > > When I run: > > #/sudo -uaem jstat -gcutils/ > > I get this: > > S0 S1 E O P YGC YGCT FGC FGCT GCT > 0.00 0.00 55.97 100.00 45.09 4725 521.233 505 4179.584 4700.817 > > after 4 days that I restarted it. > > When I run: > > #/sudo -uaem jstat -gccapacity/ > > I get this: > > NGCMN NGCMX NGC S0C S1C EC > 4194304.0 4194304.0 4194304.0 272896.0 279040.0 3636224.0 > OGCMN OGCMX OGC OC PGCMN PGCMX > 4194304.0 4194304.0 4194304.0 4194304.0 262144.0 1048576.0 > > PGC PC YGC FGC > 262144.0 262144.0 4725 509 > > after 4 days that I restarted it. > > These result are much better than when I started but I think it can > get even better. I'm not really sure what to do next as I'm no GC pro > so I was wondering if you guys would have any tips or advice for me on > how I could get better app/GC performance and if anything is obvious > like ratio's and sizes of youngGen and oldGen ? > > How should I set the survivors and eden sizes/ratios ? > Should I change GC type like use CMS GC or G1 ? > How should I proceed ? > > Any advice would be helpful. 
> Best,
> Nicola
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bernardo.s at pointclickcare.com Mon Apr 20 18:54:54 2015
From: bernardo.s at pointclickcare.com (Bernardo Sanchez)
Date: Mon, 20 Apr 2015 18:54:54 +0000
Subject: Garbage Collection tuning advice ?
In-Reply-To:
References:
Message-ID:

CMS is still the best for high-performance apps, but it truly depends on how your app was created and what it does. There is no hard rule for which collector to use (i.e. reporting servers need different settings than app servers).

The issue you're having is probably related to running out of memory, or a memory leak. I don't believe this to be a "collector" issue: when Java allocates memory it requires contiguous blocks, so if it cannot allocate said memory it will perform a full GC to effectively de-fragment the memory (this could cause stop-the-world pauses, but not OOM). As this does not appear to be your issue, you might have a memory leak on your hands (holding on to dead objects that are filling the heap).

Here is a page for JVM options:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#G1Options

Here is a GC benchmarking doc:
http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/

You really need to have GC logging enabled, then use a tool like GCViewer to see how your heap is acting (gcnormal.log file; see below).
Here are some GC logging settings:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/srv/aem/tmp/
-Xloggc:/srv/aem/tmp/gcnormal.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintHeapAtGC

I would also suggest taking a heap dump and inspecting it with the Eclipse Memory Analyzer (leak suspects) to view what is truly occurring at the slow time. This can take quite a bit of time to perform, so you'll probably need at minimum a second server to take over the load. You can also connect to your JVM using jconsole to see the live heap in action (requires this setting; you can change the port: -agentlib:jdwp=transport=dt_socket,address=8082,server=y,suspend=n)

hope this helps,
b

From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Rohit Mohta
Sent: April-20-15 2:05 PM
To: Nicola Abello
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Garbage Collection tuning advice ?

Hi,

We are facing a similar issue with our high-performance application. We noticed that JVM ergonomics took over and adjusted the values to meet its goal (adaptive policy). This worked in most scenarios but caused problems in some cases. My recommendations would be:

a) Turn on GC logging (if not done yet), either with the default settings or the ones you have mentioned.
b) Check whether you see regular premature promotion to the old generation. If yes, you might want to set a specific new size and/or survivor size.
c) You have 16GB RAM but have set the max heap to 8GB. If you are getting OOM, increase -Xmx.

On 20 Apr 2015 18:50, "Nicola Abello" wrote:

Hello everyone,

I am currently using Adobe Experience Manager for a client's site (Java language).
It uses openJDK:

#java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

It is running on Rackspace with the following:

vCPU: 4
Memory: 16GB
Guest OS: Red Hat Enterprise Linux 6 (64-bit)

Since it has been in production I have been experiencing very slow performance on the part of the application. It goes like this: I launch the app, everything is smooth, then 3 to 4 days later the CPU usage spikes to 400% (~4000 users/day hit the site). I got a few OOM exceptions (1 or 2), but mostly the site was exceptionally slow and never actually threw an OOM exception. Since I am a novice at Java memory management, I started reading about how it works and found tools like jstat. When the system was overwhelmed the second time around, I ran:

#top

got the PID of the java process, pressed shift+H, and noted the PIDs of the threads with high CPU percentage. Then I ran

#sudo -uaem jstat

got a thread dump, converted the thread PIDs I wrote down previously to hex, and searched for their hex values in the dump. After all that, I finally found that it was, not surprisingly, the Garbage Collector that was flipping out for some reason.

I started reading a lot about Java GC tuning and came up with the following java options. So I restarted the application with the following options:

java -Dcom.day.crx.persistence.tar.IndexMergeDelay=0
-Djackrabbit.maxQueuedEvents=1000000
-Djava.io.tmpdir=/srv/aem/tmp/
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/srv/aem/tmp/
-Xms8192m -Xmx8192m
-XX:PermSize=256m
-XX:MaxPermSize=1024m
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:ParallelGCThreads=4
-XX:NewRatio=1
-Djava.awt.headless=true
-server
-Dsling.run.modes=publish
-jar crx-quickstart/app/cq-quickstart-6.0.0-standalone.jar start
-c crx-quickstart -i launchpad -p 4503
-Dsling.properties=conf/sling.properties

And it looks like it is performing much better, but I think that it probably needs more GC tuning.
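As an aside for readers following along, the thread-identification workflow described in the message above (top with shift+H, hex conversion, searching the thread dump) can be sketched in shell. The PIDs below are illustrative, and jstack (the standard JDK thread-dump tool) is assumed to be on the PATH:

```shell
# Per-thread CPU view: press shift+H inside top, or run `top -H -p <pid>`.
JAVA_PID=12345          # illustrative PID of the Java process
TID=12399               # illustrative PID of a hot thread seen in top

# Thread IDs appear in a thread dump as hex (nid=0x...), so convert:
TID_HEX=$(printf '%x' "$TID")
echo "$TID_HEX"         # -> 306f

# Take a thread dump and locate the hot thread by its hex nid
# (commented out here since it needs a live JVM):
# jstack "$JAVA_PID" | grep -A 5 "nid=0x$TID_HEX"
```

If the hot thread turns out to be a "GC task thread" or similar, that points at garbage collection rather than application code, which matches the diagnosis in the message.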
When I run:

#sudo -uaem jstat -gcutil

I get this:

S0     S1     E      O       P      YGC   YGCT     FGC  FGCT      GCT
0.00   0.00   55.97  100.00  45.09  4725  521.233  505  4179.584  4700.817

after 4 days since I restarted it.

When I run:

#sudo -uaem jstat -gccapacity

I get this:

NGCMN      NGCMX      NGC        S0C       S1C       EC
4194304.0  4194304.0  4194304.0  272896.0  279040.0  3636224.0
OGCMN      OGCMX      OGC        OC         PGCMN     PGCMX
4194304.0  4194304.0  4194304.0  4194304.0  262144.0  1048576.0
PGC       PC        YGC   FGC
262144.0  262144.0  4725  509

after 4 days since I restarted it.

These results are much better than when I started, but I think it can get even better. I'm not really sure what to do next as I'm no GC pro, so I was wondering if you guys would have any tips or advice for me on how I could get better app/GC performance, and if anything is obvious like the ratios and sizes of youngGen and oldGen?

How should I set the survivor and eden sizes/ratios?
Should I change the GC type, like use CMS GC or G1?
How should I proceed?

Any advice would be helpful.

Best,
Nicola

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
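[Editorial note: the -gcutil line in the message above already quantifies the problem. In -gcutil output the O column is old-generation occupancy as a percentage, so 100.00 means the old generation is completely full, and dividing total full-GC time (FGCT) by the full-GC count (FGC) gives the average full-GC pause. A quick check with the numbers quoted above:]

```shell
# FGC = 505 full GCs, FGCT = 4179.584 s total full-GC time (from jstat -gcutil above)
AVG=$(awk 'BEGIN { printf "%.2f", 4179.584 / 505 }')
echo "$AVG"   # -> 8.28, i.e. roughly 8.3 seconds per full GC on average
```

Roughly 8-second full GCs recurring 505 times in 4 days is consistent with an old generation that fills up and never recovers, which supports the memory-leak hypothesis raised in the replies.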
URL:

From wolfgang.pedot at finkzeit.at Fri Apr 24 09:14:24 2015
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Fri, 24 Apr 2015 11:14:24 +0200
Subject: G1GC, Java8u40ea, Metaspace questions
In-Reply-To: <54F961E8.1010000@oracle.com>
References: <821215C9-36AC-41BB-A9A6-1E136341778F@finkzeit.at> <54DE3254.9030503@oracle.com> <54DE41A7.6050004@finkzeit.at> <54DE5495.2010501@oracle.com> <54E394FB.3040204@oracle.com> <54E39BCC.7090802@finkzeit.at> <54E41127.3040002@oracle.com> <54E5CD2B.7030201@finkzeit.at> <54F8A7E5.5080606@oracle.com> <54F8BF7E.2000805@finkzeit.at> <54F961E8.1010000@oracle.com>
Message-ID: <553A0970.2070302@finkzeit.at>

Hi,

after all my questions/tests I just wanted to share what happened when I finally switched a production system from 7u60 to 8u40. The system has a 13.6GB heap with ~4.5GB usage after mixed collects during the day; the target pause time is 250ms, but most collects take 100-160ms.

It has been two weeks now (unfortunately not continuous uptime due to bugfixes) and I can really see the improvements made to G1. Recorded GC time has been reduced by at least 40%, and I have not had a single Full GC during the day yet (I still trigger a manual Full GC at night so I can see where collection usage is going). Metaspace usage is continuously lower than PermGen was before because it is usually kept in check by the 5-6 mixed collects I see during a normal day.

So far it's been quite a success, thanks for all your input!
regards
Wolfgang Pedot

From yu.zhang at oracle.com Fri Apr 24 15:01:38 2015
From: yu.zhang at oracle.com (Yu Zhang)
Date: Fri, 24 Apr 2015 08:01:38 -0700
Subject: G1GC, Java8u40ea, Metaspace questions
In-Reply-To: <553A0970.2070302@finkzeit.at>
References: <821215C9-36AC-41BB-A9A6-1E136341778F@finkzeit.at> <54DE3254.9030503@oracle.com> <54DE41A7.6050004@finkzeit.at> <54DE5495.2010501@oracle.com> <54E394FB.3040204@oracle.com> <54E39BCC.7090802@finkzeit.at> <54E41127.3040002@oracle.com> <54E5CD2B.7030201@finkzeit.at> <54F8A7E5.5080606@oracle.com> <54F8BF7E.2000805@finkzeit.at> <54F961E8.1010000@oracle.com> <553A0970.2070302@finkzeit.at>
Message-ID: <553A5AD2.9020403@oracle.com>

Wolfgang,

Thank you very much for sharing your story. You made my day!

Thanks,
Jenny

On 4/24/2015 2:14 AM, Wolfgang Pedot wrote:
> Hi,
>
> after all my questions/tests I just wanted to share what happened when
> I finally switched a production system from 7u60 to 8u40:
> The system has 13.6GB heap, ~4.5GB usage after mixed collects during
> the day, target pause time is 250ms but most collects take 100-160ms.
>
> It has been two weeks now (unfortunately not continuous uptime due to
> bugfixes) and I can really see the improvements made to G1.
> Recorded GC time has been reduced by at least 40% and I have not had a
> single Full GC during the day yet (I still trigger a manual Full GC at
> night so I can see where collection usage is going).
> Metaspace usage is continuously lower than PermGen was before because
> it is usually kept in check by the 5-6 mixed collects I see during a
> normal day.
> So far it's been quite a success, thanks for all your input!
>
> regards
> Wolfgang Pedot

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bengt.rutisson at oracle.com Fri Apr 24 15:06:20 2015
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Fri, 24 Apr 2015 17:06:20 +0200
Subject: G1GC, Java8u40ea, Metaspace questions
In-Reply-To: <553A0970.2070302@finkzeit.at>
References: <821215C9-36AC-41BB-A9A6-1E136341778F@finkzeit.at> <54DE3254.9030503@oracle.com> <54DE41A7.6050004@finkzeit.at> <54DE5495.2010501@oracle.com> <54E394FB.3040204@oracle.com> <54E39BCC.7090802@finkzeit.at> <54E41127.3040002@oracle.com> <54E5CD2B.7030201@finkzeit.at> <54F8A7E5.5080606@oracle.com> <54F8BF7E.2000805@finkzeit.at> <54F961E8.1010000@oracle.com> <553A0970.2070302@finkzeit.at>
Message-ID: <553A5BEC.6030407@oracle.com>

Hi Wolfgang,

Thanks a lot for sharing this! Really nice!

Bengt

On 2015-04-24 11:14, Wolfgang Pedot wrote:
> Hi,
>
> after all my questions/tests I just wanted to share what happened when
> I finally switched a production system from 7u60 to 8u40:
> The system has 13.6GB heap, ~4.5GB usage after mixed collects during
> the day, target pause time is 250ms but most collects take 100-160ms.
>
> It has been two weeks now (unfortunately not continuous uptime due to
> bugfixes) and I can really see the improvements made to G1.
> Recorded GC time has been reduced by at least 40% and I have not had a
> single Full GC during the day yet (I still trigger a manual Full GC at
> night so I can see where collection usage is going).
> Metaspace usage is continuously lower than PermGen was before because
> it is usually kept in check by the 5-6 mixed collects I see during a
> normal day.
> So far it's been quite a success, thanks for all your input!
>
> regards
> Wolfgang Pedot
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use