From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Mon, 4 Aug 2014 19:33:31 +0300
Subject: G1gc compaction algorithm

Hi!

Here are the fresh logs:

http://81.22.250.165/log/gc-16m-2014-08-04.log

Today we were hit by quite a number of Full GCs at quite short intervals
and, as you can imagine, by some not-so-happy users ;)

Any ideas? I will reduce the region size to 4M for now, because that
resulted in far fewer Full GCs.

**
Martin


2014-08-01 1:17 GMT+03:00 Martin Makundi:
> Hmm.. ok, I copy-pasted it from the mail; it works after typing it
> manually, thanks.
>
> The problem seems to have been BOTH a whitespace typo AND the fact that
> UnlockDiagnosticVMOptions was on the right side.
>
> Thanks.
>
> Gathering logs now.
>
> **
> Martin
>
> 2014-08-01 1:01 GMT+03:00 Yu Zhang:
>> maybe some hidden text?
>>
>> Thanks,
>> Jenny
>>
>> On 7/31/2014 2:52 PM, Martin Makundi wrote:
>>> Strange that it is in the property summary but doesn't allow setting
>>> it.
>>>
>>> 2014-08-01 0:39 GMT+03:00 Martin Makundi:
>>>> Hi!
>>>>
>>>> UnlockDiagnosticVMOptions is on (though later, on the right side of
>>>> the command line). The JVM version is
>>>>
>>>> java version "1.7.0_55"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
>>>>
>>>> 2014-08-01 0:37 GMT+03:00 Yu Zhang:
>>>>> Martin,
>>>>>
>>>>> These two need to run with -XX:+UnlockDiagnosticVMOptions
>>>>>
>>>>> Thanks,
>>>>> Jenny
>>>>>
>>>>> On 7/31/2014 2:33 PM, Martin Makundi wrote:
>>>>>> Hi!
>>>>>>
>>>>>> G1SummarizeRSetStats does not seem to work; the JVM says:
>>>>>>
>>>>>> Improperly specified VM option 'G1SummarizeRSetStatsPeriod=10'
>>>>>> Error: Could not create the Java Virtual Machine.
>>>>>> Error: A fatal exception has occurred. Program will exit.
>>>>>>
>>>>>> Same for both new options.
>>>>>>
>>>>>> 2014-07-31 20:22 GMT+03:00 Yu Zhang:
>>>>>>> Martin,
>>>>>>>
>>>>>>> The ScanRS for mixed gc is extremely long, 1000-9000ms. Because it
>>>>>>> is over the pause time goal, only the minimum number of old regions
>>>>>>> can be added to the CSet, so mixed gc is not keeping up.
>>>>>>>
>>>>>>> Can you do a run keeping the 16m region size, with no
>>>>>>> G1PrintRegionLivenessInfo and no PrintHeapAtGC, but with
>>>>>>> -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=10?
>>>>>>>
>>>>>>> This should tell us more about the RSet information.
>>>>>>>
>>>>>>> While the UpdateRS is not as bad as ScanRS, we can try to push it
>>>>>>> to the concurrent threads. Can you add
>>>>>>> -XX:G1RSetUpdatingPauseTimePercent=5? I am hoping this brings the
>>>>>>> UpdateRS down to 50ms.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jenny
>>>>>>>
>>>>>>> On 7/28/2014 8:27 PM, Martin Makundi wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> We suffered a couple of Full GCs using region size 5M (it seems
>>>>>>>> to be exact, looking at the logged actual parameters), and the
>>>>>>>> 16M option we tried resulted in more severe Full GC behavior.
>>>>>>>>
>>>>>>>> Here is the promised log for the 16M setting:
>>>>>>>> http://81.22.250.165/log/gc-16m.log
>>>>>>>>
>>>>>>>> We switched back to 5M hoping it will behave more nicely.
>>>>>>>>
>>>>>>>> **
>>>>>>>> Martin
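For anyone hitting the same "Improperly specified VM option" error discussed
in the quoted exchange above: G1SummarizeRSetStats and
G1SummarizeRSetStatsPeriod are diagnostic options, so
-XX:+UnlockDiagnosticVMOptions has to appear earlier on the command line
(further to the left) than the options it unlocks. A minimal sketch of the
ordering; the application jar is a placeholder, not anything from this
thread:

    java -XX:+UseG1GC \
         -XX:+UnlockDiagnosticVMOptions \
         -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=10 \
         -jar app.jar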
From: yu.zhang at oracle.com (Yu Zhang)
Date: Wed, 06 Aug 2014 14:53:18 -0700
Subject: G1gc compaction algorithm
Message-ID: <53E2A3CE.6090803@oracle.com>

Martin,

Thanks for the logs and for following up with us.

In the attached chart, the purple line is the ScanRS time for mixed gc. At
the bottom there are grey circles indicating when the initial mark happens.
The white points are the ScanRS times for mixed gcs with to-space
exhaustion.

You can see that the first several ScanRS times after the initial mark are
ok, then they go up to 7000ms. For the 16m region size runs you have
G1HeapWastePercent=0 (the 4m region size run has G1HeapWastePercent=1).
Because of this, G1 will not stop mixed gc until there are no candidate
regions left. Judging from the space reclaimed by mixed gc, it reclaims
2-3g of heap, but the price is too high. Another disadvantage is that it
does not start new marking phases:

"do not request concurrent cycle initiation, reason: still doing mixed
collections, occupancy: 2147483648 bytes, allocation request: 0 bytes,
threshold: 2147483640 bytes (10.00 %), source: end of GC]"

If you look at the reclaimable heap in MB, the 4m heap region size starts
mixed gc at a lower reclaimable level.

Another thing is that there is coarsening of the RSet entries.

Can you do one run with

-XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10
-XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40
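Put together as a single command line, the experiment Jenny is asking for
would look roughly like the following. Only the four G1 flags come from her
message; the GC logging flags and the application jar are placeholders, not
Martin's real settings:

    java -XX:+UseG1GC \
         -XX:G1HeapRegionSize=16m \
         -XX:G1HeapWastePercent=10 \
         -XX:G1RSetRegionEntries=1792 \
         -XX:G1RSetSparseRegionEntries=40 \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -jar app.jar

G1RSetRegionEntries and G1RSetSparseRegionEntries override the per-region
remembered-set table sizes that G1 otherwise derives from the region size;
larger values delay the coarsening mentioned above, at the cost of some
extra memory.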
Thanks,
Jenny


From: srini_was at yahoo.com (Srini Padman)
Date: Wed, 6 Aug 2014 15:48:51 -0700
Subject: Seeking help regarding Full GCs with G1
Message-ID: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com>

Hello,

I am currently evaluating the use of the G1 collector for our application,
to combat the fragmentation issues we ran into while using the CMS
collector (several cases of failed promotions, followed by *really* long
pauses). However, I am also having trouble tuning the G1 collector, and I
am seeing behavior that I can't fully understand. I will appreciate any
help/insight that you can offer.

What I find puzzling from looking at the G1 GC logs from our tests is that
the concurrent marking phase does not really seem to identify many old
regions to clean up at all, and the heap usage keeps growing. At some point
there is no further room to expand ("heap expansion operation failed"), and
this is followed by a Full GC that lasts about 10 seconds. But the Full GC
actually brings the memory down by almost 50%, from 4095M to 2235M.

If the Full GC can collect this much of the heap, I don't fully understand
why the concurrent mark phase does not identify these (old?) regions for
(mixed?) collection subsequently.

On the assumption that we should let the GC ergonomics do its thing freely,
I initially did not set any parameters other than -Xmx, -Xms, and the
PermGen sizes. I added the G1HeapRegionSize and
G1MixedGCLiveThresholdPercent settings (see below) because, when I saw the
Full GCs with the default settings, I wondered whether we might be getting
into a situation where all (or most?) regions are roughly 65% live, so the
concurrent marking phase does not identify them for collection but a
subsequent Full GC is able to. That is, I wondered whether our
application's heap footprint being 65% of the max heap led to these full
GCs coincidentally (since G1MixedGCLiveThresholdPercent is 65% by default).
But I don't know why the same thing happens when I set G1MixedGCLiveThresholdPercent down to 40% - even if all regions are 40% full, we will only be at about 1.6 GB, and that is far below what I think our heap footprint is in the long run (2.2 GB). So I don't understand how to ensure that old regions are cleaned up regularly so a Full GC is not required. GC Settings in use: -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m -XX:G1MixedGCLiveThresholdPercent=40 This is using JRE 1.7.0_55. I am including a short(ish) GC log snippet for the time leading up to the Full GC. I can send the full GC log (about 8 MB, zipped) if necessary. Any help will be greatly appreciated! Regards, Srini. --------------------------- 2014-08-06T04:46:00.067-0700: 1124501.033: [GC concurrent-root-region-scan-start] 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-root-region-scan-end, 0.0139487 secs] 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, 10.4675249 secs] 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 secs], 0.0225253 secs] ?[Times: user=0.01 sys=0.00, real=0.02 secs] 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup 3922M->3922M(4096M), 0.0098209 secs] ?[Times: user=0.03 sys=0.03, real=0.01 secs] 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) Desired survivor size 13631488 bytes, new threshold 15 (max 15) - age?? 1:??? 1531592 bytes,??? 1531592 total - age?? 2:??? 1087648 bytes,??? 2619240 total - age?? 3:???? 259480 bytes,??? 2878720 total - age?? 4:???? 493976 bytes,??? 3372696 total - age?? 5:???? 213472 bytes,??? 3586168 total - age?? 6:???? 186104 bytes,??? 3772272 total - age?? 7:???? 169832 bytes,??? 3942104 total - age?? 8:???? 201968 bytes,??? 4144072 total - age?? 9:???? 183752 bytes,??? 4327824 total - age? 10:???? 136480 bytes,??? 4464304 total - age? 11:???? 366208 bytes,??? 4830512 total - age? 12:???? 137296 bytes,??? 4967808 total - age? 13:???? 133592 bytes,??? 5101400 total - age? 14:???? 162232 bytes,??? 5263632 total - age? 15:???? 139984 bytes,??? 5403616 total ?1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: 162.04 ms, target pause time: 200.00 ms] ?1124550.652: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 95 regions, survivors: 7 regions, predicted young region time: 4.46 ms] ?1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted pause time: 42.42 ms, target pause time: 200.00 ms] ?1124550.701: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: 1932735240 bytes (45.00 %), source: end of GC] ?1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 285 regions, reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] , 0.0494015 secs] ?? [Parallel Time: 43.7 ms, GC Workers: 4] ????? [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, Max: 1124550674.3, Diff: 22.6] ????? 
[Ext Root Scanning (ms): Min: 0.1, Avg: 6.7, Max: 22.3, Diff: 22.2, Sum: 26.8] ????? [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, Sum: 44.0] ???????? [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, Sum: 161] ????? [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] ????? [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, Sum: 35.6] ????? [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.3] ????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] ????? [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: 22.6, Sum: 107.1] ????? [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: 1124550695.5, Diff: 0.0] ?? [Code Root Fixup: 0.0 ms] ?? [Clear CT: 0.1 ms] ?? [Other: 5.6 ms] ????? [Choose CSet: 0.0 ms] ????? [Ref Proc: 4.5 ms] ????? [Ref Enq: 0.1 ms] ????? [Free CSet: 0.3 ms] ?? [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: 4077.0M(4096.0M)->3887.1M(4096.0M)] ?[Times: user=0.11 sys=0.00, real=0.05 secs] 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) Desired survivor size 13631488 bytes, new threshold 15 (max 15) - age?? 1:??? 1323232 bytes,??? 1323232 total - age?? 2:???? 716576 bytes,??? 2039808 total - age?? 3:??? 1058584 bytes,??? 3098392 total - age?? 4:???? 225208 bytes,??? 3323600 total - age?? 5:???? 447688 bytes,??? 3771288 total - age?? 6:???? 195112 bytes,??? 3966400 total - age?? 7:???? 178000 bytes,??? 4144400 total - age?? 8:???? 156904 bytes,??? 4301304 total - age?? 9:???? 193424 bytes,??? 4494728 total - age? 10:???? 176272 bytes,??? 4671000 total - age? 11:???? 134768 bytes,??? 4805768 total - age? 12:???? 138896 bytes,??? 4944664 total - age? 13:???? 132272 bytes,??? 5076936 total - age? 14:???? 132856 bytes,??? 5209792 total - age? 15:???? 161912 bytes,??? 5371704 total ?1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: 161.39 ms, target pause time: 200.00 ms] ?1124606.686: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 95 regions, survivors: 7 regions, predicted young region time: 4.53 ms] ?1124606.686: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 %), threshold: 10.00 %] ?1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted pause time: 45.72 ms, target pause time: 200.00 ms] ?1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 1048576 bytes] ?1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 1048576 bytes, attempted expansion amount: 2097152 bytes] ?1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] ?1124606.743: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: 1932735240 bytes (45.00 %), source: end of GC] ?1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), threshold: 10.00 %] ?(to-space exhausted), 0.0568178 secs] ?? [Parallel Time: 40.4 ms, GC Workers: 4] ????? 
[GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, Max: 1124606723.8, Diff: 37.7] ????? [Ext Root Scanning (ms): Min: 0.1, Avg: 6.3, Max: 16.1, Diff: 16.1, Sum: 25.4] ????? [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, Sum: 38.6] ???????? [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, Sum: 150] ????? [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] ????? [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, Sum: 33.7] ????? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] ????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] ????? [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: 37.7, Sum: 98.9] ????? [GC Worker End (ms): Min: 1124606726.5, Avg: 1124606726.5, Max: 1124606726.5, Diff: 0.0] ?? [Code Root Fixup: 0.0 ms] ?? [Clear CT: 0.1 ms] ?? [Other: 16.3 ms] ????? [Choose CSet: 0.0 ms] ????? [Ref Proc: 7.7 ms] ????? [Ref Enq: 0.2 ms] ????? [Free CSet: 0.3 ms] ?? [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: 4077.1M(4096.0M)->3921.6M(4096.0M)] ?[Times: user=0.11 sys=0.00, real=0.06 secs] 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) Desired survivor size 13631488 bytes, new threshold 15 (max 15) - age?? 1:??? 1130192 bytes,??? 1130192 total - age?? 2:???? 492816 bytes,??? 1623008 total - age?? 3:???? 675240 bytes,??? 2298248 total - age?? 4:??? 1038536 bytes,??? 3336784 total - age?? 5:???? 208048 bytes,??? 3544832 total - age?? 6:???? 436520 bytes,??? 3981352 total - age?? 7:???? 184528 bytes,??? 4165880 total - age?? 8:???? 165376 bytes,??? 4331256 total - age?? 9:???? 154872 bytes,??? 4486128 total - age? 10:???? 179016 bytes,??? 4665144 total - age? 11:???? 167760 bytes,??? 4832904 total - age? 12:???? 132056 bytes,??? 4964960 total - age? 13:???? 138736 bytes,??? 5103696 total - age? 14:???? 132080 bytes,??? 5235776 total - age? 15:???? 132856 bytes,??? 5368632 total ?1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: 148.06 ms, target pause time: 200.00 ms] ?1124739.058: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 87 regions, survivors: 8 regions, predicted young region time: 4.37 ms] ?1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted pause time: 56.32 ms, target pause time: 200.00 ms] ?1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 1048576 bytes] ?1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 1048576 bytes, attempted expansion amount: 2097152 bytes] ?1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] ?1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 bytes (45.00 %), source: end of GC] ?(to-space exhausted), 0.1936102 secs] ?? [Parallel Time: 146.6 ms, GC Workers: 4] ????? [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, Max: 1124739063.0, Diff: 4.4] ????? [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: 14.0, Sum: 28.2] ????? [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, Sum: 42.7] ???????? [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, Sum: 225] ????? 
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] ????? [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: 24.5, Sum: 502.5] ????? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2] ????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] ????? [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: 4.4, Sum: 573.8] ????? [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: 1124739205.1, Diff: 0.0] ?? [Code Root Fixup: 0.0 ms] ?? [Clear CT: 0.1 ms] ?? [Other: 46.9 ms] ????? [Choose CSet: 0.0 ms] ????? [Ref Proc: 1.0 ms] ????? [Ref Enq: 0.1 ms] ????? [Free CSet: 0.2 ms] ?? [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: 4095.6M(4096.0M)->4095.6M(4096.0M)] ?[Times: user=0.36 sys=0.00, real=0.19 secs] ?1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested] 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) (initial-mark) Desired survivor size 13631488 bytes, new threshold 15 (max 15) ?1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 322560, predicted base time: 205.33 ms, remaining time: 0.00 ms, target pause time: 200.00 ms] ?1124739.259: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 0 regions, survivors: 0 regions, predicted young region time: 0.00 ms] ?1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 205.33 ms, target pause time: 200.00 ms] , 0.0347198 secs] ?? [Parallel Time: 33.1 ms, GC Workers: 4] ????? [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, Max: 1124739259.3, Diff: 0.0] ????? [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: 5.4, Sum: 30.6] ????? [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, Sum: 79.4] ???????? [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, Sum: 1261] ????? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] ????? [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: 21.7] ????? [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4] ????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] ????? [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: 0.1, Sum: 132.1] ????? [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: 1124739292.3, Diff: 0.0] ?? [Code Root Fixup: 0.0 ms] ?? [Clear CT: 0.1 ms] ?? [Other: 1.5 ms] ????? [Choose CSet: 0.0 ms] ????? [Ref Proc: 1.0 ms] ????? [Ref Enq: 0.1 ms] ????? [Free CSet: 0.0 ms] ?? [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: 4095.6M(4096.0M)->4095.6M(4096.0M)] ?[Times: user=0.12 sys=0.00, real=0.04 secs] 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-root-region-scan-start] 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-root-region-scan-end, 0.0000157 secs] 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] ?1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 80 bytes] ?1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 2097152 bytes, attempted expansion amount: 2097152 bytes] ?1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC 4095M->2235M(4096M), 10.5341003 secs] ?? 
   [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: 4095.6M(4096.0M)->2235.4M(4096.0M)]
 [Times: user=13.20 sys=0.03, real=10.52 secs]


From: yu.zhang at oracle.com (Yu Zhang)
Date: Wed, 06 Aug 2014 16:54:31 -0700
Subject: Seeking help regarding Full GCs with G1
Message-ID: <53E2C037.1090009@oracle.com>

Srini,

About -XX:G1MixedGCLiveThresholdPercent=65 (the default): if a region's
live data is above 65%, that region will not be considered as a
mixed-collection candidate. In your case, you need to increase
-XX:G1MixedGCLiveThresholdPercent.
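To make the direction concrete: with Srini's 2m regions, the default
threshold of 65 means that any region with more than roughly 1.3 MB of live
data is never picked up as a mixed-collection candidate, so lowering the
value to 40 made fewer regions eligible, not more. A sketch of the change
being suggested; the value 85 is only an illustration, not a number
recommended anywhere in this thread:

    -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85

G1MixedGCLiveThresholdPercent is an experimental option, which is why it
has to be paired with -XX:+UnlockExperimentalVMOptions (already present in
Srini's settings).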
Thanks,
Jenny


From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Thu, 7 Aug 2014 03:14:31 +0300
Subject: G1gc compaction algorithm

Hi!

Thanks. We are currently running 4M with wastepercent=0, but unfortunately
we don't have any logs for that.

Will try "-XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40" and post results later back here. ** Martin 2014-08-07 0:53 GMT+03:00 Yu Zhang : > Martin, > > Thanks for the logs and following up with us. > > In this chart, the purple line is the ScanRS time for mixed gc. At the > bottom there are grey circles indicating when the initial mark happens. > The white is the ScanRS time for mixed gc with to-space exhausted. > > > You can see that the first several scanRS after initial-mark is ok, then > they go up to 7000ms. For the 16m region size runs, you have > G1HeapWastePercent=0. (4m region size has G1HeapWastePercent=1). Because > of this, g1 will not stop mixed gc till there is no candidate regions. > From the space claimed by mixed gc, it claims 2-3g heap, but the price is > too high. Another disadvantage is it does not start marking phases: > "do not request concurrent cycle initiation, reason: still doing mixed > collections, occupancy: 2147483648 bytes, allocation request: 0 bytes, > threshold: 2147483640 bytes (10.00 %), source: end of GC]" > > If you look at the claimable heap in MB, 4m heap region size starts mixed > gc at lower reclaimable > > > Another thing is there are coarsening for RSet Entries. > > Can you do one with > -XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 > -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40 > > Thanks, > Jenny > > On 8/4/2014 9:33 AM, Martin Makundi wrote: > > Hi! > > Here are the fresh logs: > > http://81.22.250.165/log/gc-16m-2014-08-04.log > > Today we were hit by quite some number of Full GC's with quite short > intervals and as can be suspected, not so happy users ;) > > Any ideas? I will reduce the region size to 4M for now, because it > resulted in much fewer full gcs. > > ** > Martin > > > 2014-08-01 1:17 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > >> Hmm.. ok, I copy pasted if from the mail, it works after typing manually, >> thanks. >> >> Problem seems to have been BOTH a whitespace typo AND >> UnlockDiagnosticOptions was on the right side. >> >> Thanks. >> >> Gathering logs now. >> >> ** >> Martin >> >> >> 2014-08-01 1:01 GMT+03:00 Yu Zhang : >> >> maybe some hidden text? >>> >>> Thanks, >>> Jenny >>> >>> On 7/31/2014 2:52 PM, Martin Makundi wrote: >>> >>> Strange that it is in the property summary but doesn't allow setting it. >>> >>> >>> 2014-08-01 0:39 GMT+03:00 Martin Makundi < >>> martin.makundi at koodaripalvelut.com>: >>> >>>> Hi! >>>> >>>> UnlockDiagnosticVMOptions is on (though later (on the right side) in >>>> the command line). Jvm version is >>>> >>>> java version "1.7.0_55" >>>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >>>> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) >>>> >>>> >>>> >>>> 2014-08-01 0:37 GMT+03:00 Yu Zhang : >>>> >>>> Martin, >>>>> >>>>> These 2 need to run with -XX:+UnlockDiagnosticVMOptions >>>>> >>>>> Thanks, >>>>> Jenny >>>>> >>>>> On 7/31/2014 2:33 PM, Martin Makundi wrote: >>>>> >>>>> Hi! >>>>> >>>>> G1SummarizeRSetStats does not seem to work, jvm says: >>>>> >>>>> Improperly specified VM option 'G1SummarizeRSetStatsPeriod=10' >>>>> Error: Could not create the Java Virtual Machine. >>>>> Error: A fatal exception has occurred. Program will exit. >>>>> >>>>> Same for both new options >>>>> >>>>> >>>>> >>>>> 2014-07-31 20:22 GMT+03:00 Yu Zhang : >>>>> >>>>>> Martin, >>>>>> >>>>>> The ScanRS for mixed gc is extremely long, 1000-9000ms. 
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Thu, 7 Aug 2014 03:18:20 +0300
Subject: G1gc compaction algorithm

Hi!

Meanwhile, I was wondering what G1RSetRegionEntries and
G1RSetSparseRegionEntries actually do; Google didn't give much information
about them. How do they work and which things do they affect?

**
Martin


From: srini_was at yahoo.com (Srini Padman)
Date: Wed, 6 Aug 2014 19:17:15 -0700
Subject: Seeking help regarding Full GCs with G1
Message-ID: <1407377835.75474.YahooMailNeo@web140706.mail.bf1.yahoo.com>

Ah, I see. I think that does explain the mystery. The Oracle documentation
only says this parameter "sets the occupancy threshold for an old region to
be included in a mixed garbage collection cycle. The default occupancy is
65 percent.", and I interpreted that as a lower threshold rather than an
upper threshold. It should have been obvious to me that the "Garbage First"
collector, which aims to collect the least live data first, would use an
upper threshold, and yet I settled on exactly the wrong interpretation.
Thanks for setting me right!

Knowing this, I now think that the default 10% G1HeapWastePercent value
(which computes to about 400 MB for us) and the application's footprint
(~2.2 GB) *together* put our application at exactly the default occupancy
threshold of 65% (2.6 GB). So we would have been operating in a
configuration where no mixed GCs would ever be seen to be required. (Please
correct me if that calculation is wrong.)

If we decrease G1HeapWastePercent to 5% and increase
G1MixedGCLiveThresholdPercent to 75%, then the corresponding numbers would
be (2.2 GB footprint + 200 MB waste < 3 GB), so we will be doing mixed GCs
from 2.4 GB until we (roughly) hit 3 GB, after which the chances of a Full
GC increase again. So the hope in this case would be that enough heap is
collected to never let the usage go above 3 GB (assuming, of course, that
all regions are at an equal occupancy of 75%). Am I understanding this
correctly?
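Spelling out the arithmetic in the two scenarios above, using Srini's own
figures against the 4096m heap (these are his estimates, not measured
values):

    Current defaults:
      G1HeapWastePercent = 10            -> 0.10 * 4096 MB ~= 410 MB
      G1MixedGCLiveThresholdPercent = 65 -> 0.65 * 4096 MB ~= 2662 MB (2.6 GB)
      footprint ~2.2 GB + ~0.4 GB waste  ~= 2.6 GB, right at that line

    Proposed values:
      G1HeapWastePercent = 5             -> 0.05 * 4096 MB ~= 205 MB
      G1MixedGCLiveThresholdPercent = 75 -> 0.75 * 4096 MB ~= 3072 MB (3 GB)
      footprint ~2.2 GB + ~0.2 GB waste  ~= 2.4 GB, well below that line

Note that the two flags act on different things (G1HeapWastePercent is a
whole-heap reclaimable-space threshold for continuing mixed GCs, while
G1MixedGCLiveThresholdPercent is a per-region liveness cut-off for
candidate selection), so adding them together as above is Srini's
simplification rather than how the ergonomics literally combine them.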
Also, what is the additional (performance) cost incurred by setting the
G1MixedGCLiveThresholdPercent value to 75%?

Thanks once again - due to my misunderstanding, I was really not sure what
to do next!

Regards,
Srini.
4486128 total >- age? 10:???? 179016 bytes,??? 4665144 total >- age? 11:???? 167760 bytes,??? 4832904 total >- age? 12:???? 132056 bytes,??? 4964960 total >- age? 13:???? 138736 bytes,??? 5103696 total >- age? 14:???? 132080 bytes,??? 5235776 total >- age? 15:???? 132856 bytes,??? 5368632 total >?1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: 148.06 ms, target pause time: 200.00 ms] >?1124739.058: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 87 regions, survivors: 8 regions, predicted young region time: 4.37 ms] >?1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted pause time: 56.32 ms, target pause time: 200.00 ms] >?1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 1048576 bytes] >?1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 1048576 bytes, attempted expansion amount: 2097152 bytes] >?1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] >?1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 bytes (45.00 %), source: end of GC] >?(to-space exhausted), 0.1936102 secs] >?? [Parallel Time: 146.6 ms, GC Workers: 4] >????? [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, Max: 1124739063.0, Diff: 4.4] >????? [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: 14.0, Sum: 28.2] >????? [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, Sum: 42.7] >???????? [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, Sum: 225] >????? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] >????? [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: 24.5, Sum: 502.5] >????? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2] >????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >????? [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: 4.4, Sum: 573.8] >????? [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: 1124739205.1, Diff: 0.0] >?? [Code Root Fixup: 0.0 ms] >?? [Clear CT: 0.1 ms] >?? [Other: 46.9 ms] >????? [Choose CSet: 0.0 ms] >????? [Ref Proc: 1.0 ms] >????? [Ref Enq: 0.1 ms] >????? [Free CSet: 0.2 ms] >?? [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: 4095.6M(4096.0M)->4095.6M(4096.0M)] >?[Times: user=0.36 sys=0.00, real=0.19 secs] >?1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested] >2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) (initial-mark) >Desired survivor size 13631488 bytes, new threshold 15 (max 15) >?1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 322560, predicted base time: 205.33 ms, remaining time: 0.00 ms, target pause time: 200.00 ms] >?1124739.259: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 0 regions, survivors: 0 regions, predicted young region time: 0.00 ms] >?1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 205.33 ms, target pause time: 200.00 ms] >, 0.0347198 secs] >?? 
[Parallel Time: 33.1 ms, GC Workers: 4] >????? [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, Max: 1124739259.3, Diff: 0.0] >????? [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: 5.4, Sum: 30.6] >????? [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, Sum: 79.4] >???????? [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, Sum: 1261] >????? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] >????? [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: 21.7] >????? [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4] >????? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >????? [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: 0.1, Sum: 132.1] >????? [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: 1124739292.3, Diff: 0.0] >?? [Code Root Fixup: 0.0 ms] >?? [Clear CT: 0.1 ms] >?? [Other: 1.5 ms] >????? [Choose CSet: 0.0 ms] >????? [Ref Proc: 1.0 ms] >????? [Ref Enq: 0.1 ms] >????? [Free CSet: 0.0 ms] >?? [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: 4095.6M(4096.0M)->4095.6M(4096.0M)] >?[Times: user=0.12 sys=0.00, real=0.04 secs] >2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-root-region-scan-start] >2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-root-region-scan-end, 0.0000157 secs] >2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] >?1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 80 bytes] >?1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 2097152 bytes, attempted expansion amount: 2097152 bytes] >?1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] >2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC 4095M->2235M(4096M), 10.5341003 secs] >?? [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: 4095.6M(4096.0M)->2235.4M(4096.0M)] >?[Times: user=13.20 sys=0.03, real=10.52 secs] > > > > > > >_______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.pedot at finkzeit.at Thu Aug 7 11:46:30 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Thu, 07 Aug 2014 13:46:30 +0200 Subject: Seeking help regarding Full GCs with G1 In-Reply-To: <1407377835.75474.YahooMailNeo@web140706.mail.bf1.yahoo.com> References: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> <53E2C037.1090009@oracle.com> <1407377835.75474.YahooMailNeo@web140706.mail.bf1.yahoo.com> Message-ID: <53E36716.3010106@finkzeit.at> Hi, I think you should also increase InitiatingHeapOccupancyPercent, this marks the level of total heap occupation where G1 starts the concurrent marking phase leading to mixed collects. The default is 45% and since your footprint seems to be above that G1 will probably do lots of "useless" mixed collects all the time. regards Wolfgang Am 07.08.2014 04:17, schrieb Srini Padman: > Ah, I see. I think that does explain the mystery. > > The Oracle documentation only says this parameter "sets the occupancy > threshold for an old region to be included in a mixed garbage > collection cycle. The default occupancy is 65 percent." and I > interpreted that as meaning a lower threshold rather than an upper > threshold. 
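A rough illustration of Wolfgang's InitiatingHeapOccupancyPercent suggestion above, using the numbers already given in this thread; the value 65 is only an example here (it happens to be what Srini settles on later in the thread), nothing about the option requires it:

  -XX:InitiatingHeapOccupancyPercent=65

With a long-run footprint of about 2.2 GB on a 4096 MB heap, the default 45% threshold (the 1932735240-byte figure, roughly 1.8 GB, visible in the logs) is always exceeded, so concurrent marking keeps being requested and the "useless" mixed collections Wolfgang describes follow; raising the threshold above the steady-state footprint gives that machinery some headroom.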
It should have been obvious to me that the "Garbage First" > collector, which aims to collect the "least live data first", will aim > for an upper threshold, yet I settled on exactly the wrong > interpretation. Thanks for setting me right! > > Knowing this, I now think that the default 10% G1HeapWastePercent > value (which computes to 400 MB for us) and the application's > footprint (~ 2.2 GB) *together* put our application at exactly the > default occupancy threshold of 65% (2.6 GB). So we would have been > operating in a configuration where no mixed GCs would ever be seen to > be required. (Please correct me if that calculation is wrong.) > > If we decrease G1HeapWastePercent to 5% and increase > G1MixedGCLiveThresholdPercent to 75%, then the corresponding numbers > would be (2.2 GB footprint + 200 MB waste < 3 GB) so we will be doing > mixed GCs from 2.4 GB until we (roughly) hit 3 GB after which the > chances of a Full GC increase again. So the hope in this case would be > that enough heap is collected to never let the usage go above 3 GB > (assuming, of course, that that means all regions are at equal > occupancy of 75%). Am I understanding this correctly? > > Also, what is the additional (performance) cost incurred because of > setting the G1MixedGCLiveThresholdPercent value to 75%? > > Thanks once again - due to my misunderstanding, I was really not sure > what to do next! > > Regards, > Srini. > > > On Wednesday, August 6, 2014 7:54 PM, Yu Zhang > wrote: > > > Srini, > > About -XX:G1MixedGCLiveThresholdPercent=65(default) > If the region's live data is above 65%, this region will not be > considered as a candidate. > > In your case, you need to increase -XX:G1MixedGCLiveThresholdPercent. > Thanks, > Jenny > On 8/6/2014 3:48 PM, Srini Padman wrote: >> Hello, >> >> I am currently evaluating the use of the G1 Collector for our >> application, to combat the fragmentation issues we ran into while >> using the CMS collector (several cases of failed promotions, followed >> by *really* long pauses). However, I am also having trouble with >> tuning the G1 collector, and am seeing behavior that I can't fully >> understand. I will appreciate any help/insight that you guys can offer. >> >> What I find puzzling from looking at the G1 GC logs from our tests is >> that the concurrent marking phase does not really seem to identify >> many old regions to clean up at all, and the heap usage keeps >> growing. At some point, there is no further room to expand ("heap >> expansion operation failed") and this is followed by a Full GC that >> lasts about 10 seconds. But the Full GC actually brings the memory >> down by almost 50%, from 4095M to 2235M. >> >> If the Full GC can collect this much of the heap, I don't fully >> understand why the concurrent mark phase does not identify these >> (old?) regions for (mixed?) collection subsequently. >> >> On the assumption that we should let the GC ergonomics do its thing >> freely, I initially did not set any parameter other than -Xmx, -Xms, >> and the PermGen sizes. I added the G1HeapRegionSize and >> G1MixedGCLiveThresholdPercent settings (see below) because, when I >> saw the Full GCs with the default settings, I wondered whether we >> might be getting into a situation where all (or most?) regions are >> roughly 65% live so the concurrent marking phase does not identify >> them for collection but a subsequent Full GC is able to.
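Spelling out the arithmetic in Srini's message above, under the assumptions stated in this thread (4096 MB heap, roughly 2.2 GB long-run live footprint):

  default waste allowance : 10% of 4096 MB ~= 410 MB  (mixed GCs stop once reclaimable space is at or below this)
  proposed waste allowance:  5% of 4096 MB ~= 205 MB
  effective footprint     : ~2250 MB live + ~205 MB tolerated waste ~= 2455 MB, about 60% of the heap

This matches the quoted logs, where mixed GCs are abandoned at "reclaimable: 428818280 bytes (9.98 %), threshold: 10.00 %"; with G1HeapWastePercent=5 that cut-off drops to roughly 205 MB, and with G1MixedGCLiveThresholdPercent=75 regions up to 75% live remain eligible as candidates.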
That is, I >> wondered whether our application's heap footprint being 65% of the >> max heap led to these full GCs coincidentally (since >> G1MixedGCLiveThresholdPercent is 65% by default). But I don't know >> why the same thing happens when I set G1MixedGCLiveThresholdPercent >> down to 40% - even if all regions are 40% full, we will only be at >> about 1.6 GB, and that is far below what I think our heap footprint >> is in the long run (2.2 GB). So I don't understand how to ensure that >> old regions are cleaned up regularly so a Full GC is not required. >> >> GC Settings in use: >> >> -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m >> -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled >> -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy >> -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m >> -XX:G1MixedGCLiveThresholdPercent=40 >> >> This is using JRE 1.7.0_55. >> >> I am including a short(ish) GC log snippet for the time leading up to >> the Full GC. I can send the full GC log (about 8 MB, zipped) if >> necessary. >> >> Any help will be greatly appreciated! >> >> Regards, >> Srini. >> >> --------------------------- >> >> 2014-08-06T04:46:00.067-0700: 1124501.033: [GC >> concurrent-root-region-scan-start] >> 2014-08-06T04:46:00.081-0700: 1124501.047: [GC >> concurrent-root-region-scan-end, 0.0139487 secs] >> 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] >> 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, >> 10.4675249 secs] >> 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark >> 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 >> secs], 0.0225253 secs] >> [Times: user=0.01 sys=0.00, real=0.02 secs] >> 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup >> 3922M->3922M(4096M), 0.0098209 secs] >> [Times: user=0.03 sys=0.03, real=0.01 secs] >> 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) >> Desired survivor size 13631488 bytes, new threshold 15 (max 15) >> - age 1: 1531592 bytes, 1531592 total >> - age 2: 1087648 bytes, 2619240 total >> - age 3: 259480 bytes, 2878720 total >> - age 4: 493976 bytes, 3372696 total >> - age 5: 213472 bytes, 3586168 total >> - age 6: 186104 bytes, 3772272 total >> - age 7: 169832 bytes, 3942104 total >> - age 8: 201968 bytes, 4144072 total >> - age 9: 183752 bytes, 4327824 total >> - age 10: 136480 bytes, 4464304 total >> - age 11: 366208 bytes, 4830512 total >> - age 12: 137296 bytes, 4967808 total >> - age 13: 133592 bytes, 5101400 total >> - age 14: 162232 bytes, 5263632 total >> - age 15: 139984 bytes, 5403616 total >> 1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: >> 162.04 ms, target pause time: 200.00 ms] >> 1124550.652: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 95 regions, survivors: 7 regions, predicted young region >> time: 4.46 ms] >> 1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted >> pause time: 42.42 ms, target pause time: 200.00 ms] >> 1124550.701: [G1Ergonomics (Concurrent Cycles) do not request >> concurrent cycle initiation, reason: still doing mixed collections, >> occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: >> 1932735240 bytes (45.00 %), source: end of GC] >> 1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: >> candidate old regions available, candidate old regions: 285 
regions, >> reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] >> , 0.0494015 secs] >> [Parallel Time: 43.7 ms, GC Workers: 4] >> [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, >> Max: 1124550674.3, Diff: 22.6] >> [Ext Root Scanning (ms): Min: 0.1, Avg: 6.7, Max: 22.3, Diff: >> 22.2, Sum: 26.8] >> [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, >> Sum: 44.0] >> [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, >> Sum: 161] >> [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] >> [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, >> Sum: 35.6] >> [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, >> Sum: 0.3] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, >> Sum: 0.2] >> [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: >> 22.6, Sum: 107.1] >> [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: >> 1124550695.5, Diff: 0.0] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 5.6 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 4.5 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: >> 4077.0M(4096.0M)->3887.1M(4096.0M)] >> [Times: user=0.11 sys=0.00, real=0.05 secs] >> 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) >> Desired survivor size 13631488 bytes, new threshold 15 (max 15) >> - age 1: 1323232 bytes, 1323232 total >> - age 2: 716576 bytes, 2039808 total >> - age 3: 1058584 bytes, 3098392 total >> - age 4: 225208 bytes, 3323600 total >> - age 5: 447688 bytes, 3771288 total >> - age 6: 195112 bytes, 3966400 total >> - age 7: 178000 bytes, 4144400 total >> - age 8: 156904 bytes, 4301304 total >> - age 9: 193424 bytes, 4494728 total >> - age 10: 176272 bytes, 4671000 total >> - age 11: 134768 bytes, 4805768 total >> - age 12: 138896 bytes, 4944664 total >> - age 13: 132272 bytes, 5076936 total >> - age 14: 132856 bytes, 5209792 total >> - age 15: 161912 bytes, 5371704 total >> 1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: >> 161.39 ms, target pause time: 200.00 ms] >> 1124606.686: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 95 regions, survivors: 7 regions, predicted young region >> time: 4.53 ms] >> 1124606.686: [G1Ergonomics (CSet Construction) finish adding old >> regions to CSet, reason: reclaimable percentage not over threshold, >> old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 >> %), threshold: 10.00 %] >> 1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted >> pause time: 45.72 ms, target pause time: 200.00 ms] >> 1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, >> reason: region allocation request failed, allocation request: 1048576 >> bytes] >> 1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 1048576 bytes, attempted expansion amount: 2097152 >> bytes] >> 1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, >> reason: heap expansion operation failed] >> 1124606.743: [G1Ergonomics (Concurrent Cycles) do not request >> concurrent cycle initiation, reason: still doing mixed collections, >> occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: >> 1932735240 bytes (45.00 %), source: end of GC] >> 1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, >> reason: 
reclaimable percentage not over threshold, candidate old >> regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), >> threshold: 10.00 %] >> (to-space exhausted), 0.0568178 secs] >> [Parallel Time: 40.4 ms, GC Workers: 4] >> [GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, >> Max: 1124606723.8, Diff: 37.7] >> [Ext Root Scanning (ms): Min: 0.1, Avg: 6.3, Max: 16.1, Diff: >> 16.1, Sum: 25.4] >> [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, >> Sum: 38.6] >> [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, >> Sum: 150] >> [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] >> [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, >> Sum: 33.7] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.0] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: >> 37.7, Sum: 98.9] >> [GC Worker End (ms): Min: 1124606726.5, Avg: 1124606726.5, Max: >> 1124606726.5, Diff: 0.0] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 16.3 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 7.7 ms] >> [Ref Enq: 0.2 ms] >> [Free CSet: 0.3 ms] >> [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: >> 4077.1M(4096.0M)->3921.6M(4096.0M)] >> [Times: user=0.11 sys=0.00, real=0.06 secs] >> 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) >> Desired survivor size 13631488 bytes, new threshold 15 (max 15) >> - age 1: 1130192 bytes, 1130192 total >> - age 2: 492816 bytes, 1623008 total >> - age 3: 675240 bytes, 2298248 total >> - age 4: 1038536 bytes, 3336784 total >> - age 5: 208048 bytes, 3544832 total >> - age 6: 436520 bytes, 3981352 total >> - age 7: 184528 bytes, 4165880 total >> - age 8: 165376 bytes, 4331256 total >> - age 9: 154872 bytes, 4486128 total >> - age 10: 179016 bytes, 4665144 total >> - age 11: 167760 bytes, 4832904 total >> - age 12: 132056 bytes, 4964960 total >> - age 13: 138736 bytes, 5103696 total >> - age 14: 132080 bytes, 5235776 total >> - age 15: 132856 bytes, 5368632 total >> 1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: >> 148.06 ms, target pause time: 200.00 ms] >> 1124739.058: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 87 regions, survivors: 8 regions, predicted young region >> time: 4.37 ms] >> 1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted >> pause time: 56.32 ms, target pause time: 200.00 ms] >> 1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, >> reason: region allocation request failed, allocation request: 1048576 >> bytes] >> 1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 1048576 bytes, attempted expansion amount: 2097152 >> bytes] >> 1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, >> reason: heap expansion operation failed] >> 1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent >> cycle initiation, reason: occupancy higher than threshold, occupancy: >> 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 >> bytes (45.00 %), source: end of GC] >> (to-space exhausted), 0.1936102 secs] >> [Parallel Time: 146.6 ms, GC Workers: 4] >> [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, >> Max: 1124739063.0, Diff: 4.4] >> [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 
14.3, Diff: >> 14.0, Sum: 28.2] >> [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, >> Sum: 42.7] >> [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, >> Sum: 225] >> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] >> [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: >> 24.5, Sum: 502.5] >> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.2] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, >> Diff: 4.4, Sum: 573.8] >> [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: >> 1124739205.1, Diff: 0.0] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 46.9 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 1.0 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.2 ms] >> [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: >> 4095.6M(4096.0M)->4095.6M(4096.0M)] >> [Times: user=0.36 sys=0.00, real=0.19 secs] >> 1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent >> cycle, reason: concurrent cycle initiation requested] >> 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) >> (initial-mark) >> Desired survivor size 13631488 bytes, new threshold 15 (max 15) >> 1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 322560, predicted base time: 205.33 ms, remaining >> time: 0.00 ms, target pause time: 200.00 ms] >> 1124739.259: [G1Ergonomics (CSet Construction) add young regions to >> CSet, eden: 0 regions, survivors: 0 regions, predicted young region >> time: 0.00 ms] >> 1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted >> pause time: 205.33 ms, target pause time: 200.00 ms] >> , 0.0347198 secs] >> [Parallel Time: 33.1 ms, GC Workers: 4] >> [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, >> Max: 1124739259.3, Diff: 0.0] >> [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: >> 5.4, Sum: 30.6] >> [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, >> Sum: 79.4] >> [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: >> 57, Sum: 1261] >> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] >> [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, >> Sum: 21.7] >> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, >> Sum: 0.4] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, >> Sum: 0.1] >> [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: >> 0.1, Sum: 132.1] >> [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: >> 1124739292.3, Diff: 0.0] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 1.5 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 1.0 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.0 ms] >> [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: >> 4095.6M(4096.0M)->4095.6M(4096.0M)] >> [Times: user=0.12 sys=0.00, real=0.04 secs] >> 2014-08-06T04:49:57.933-0700: 1124739.294: [GC >> concurrent-root-region-scan-start] >> 2014-08-06T04:49:57.933-0700: 1124739.294: [GC >> concurrent-root-region-scan-end, 0.0000157 secs] >> 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] >> 1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, >> reason: allocation request failed, allocation request: 80 bytes] >> 1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 2097152 bytes, attempted expansion amount: 2097152 >> 
bytes] >> 1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, >> reason: heap expansion operation failed] >> 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC >> 4095M->2235M(4096M), 10.5341003 secs] >> [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: >> 4095.6M(4096.0M)->2235.4M(4096.0M)] >> [Times: user=13.20 sys=0.03, real=10.52 secs] >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From wolfgang.pedot at finkzeit.at Thu Aug 7 13:01:55 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Thu, 07 Aug 2014 15:01:55 +0200 Subject: Seeking help regarding Full GCs with G1 In-Reply-To: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> References: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> Message-ID: <53E378C3.6040503@finkzeit.at> Hi again, it might also help to look at how the regions are occupied. G1PrintRegionLivenessInfo will print the regions during the marking-phase so you can see how many are OLD or possibly HUMS and how they are occupied. This information has helped me quite a bit while tweaking G1 and our application for optimal performance. regards Wolfgang Am 07.08.2014 00:48, schrieb Srini Padman: > Hello, > > I am currently evaluating the use of the G1 Collector for our > application, to combat the fragmentation issues we ran into while > using the CMS collector (several cases of failed promotions, followed > by *really* long pauses). However, I am also having trouble with > tuning the G1 collector, and am seeing behavior that I can't fully > understand. I will appreciate any help/insight that you guys can offer. > > What I find puzzling from looking at the G1 GC logs from our tests is > that the concurrent marking phase does not really seem to identify > many old regions to clean up at all, and the heap usage keeps growing. > At some point, there is no further room to expand ("heap expansion > operation failed") and this is followed by a Full GC that lasts about > 10 seconds. But the Full GC actually brings the memory down by almost > 50%, from 4095M to 2235M. > > If the Full GC can collect this much of the heap, I don't fully > understand why the concurrent mark phase does not identify these > (old?) regions for (mixed?) collection subsequently. > > On the assumption that we should let the GC ergonomics do its thing > freely, I initially did not set any parameter other than -Xmx, -Xms, > and the PermGen sizes. I added the G1HeapRegionSize and > G1MixedGCLiveThresholdPercent settings (see below) because, when I saw > the Full GCs with the default settings, I wondered whether we might be > getting into a situation where all (or most?) regions are roughly 65% > live so the concurrent marking phase does not identify them for > collection but a subsequent Full GC is able to. That is, I wondered > whether our application's heap footprint being 65% of the max heap led > to these full GCs coincidentally (since G1MixedGCLiveThresholdPercent > is 65% by default).
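For reference, the region liveness output Wolfgang suggests above is a diagnostic option, so it has to be enabled together with the unlock flag; this is the same pair Srini adds to his next test run further down the thread:

  -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo

As part of each marking cycle it prints a per-region table showing the region type (eden, survivor, old, humongous) and how much of each region is live, which makes it easy to see whether old regions are sitting just above the G1MixedGCLiveThresholdPercent cut-off and therefore never become mixed-GC candidates.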
But I don't know why the same thing happens when I > set G1MixedGCLiveThresholdPercent down to 40% - even if all regions > are 40% full, we will only be at about 1.6 GB, and that is far below > what I think our heap footprint is in the long run (2.2 GB). So I > don't understand how to ensure that old regions are cleaned up > regularly so a Full GC is not required. > > GC Settings in use: > > -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m > -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled > -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m > -XX:G1MixedGCLiveThresholdPercent=40 > > This is using JRE 1.7.0_55. > > I am including a short(ish) GC log snippet for the time leading up to > the Full GC. I can send the full GC log (about 8 MB, zipped) if necessary. > > Any help will be greatly appreciated! > > Regards, > Srini. > > --------------------------- > > 2014-08-06T04:46:00.067-0700: 1124501.033: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC > concurrent-root-region-scan-end, 0.0139487 secs] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] > 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, > 10.4675249 secs] > 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark > 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 > secs], 0.0225253 secs] > [Times: user=0.01 sys=0.00, real=0.02 secs] > 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup > 3922M->3922M(4096M), 0.0098209 secs] > [Times: user=0.03 sys=0.03, real=0.01 secs] > 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age 1: 1531592 bytes, 1531592 total > - age 2: 1087648 bytes, 2619240 total > - age 3: 259480 bytes, 2878720 total > - age 4: 493976 bytes, 3372696 total > - age 5: 213472 bytes, 3586168 total > - age 6: 186104 bytes, 3772272 total > - age 7: 169832 bytes, 3942104 total > - age 8: 201968 bytes, 4144072 total > - age 9: 183752 bytes, 4327824 total > - age 10: 136480 bytes, 4464304 total > - age 11: 366208 bytes, 4830512 total > - age 12: 137296 bytes, 4967808 total > - age 13: 133592 bytes, 5101400 total > - age 14: 162232 bytes, 5263632 total > - age 15: 139984 bytes, 5403616 total > 1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: > 162.04 ms, target pause time: 200.00 ms] > 1124550.652: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.46 ms] > 1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted > pause time: 42.42 ms, target pause time: 200.00 ms] > 1124550.701: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] > 1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: > candidate old regions available, candidate old regions: 285 regions, > reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] > , 0.0494015 secs] > [Parallel Time: 43.7 ms, GC Workers: 4] > [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, > Max: 1124550674.3, Diff: 22.6] > [Ext Root Scanning (ms): Min: 0.1, Avg: 
6.7, Max: 22.3, Diff: > 22.2, Sum: 26.8] > [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, Sum: > 44.0] > [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, > Sum: 161] > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] > [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, Sum: > 35.6] > [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: > 0.3] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, > Sum: 0.2] > [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: > 22.6, Sum: 107.1] > [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: > 1124550695.5, Diff: 0.0] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 5.6 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 4.5 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.3 ms] > [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: > 4077.0M(4096.0M)->3887.1M(4096.0M)] > [Times: user=0.11 sys=0.00, real=0.05 secs] > 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age 1: 1323232 bytes, 1323232 total > - age 2: 716576 bytes, 2039808 total > - age 3: 1058584 bytes, 3098392 total > - age 4: 225208 bytes, 3323600 total > - age 5: 447688 bytes, 3771288 total > - age 6: 195112 bytes, 3966400 total > - age 7: 178000 bytes, 4144400 total > - age 8: 156904 bytes, 4301304 total > - age 9: 193424 bytes, 4494728 total > - age 10: 176272 bytes, 4671000 total > - age 11: 134768 bytes, 4805768 total > - age 12: 138896 bytes, 4944664 total > - age 13: 132272 bytes, 5076936 total > - age 14: 132856 bytes, 5209792 total > - age 15: 161912 bytes, 5371704 total > 1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: > 161.39 ms, target pause time: 200.00 ms] > 1124606.686: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.53 ms] > 1124606.686: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: reclaimable percentage not over threshold, > old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 > %), threshold: 10.00 %] > 1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted > pause time: 45.72 ms, target pause time: 200.00 ms] > 1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] > 1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] > 1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 1124606.743: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] > 1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, > reason: reclaimable percentage not over threshold, candidate old > regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), > threshold: 10.00 %] > (to-space exhausted), 0.0568178 secs] > [Parallel Time: 40.4 ms, GC Workers: 4] > [GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, > Max: 1124606723.8, Diff: 37.7] > [Ext Root Scanning (ms): Min: 
0.1, Avg: 6.3, Max: 16.1, Diff: > 16.1, Sum: 25.4] > [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, Sum: > 38.6] > [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, > Sum: 150] > [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] > [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, > Sum: 33.7] > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > 0.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: > 37.7, Sum: 98.9] > [GC Worker End (ms): Min: 1124606726.5, Avg: 1124606726.5, Max: > 1124606726.5, Diff: 0.0] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 16.3 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 7.7 ms] > [Ref Enq: 0.2 ms] > [Free CSet: 0.3 ms] > [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: > 4077.1M(4096.0M)->3921.6M(4096.0M)] > [Times: user=0.11 sys=0.00, real=0.06 secs] > 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age 1: 1130192 bytes, 1130192 total > - age 2: 492816 bytes, 1623008 total > - age 3: 675240 bytes, 2298248 total > - age 4: 1038536 bytes, 3336784 total > - age 5: 208048 bytes, 3544832 total > - age 6: 436520 bytes, 3981352 total > - age 7: 184528 bytes, 4165880 total > - age 8: 165376 bytes, 4331256 total > - age 9: 154872 bytes, 4486128 total > - age 10: 179016 bytes, 4665144 total > - age 11: 167760 bytes, 4832904 total > - age 12: 132056 bytes, 4964960 total > - age 13: 138736 bytes, 5103696 total > - age 14: 132080 bytes, 5235776 total > - age 15: 132856 bytes, 5368632 total > 1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: > 148.06 ms, target pause time: 200.00 ms] > 1124739.058: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 87 regions, survivors: 8 regions, predicted young region > time: 4.37 ms] > 1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted > pause time: 56.32 ms, target pause time: 200.00 ms] > 1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] > 1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] > 1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, occupancy: > 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 > bytes (45.00 %), source: end of GC] > (to-space exhausted), 0.1936102 secs] > [Parallel Time: 146.6 ms, GC Workers: 4] > [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, > Max: 1124739063.0, Diff: 4.4] > [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: > 14.0, Sum: 28.2] > [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, > Sum: 42.7] > [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, > Sum: 225] > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] > [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: > 24.5, Sum: 502.5] > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.2] > [GC Worker Other (ms): 
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: > 4.4, Sum: 573.8] > [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: > 1124739205.1, Diff: 0.0] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 46.9 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 1.0 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.2 ms] > [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] > [Times: user=0.36 sys=0.00, real=0.19 secs] > 1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) > (initial-mark) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > 1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 322560, predicted base time: 205.33 ms, remaining > time: 0.00 ms, target pause time: 200.00 ms] > 1124739.259: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 0 regions, survivors: 0 regions, predicted young region > time: 0.00 ms] > 1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause > time: 205.33 ms, target pause time: 200.00 ms] > , 0.0347198 secs] > [Parallel Time: 33.1 ms, GC Workers: 4] > [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, > Max: 1124739259.3, Diff: 0.0] > [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: > 5.4, Sum: 30.6] > [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, > Sum: 79.4] > [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, > Sum: 1261] > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: > 21.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 0.4] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] > [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: > 0.1, Sum: 132.1] > [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: > 1124739292.3, Diff: 0.0] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 1.5 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 1.0 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.0 ms] > [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] > [Times: user=0.12 sys=0.00, real=0.04 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-end, 0.0000157 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] > 1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 80 bytes] > 1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 2097152 bytes, attempted expansion amount: 2097152 > bytes] > 1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC > 4095M->2235M(4096M), 10.5341003 secs] > [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->2235.4M(4096.0M)] > [Times: user=13.20 sys=0.03, real=10.52 secs] > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > 
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From srini_was at yahoo.com Thu Aug 7 14:54:15 2014 From: srini_was at yahoo.com (Srini Padman) Date: Thu, 7 Aug 2014 07:54:15 -0700 Subject: Seeking help regarding Full GCs with G1 In-Reply-To: <53E378C3.6040503@finkzeit.at> References: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> <53E378C3.6040503@finkzeit.at> Message-ID: <1407423255.31358.YahooMailNeo@web140701.mail.bf1.yahoo.com> Thanks for both the suggestions, Wolfgang. We are going with the following parameters for the next test run: -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1HeapWastePercent=5 -XX:InitiatingHeapOccupancyPercent=65 -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo The expectations being: 1\ with a total heap size of 4 GB, an application memory footprint of 2.2 GB, and an acceptable heap waste of 5%, the "effective" footprint is 2.2 + 0.05 * 4 GB = 2.4 GB, which is slightly smaller than 60% of the heap 2\ setting the initiating occupancy percent to 65% gives us a little bit of operating room over the effective heap footprint of 60% 3\ even if the heap is "perfectly" fragmented, that is, even if this means *all* regions are 60% occupied, all of them will still be eligible for mixed GCs since the threshold is now 75%. Regards, Srini. On Thursday, August 7, 2014 9:02 AM, Wolfgang Pedot wrote: Hi again, it might also help to to look at how the regions are occupied. G1PrintRegionLivenessInfo will print the regions during the marking-phase so you can see how many are OLD or possibly HUMS and how they are occupied. This information has helped me quite a bit while tweaking G1 and our application for optimal performance. regards Wolfgang Am 07.08.2014 00:48, schrieb Srini Padman: > Hello, > > I am currently evaluating the use of the G1 Collector for our > application, to combat the fragmentation issues we ran into while > using the CMS collector (several cases of failed promotions, followed > by *really* long pauses). However, I am also having trouble with > tuning the G1 collector, and am seeing behavior that I can't fully > understand. I will appreciate any help/insight that you guys can offer. > > What I find puzzling from looking at the G1 GC logs from our tests is > that the concurrent marking phase does not really seem to identify > many old regions to clean up at all, and the heap usage keeps growing. > At some point, there is no further room to expand ("heap expansion > operation failed") and this is followed by a Full GC that lasts about > 10 seconds. But the Full GC actually brings the memory down by almost > 50%, from 4095M to 2235M. > > If the Full GC can collect this much of the heap, I don't fully > understand why the concurrent mark phase does not identify these > (old?) regions for (mixed?) collection subsequently. > > On the assumption that we should let the GC ergonomics do its thing > freely, I initially did not set any parameter other than -Xmx, -Xms, > and the PermGen sizes. I added the G1HeapRegionSize and > G1MixedGCLiveThresholdPercent settings (see below) because, when I saw > the Full GCs with the default settings, I wondered whether we might be > getting into a situation where all (or most?) 
regions are roughly 65% > live so the concurrent marking phase does not identify them for > collection but a subsequent Full GC is able to. That is, I wondered > whether our application's heap footprint being 65% of the max heap led > to these full GCs coincidentally (since G1MixedGCLiveThresholdPercent > is 65% by default). But I don't know why the same thing happens when I > set G1MixedGCLiveThresholdPercent down to 40% - even if all regions > are 40% full, we will only be at about 1.6 GB, and that is far below > what I think our heap footprint is in the long run (2.2 GB). So I > don't understand how to ensure that old regions are cleaned up > regularly so a Full GC is not required. > > GC Settings in use: > > -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m > -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled > -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m > -XX:G1MixedGCLiveThresholdPercent=40 > > This is using JRE 1.7.0_55. > > I am including a short(ish) GC log snippet for the time leading up to > the Full GC. I can send the full GC log (about 8 MB, zipped) if necessary. > > Any help will be greatly appreciated! > > Regards, > Srini. > > --------------------------- > > 2014-08-06T04:46:00.067-0700: 1124501.033: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC > concurrent-root-region-scan-end, 0.0139487 secs] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] > 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, > 10.4675249 secs] > 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark > 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 > secs], 0.0225253 secs] >? [Times: user=0.01 sys=0.00, real=0.02 secs] > 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup > 3922M->3922M(4096M), 0.0098209 secs] >? [Times: user=0.03 sys=0.03, real=0.01 secs] > 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1531592 bytes,? ? 1531592 total > - age? 2:? ? 1087648 bytes,? ? 2619240 total > - age? 3:? ? 259480 bytes,? ? 2878720 total > - age? 4:? ? 493976 bytes,? ? 3372696 total > - age? 5:? ? 213472 bytes,? ? 3586168 total > - age? 6:? ? 186104 bytes,? ? 3772272 total > - age? 7:? ? 169832 bytes,? ? 3942104 total > - age? 8:? ? 201968 bytes,? ? 4144072 total > - age? 9:? ? 183752 bytes,? ? 4327824 total > - age? 10:? ? 136480 bytes,? ? 4464304 total > - age? 11:? ? 366208 bytes,? ? 4830512 total > - age? 12:? ? 137296 bytes,? ? 4967808 total > - age? 13:? ? 133592 bytes,? ? 5101400 total > - age? 14:? ? 162232 bytes,? ? 5263632 total > - age? 15:? ? 139984 bytes,? ? 5403616 total >? 1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: > 162.04 ms, target pause time: 200.00 ms] >? 1124550.652: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.46 ms] >? 1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted > pause time: 42.42 ms, target pause time: 200.00 ms] >? 
1124550.701: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] >? 1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: > candidate old regions available, candidate old regions: 285 regions, > reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] > , 0.0494015 secs] >? ? [Parallel Time: 43.7 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, > Max: 1124550674.3, Diff: 22.6] >? ? ? [Ext Root Scanning (ms): Min: 0.1, Avg: 6.7, Max: 22.3, Diff: > 22.2, Sum: 26.8] >? ? ? [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, Sum: > 44.0] >? ? ? ? ? [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, > Sum: 161] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] >? ? ? [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, Sum: > 35.6] >? ? ? [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: > 0.3] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, > Sum: 0.2] >? ? ? [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: > 22.6, Sum: 107.1] >? ? ? [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: > 1124550695.5, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 5.6 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 4.5 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.3 ms] >? ? [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: > 4077.0M(4096.0M)->3887.1M(4096.0M)] >? [Times: user=0.11 sys=0.00, real=0.05 secs] > 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1323232 bytes,? ? 1323232 total > - age? 2:? ? 716576 bytes,? ? 2039808 total > - age? 3:? ? 1058584 bytes,? ? 3098392 total > - age? 4:? ? 225208 bytes,? ? 3323600 total > - age? 5:? ? 447688 bytes,? ? 3771288 total > - age? 6:? ? 195112 bytes,? ? 3966400 total > - age? 7:? ? 178000 bytes,? ? 4144400 total > - age? 8:? ? 156904 bytes,? ? 4301304 total > - age? 9:? ? 193424 bytes,? ? 4494728 total > - age? 10:? ? 176272 bytes,? ? 4671000 total > - age? 11:? ? 134768 bytes,? ? 4805768 total > - age? 12:? ? 138896 bytes,? ? 4944664 total > - age? 13:? ? 132272 bytes,? ? 5076936 total > - age? 14:? ? 132856 bytes,? ? 5209792 total > - age? 15:? ? 161912 bytes,? ? 5371704 total >? 1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: > 161.39 ms, target pause time: 200.00 ms] >? 1124606.686: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.53 ms] >? 1124606.686: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: reclaimable percentage not over threshold, > old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 > %), threshold: 10.00 %] >? 1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted > pause time: 45.72 ms, target pause time: 200.00 ms] >? 1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] >? 
1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] >? 1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] >? 1124606.743: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] >? 1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, > reason: reclaimable percentage not over threshold, candidate old > regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), > threshold: 10.00 %] >? (to-space exhausted), 0.0568178 secs] >? ? [Parallel Time: 40.4 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, > Max: 1124606723.8, Diff: 37.7] >? ? ? [Ext Root Scanning (ms): Min: 0.1, Avg: 6.3, Max: 16.1, Diff: > 16.1, Sum: 25.4] >? ? ? [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, Sum: > 38.6] >? ? ? ? ? [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, > Sum: 150] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] >? ? ? [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, > Sum: 33.7] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > 0.0] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: > 37.7, Sum: 98.9] >? ? ? [GC Worker End (ms): Min: 1124606726.5, Avg: 1124606726.5, Max: > 1124606726.5, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 16.3 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 7.7 ms] >? ? ? [Ref Enq: 0.2 ms] >? ? ? [Free CSet: 0.3 ms] >? ? [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: > 4077.1M(4096.0M)->3921.6M(4096.0M)] >? [Times: user=0.11 sys=0.00, real=0.06 secs] > 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1130192 bytes,? ? 1130192 total > - age? 2:? ? 492816 bytes,? ? 1623008 total > - age? 3:? ? 675240 bytes,? ? 2298248 total > - age? 4:? ? 1038536 bytes,? ? 3336784 total > - age? 5:? ? 208048 bytes,? ? 3544832 total > - age? 6:? ? 436520 bytes,? ? 3981352 total > - age? 7:? ? 184528 bytes,? ? 4165880 total > - age? 8:? ? 165376 bytes,? ? 4331256 total > - age? 9:? ? 154872 bytes,? ? 4486128 total > - age? 10:? ? 179016 bytes,? ? 4665144 total > - age? 11:? ? 167760 bytes,? ? 4832904 total > - age? 12:? ? 132056 bytes,? ? 4964960 total > - age? 13:? ? 138736 bytes,? ? 5103696 total > - age? 14:? ? 132080 bytes,? ? 5235776 total > - age? 15:? ? 132856 bytes,? ? 5368632 total >? 1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: > 148.06 ms, target pause time: 200.00 ms] >? 1124739.058: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 87 regions, survivors: 8 regions, predicted young region > time: 4.37 ms] >? 1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted > pause time: 56.32 ms, target pause time: 200.00 ms] >? 1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] >? 
1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] >? 1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] >? 1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, occupancy: > 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 > bytes (45.00 %), source: end of GC] >? (to-space exhausted), 0.1936102 secs] >? ? [Parallel Time: 146.6 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, > Max: 1124739063.0, Diff: 4.4] >? ? ? [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: > 14.0, Sum: 28.2] >? ? ? [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, > Sum: 42.7] >? ? ? ? ? [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, > Sum: 225] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] >? ? ? [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: > 24.5, Sum: 502.5] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.2] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: > 4.4, Sum: 573.8] >? ? ? [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: > 1124739205.1, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 46.9 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 1.0 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.2 ms] >? ? [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] >? [Times: user=0.36 sys=0.00, real=0.19 secs] >? 1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) > (initial-mark) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) >? 1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 322560, predicted base time: 205.33 ms, remaining > time: 0.00 ms, target pause time: 200.00 ms] >? 1124739.259: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 0 regions, survivors: 0 regions, predicted young region > time: 0.00 ms] >? 1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause > time: 205.33 ms, target pause time: 200.00 ms] > , 0.0347198 secs] >? ? [Parallel Time: 33.1 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, > Max: 1124739259.3, Diff: 0.0] >? ? ? [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: > 5.4, Sum: 30.6] >? ? ? [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, > Sum: 79.4] >? ? ? ? ? [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, > Sum: 1261] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] >? ? ? [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: > 21.7] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 0.4] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: > 0.1, Sum: 132.1] >? ? ? [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: > 1124739292.3, Diff: 0.0] >? ? 
[Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 1.5 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 1.0 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.0 ms] >? ? [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] >? [Times: user=0.12 sys=0.00, real=0.04 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-end, 0.0000157 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] >? 1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 80 bytes] >? 1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 2097152 bytes, attempted expansion amount: 2097152 > bytes] >? 1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC > 4095M->2235M(4096M), 10.5341003 secs] >? ? [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->2235.4M(4096.0M)] >? [Times: user=13.20 sys=0.03, real=10.52 secs] > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Thu Aug 7 23:30:45 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 07 Aug 2014 16:30:45 -0700 Subject: Seeking help regarding Full GCs with G1 In-Reply-To: <1407423255.31358.YahooMailNeo@web140701.mail.bf1.yahoo.com> References: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> <53E378C3.6040503@finkzeit.at> <1407423255.31358.YahooMailNeo@web140701.mail.bf1.yahoo.com> Message-ID: <53E40C25.3000205@oracle.com> I just added an entry to https://blogs.oracle.com/g1gc/ g1gc logs - Ergonomics -how to print and how to understand Hope this answers your question. Thanks, Jenny On 8/7/2014 7:54 AM, Srini Padman wrote: > Thanks for both the suggestions, Wolfgang. We are going with the > following parameters for the next test run: > > -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m > -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled > -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m > -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1HeapWastePercent=5 > -XX:InitiatingHeapOccupancyPercent=65 -XX:+UnlockDiagnosticVMOptions > -XX:+G1PrintRegionLivenessInfo > > The expectations being: > > 1\ with a total heap size of 4 GB, an application memory footprint of > 2.2 GB, and an acceptable heap waste of 5%, the "effective" footprint > is 2.2 + 0.05 * 4 GB = 2.4 GB, which is slightly smaller than 60% of > the heap > 2\ setting the initiating occupancy percent to 65% gives us a little > bit of operating room over the effective heap footprint of 60% > 3\ even if the heap is "perfectly" fragmented, that is, even if this > means *all* regions are 60% occupied, all of them will still be > eligible for mixed GCs since the threshold is now 75%. > > Regards, > Srini. > > > On Thursday, August 7, 2014 9:02 AM, Wolfgang Pedot > wrote: > > > Hi again, > > it might also help to to look at how the regions are occupied. 
> G1PrintRegionLivenessInfo will print the regions during the > marking-phase so you can see how many are OLD or possibly HUMS and how > they are occupied. > This information has helped me quite a bit while tweaking G1 and our > application for optimal performance. > > regards > Wolfgang > > Am 07.08.2014 00:48, schrieb Srini Padman: > > Hello, > > > > I am currently evaluating the use of the G1 Collector for our > > application, to combat the fragmentation issues we ran into while > > using the CMS collector (several cases of failed promotions, followed > > by *really* long pauses). However, I am also having trouble with > > tuning the G1 collector, and am seeing behavior that I can't fully > > understand. I will appreciate any help/insight that you guys can offer. > > > > What I find puzzling from looking at the G1 GC logs from our tests is > > that the concurrent marking phase does not really seem to identify > > many old regions to clean up at all, and the heap usage keeps growing. > > At some point, there is no further room to expand ("heap expansion > > operation failed") and this is followed by a Full GC that lasts about > > 10 seconds. But the Full GC actually brings the memory down by almost > > 50%, from 4095M to 2235M. > > > > If the Full GC can collect this much of the heap, I don't fully > > understand why the concurrent mark phase does not identify these > > (old?) regions for (mixed?) collection subsequently. > > > > On the assumption that we should let the GC ergonomics do its thing > > freely, I initially did not set any parameter other than -Xmx, -Xms, > > and the PermGen sizes. I added the G1HeapRegionSize and > > G1MixedGCLiveThresholdPercent settings (see below) because, when I saw > > the Full GCs with the default settings, I wondered whether we might be > > getting into a situation where all (or most?) regions are roughly 65% > > live so the concurrent marking phase does not identify them for > > collection but a subsequent Full GC is able to. That is, I wondered > > whether our application's heap footprint being 65% of the max heap led > > to these full GCs coincidentally (since G1MixedGCLiveThresholdPercent > > is 65% by default). But I don't know why the same thing happens when I > > set G1MixedGCLiveThresholdPercent down to 40% - even if all regions > > are 40% full, we will only be at about 1.6 GB, and that is far below > > what I think our heap footprint is in the long run (2.2 GB). So I > > don't understand how to ensure that old regions are cleaned up > > regularly so a Full GC is not required. > > > > GC Settings in use: > > > > -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m > > -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled > > -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy > > -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m > > -XX:G1MixedGCLiveThresholdPercent=40 > > > > This is using JRE 1.7.0_55. > > > > I am including a short(ish) GC log snippet for the time leading up to > > the Full GC. I can send the full GC log (about 8 MB, zipped) if > necessary. > > > > Any help will be greatly appreciated! > > > > Regards, > > Srini. 
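As a side note, the arithmetic behind the expectations listed further up (4 GB heap, 2.2 GB long-run footprint, 5% acceptable heap waste, marking started at 65% occupancy) can be checked with a few lines of Java. This is only a back-of-the-envelope sketch; the class name and the notion of an "effective footprint" are illustrative and not anything G1 computes itself:

public class G1FootprintEstimate {
    public static void main(String[] args) {
        double heapGb = 4.0;            // -Xms4096m -Xmx4096m
        double liveFootprintGb = 2.2;   // observed long-run application footprint
        double heapWastePercent = 5.0;  // -XX:G1HeapWastePercent=5
        double ihopPercent = 65.0;      // -XX:InitiatingHeapOccupancyPercent=65

        // "Effective" footprint = live data plus the heap we are willing to leave unreclaimed.
        double effectiveGb = liveFootprintGb + heapWastePercent / 100.0 * heapGb;
        System.out.printf("effective footprint: %.1f GB (%.0f%% of heap)%n",
                          effectiveGb, effectiveGb / heapGb * 100.0);
        System.out.printf("marking should start above: %.1f GB (%.0f%% of heap)%n",
                          heapGb * ihopPercent / 100.0, ihopPercent);
    }
}

It prints an effective footprint of 2.4 GB (60% of the heap) against a marking threshold of 2.6 GB (65% of the heap), which is the margin that expectation 2\ relies on.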
> > > > --------------------------- > > > > 2014-08-06T04:46:00.067-0700: 1124501.033: [GC > > concurrent-root-region-scan-start] > > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC > > concurrent-root-region-scan-end, 0.0139487 secs] > > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] > > 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, > > 10.4675249 secs] > > 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark > > 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 > > secs], 0.0225253 secs] > > [Times: user=0.01 sys=0.00, real=0.02 secs] > > 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup > > 3922M->3922M(4096M), 0.0098209 secs] > > [Times: user=0.03 sys=0.03, real=0.01 secs] > > 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) > > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > > - age 1: 1531592 bytes, 1531592 total > > - age 2: 1087648 bytes, 2619240 total > > - age 3: 259480 bytes, 2878720 total > > - age 4: 493976 bytes, 3372696 total > > - age 5: 213472 bytes, 3586168 total > > - age 6: 186104 bytes, 3772272 total > > - age 7: 169832 bytes, 3942104 total > > - age 8: 201968 bytes, 4144072 total > > - age 9: 183752 bytes, 4327824 total > > - age 10: 136480 bytes, 4464304 total > > - age 11: 366208 bytes, 4830512 total > > - age 12: 137296 bytes, 4967808 total > > - age 13: 133592 bytes, 5101400 total > > - age 14: 162232 bytes, 5263632 total > > - age 15: 139984 bytes, 5403616 total > > 1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, > > _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: > > 162.04 ms, target pause time: 200.00 ms] > > 1124550.652: [G1Ergonomics (CSet Construction) add young regions to > > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > > time: 4.46 ms] > > 1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, > > eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted > > pause time: 42.42 ms, target pause time: 200.00 ms] > > 1124550.701: [G1Ergonomics (Concurrent Cycles) do not request > > concurrent cycle initiation, reason: still doing mixed collections, > > occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: > > 1932735240 bytes (45.00 %), source: end of GC] > > 1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: > > candidate old regions available, candidate old regions: 285 regions, > > reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] > > , 0.0494015 secs] > > [Parallel Time: 43.7 ms, GC Workers: 4] > > [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, > > Max: 1124550674.3, Diff: 22.6] > > [Ext Root Scanning (ms): Min: 0.1, Avg: 6.7, Max: 22.3, Diff: > > 22.2, Sum: 26.8] > > [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, Sum: > > 44.0] > > [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, > > Sum: 161] > > [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] > > [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, Sum: > > 35.6] > > [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: > > 0.3] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, > > Sum: 0.2] > > [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: > > 22.6, Sum: 107.1] > > [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: > > 1124550695.5, Diff: 0.0] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.1 ms] > > [Other: 5.6 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 4.5 ms] 
> > [Ref Enq: 0.1 ms] > > [Free CSet: 0.3 ms] > > [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: > > 4077.0M(4096.0M)->3887.1M(4096.0M)] > > [Times: user=0.11 sys=0.00, real=0.05 secs] > > 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) > > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > > - age 1: 1323232 bytes, 1323232 total > > - age 2: 716576 bytes, 2039808 total > > - age 3: 1058584 bytes, 3098392 total > > - age 4: 225208 bytes, 3323600 total > > - age 5: 447688 bytes, 3771288 total > > - age 6: 195112 bytes, 3966400 total > > - age 7: 178000 bytes, 4144400 total > > - age 8: 156904 bytes, 4301304 total > > - age 9: 193424 bytes, 4494728 total > > - age 10: 176272 bytes, 4671000 total > > - age 11: 134768 bytes, 4805768 total > > - age 12: 138896 bytes, 4944664 total > > - age 13: 132272 bytes, 5076936 total > > - age 14: 132856 bytes, 5209792 total > > - age 15: 161912 bytes, 5371704 total > > 1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, > > _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: > > 161.39 ms, target pause time: 200.00 ms] > > 1124606.686: [G1Ergonomics (CSet Construction) add young regions to > > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > > time: 4.53 ms] > > 1124606.686: [G1Ergonomics (CSet Construction) finish adding old > > regions to CSet, reason: reclaimable percentage not over threshold, > > old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 > > %), threshold: 10.00 %] > > 1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, > > eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted > > pause time: 45.72 ms, target pause time: 200.00 ms] > > 1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, > > reason: region allocation request failed, allocation request: 1048576 > > bytes] > > 1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested > > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > > bytes] > > 1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, > > reason: heap expansion operation failed] > > 1124606.743: [G1Ergonomics (Concurrent Cycles) do not request > > concurrent cycle initiation, reason: still doing mixed collections, > > occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: > > 1932735240 bytes (45.00 %), source: end of GC] > > 1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, > > reason: reclaimable percentage not over threshold, candidate old > > regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), > > threshold: 10.00 %] > > (to-space exhausted), 0.0568178 secs] > > [Parallel Time: 40.4 ms, GC Workers: 4] > > [GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, > > Max: 1124606723.8, Diff: 37.7] > > [Ext Root Scanning (ms): Min: 0.1, Avg: 6.3, Max: 16.1, Diff: > > 16.1, Sum: 25.4] > > [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, Sum: > > 38.6] > > [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, > > Sum: 150] > > [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] > > [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, > > Sum: 33.7] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > > 0.0] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > > Sum: 0.1] > > [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: > > 37.7, Sum: 98.9] > > [GC Worker End (ms): Min: 1124606726.5, Avg: 
1124606726.5, Max: > > 1124606726.5, Diff: 0.0] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.1 ms] > > [Other: 16.3 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 7.7 ms] > > [Ref Enq: 0.2 ms] > > [Free CSet: 0.3 ms] > > [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: > > 4077.1M(4096.0M)->3921.6M(4096.0M)] > > [Times: user=0.11 sys=0.00, real=0.06 secs] > > 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) > > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > > - age 1: 1130192 bytes, 1130192 total > > - age 2: 492816 bytes, 1623008 total > > - age 3: 675240 bytes, 2298248 total > > - age 4: 1038536 bytes, 3336784 total > > - age 5: 208048 bytes, 3544832 total > > - age 6: 436520 bytes, 3981352 total > > - age 7: 184528 bytes, 4165880 total > > - age 8: 165376 bytes, 4331256 total > > - age 9: 154872 bytes, 4486128 total > > - age 10: 179016 bytes, 4665144 total > > - age 11: 167760 bytes, 4832904 total > > - age 12: 132056 bytes, 4964960 total > > - age 13: 138736 bytes, 5103696 total > > - age 14: 132080 bytes, 5235776 total > > - age 15: 132856 bytes, 5368632 total > > 1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, > > _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: > > 148.06 ms, target pause time: 200.00 ms] > > 1124739.058: [G1Ergonomics (CSet Construction) add young regions to > > CSet, eden: 87 regions, survivors: 8 regions, predicted young region > > time: 4.37 ms] > > 1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, > > eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted > > pause time: 56.32 ms, target pause time: 200.00 ms] > > 1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, > > reason: region allocation request failed, allocation request: 1048576 > > bytes] > > 1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested > > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > > bytes] > > 1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, > > reason: heap expansion operation failed] > > 1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent > > cycle initiation, reason: occupancy higher than threshold, occupancy: > > 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 > > bytes (45.00 %), source: end of GC] > > (to-space exhausted), 0.1936102 secs] > > [Parallel Time: 146.6 ms, GC Workers: 4] > > [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, > > Max: 1124739063.0, Diff: 4.4] > > [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: > > 14.0, Sum: 28.2] > > [Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, > > Sum: 42.7] > > [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, > > Sum: 225] > > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] > > [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: > > 24.5, Sum: 502.5] > > [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > > 0.2] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > > Sum: 0.1] > > [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: > > 4.4, Sum: 573.8] > > [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: > > 1124739205.1, Diff: 0.0] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.1 ms] > > [Other: 46.9 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 1.0 ms] > > [Ref Enq: 0.1 ms] > > [Free CSet: 0.2 ms] > > [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: > > 
4095.6M(4096.0M)->4095.6M(4096.0M)] > > [Times: user=0.36 sys=0.00, real=0.19 secs] > > 1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent > > cycle, reason: concurrent cycle initiation requested] > > 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) > > (initial-mark) > > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > > 1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, > > _pending_cards: 322560, predicted base time: 205.33 ms, remaining > > time: 0.00 ms, target pause time: 200.00 ms] > > 1124739.259: [G1Ergonomics (CSet Construction) add young regions to > > CSet, eden: 0 regions, survivors: 0 regions, predicted young region > > time: 0.00 ms] > > 1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, > > eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause > > time: 205.33 ms, target pause time: 200.00 ms] > > , 0.0347198 secs] > > [Parallel Time: 33.1 ms, GC Workers: 4] > > [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, > > Max: 1124739259.3, Diff: 0.0] > > [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: > > 5.4, Sum: 30.6] > > [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, > > Sum: 79.4] > > [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, > > Sum: 1261] > > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] > > [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: > > 21.7] > > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > > 0.4] > > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > > Sum: 0.1] > > [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: > > 0.1, Sum: 132.1] > > [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: > > 1124739292.3, Diff: 0.0] > > [Code Root Fixup: 0.0 ms] > > [Clear CT: 0.1 ms] > > [Other: 1.5 ms] > > [Choose CSet: 0.0 ms] > > [Ref Proc: 1.0 ms] > > [Ref Enq: 0.1 ms] > > [Free CSet: 0.0 ms] > > [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: > > 4095.6M(4096.0M)->4095.6M(4096.0M)] > > [Times: user=0.12 sys=0.00, real=0.04 secs] > > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > > concurrent-root-region-scan-start] > > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > > concurrent-root-region-scan-end, 0.0000157 secs] > > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] > > 1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, > > reason: allocation request failed, allocation request: 80 bytes] > > 1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested > > expansion amount: 2097152 bytes, attempted expansion amount: 2097152 > > bytes] > > 1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, > > reason: heap expansion operation failed] > > 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC > > 4095M->2235M(4096M), 10.5341003 secs] > > [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: > > 4095.6M(4096.0M)->2235.4M(4096.0M)] > > [Times: user=13.20 sys=0.03, real=10.52 secs] > > > > > > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
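One way to see why the mixed collections stop almost immediately in the log quoted above: after marking, G1 compares the space it could reclaim from the candidate old regions against G1HeapWastePercent (10% by default), and the numbers in this log sit right on that boundary. A small sketch of that comparison, using the byte counts copied from the ergonomics lines above (the class name is invented for illustration):

public class ReclaimableCheck {
    public static void main(String[] args) {
        long heapBytes = 4294967296L;       // -Xmx4096m
        double heapWastePercent = 10.0;     // G1HeapWastePercent, default value

        // Reclaimable bytes reported in the "start mixed GCs" and
        // "do not continue mixed GCs" ergonomics lines of the log.
        long[] reclaimable = { 430117688L, 428818280L };
        for (long bytes : reclaimable) {
            double percent = 100.0 * bytes / heapBytes;
            System.out.printf("reclaimable %,d bytes = %.2f%% -> %s%n",
                              bytes, percent,
                              percent > heapWastePercent ? "continue mixed GCs"
                                                         : "stop mixed GCs");
        }
    }
}

With 10.01% reclaimable the mixed phase starts, and after a single old region is added to the collection set the remaining estimate is 9.98%, below the threshold, which matches the one-old-region mixed pause followed by "do not continue mixed GCs" in the log.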
URL: From yu.zhang at oracle.com Thu Aug 7 23:33:22 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 07 Aug 2014 16:33:22 -0700 Subject: G1gc compaction algorithm In-Reply-To: References: <53D09794.3090806@oracle.com> <1406186302.2920.4.camel@cirrus> <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> Message-ID: <53E40CC2.6020609@oracle.com> This is related to how the remember sets are stored in jvm. Coarsening is used to reduce memory footprint. But when coarsening happens, the RSet operations could be expensive. Thanks, Jenny On 8/6/2014 5:18 PM, Martin Makundi wrote: > Hi! > > Meanwhile, I was wondering what G1RSetRegionEntries and > G1RSetSparseRegionEntries do, google didn't give much information > about those. How do they work and which things do they affect? > > ** > Martin > > > 2014-08-07 3:14 GMT+03:00 Martin Makundi > >: > > Hi! > > Thanks. We were currently running 4M with wastepercent=0 but > unfortunately don't have any logs for that. > > Will try "-XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 > -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40" and > post results later back here. > > ** > Martin > > > 2014-08-07 0:53 GMT+03:00 Yu Zhang >: > > Martin, > > Thanks for the logs and following up with us. > > In this chart, the purple line is the ScanRS time for mixed > gc. At the bottom there are grey circles indicating when the > initial mark happens. The white is the ScanRS time for mixed > gc with to-space exhausted. > > > You can see that the first several scanRS after initial-mark > is ok, then they go up to 7000ms. For the 16m region size > runs, you have G1HeapWastePercent=0. (4m region size has > G1HeapWastePercent=1). Because of this, g1 will not stop > mixed gc till there is no candidate regions. From the space > claimed by mixed gc, it claims 2-3g heap, but the price is too > high. Another disadvantage is it does not start marking phases: > "do not request concurrent cycle initiation, reason: still > doing mixed collections, occupancy: 2147483648 > bytes, allocation request: 0 bytes, > threshold: 2147483640 bytes (10.00 %), > source: end of GC]" > > If you look at the claimable heap in MB, 4m heap region size > starts mixed gc at lower reclaimable > > > Another thing is there are coarsening for RSet Entries. > > Can you do one with > -XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 > -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40 > > Thanks, > Jenny > > On 8/4/2014 9:33 AM, Martin Makundi wrote: >> Hi! >> >> Here are the fresh logs: >> >> http://81.22.250.165/log/gc-16m-2014-08-04.log >> >> Today we were hit by quite some number of Full GC's with >> quite short intervals and as can be suspected, not so happy >> users ;) >> >> Any ideas? I will reduce the region size to 4M for now, >> because it resulted in much fewer full gcs. >> >> ** >> Martin >> >> >> 2014-08-01 1:17 GMT+03:00 Martin Makundi >> > >: >> >> Hmm.. ok, I copy pasted if from the mail, it works after >> typing manually, thanks. >> >> Problem seems to have been BOTH a whitespace typo AND >> UnlockDiagnosticOptions was on the right side. >> >> Thanks. >> >> Gathering logs now. >> >> ** >> Martin >> >> >> 2014-08-01 1:01 GMT+03:00 Yu Zhang > >: >> >> maybe some hidden text? >> >> Thanks, >> Jenny >> >> On 7/31/2014 2:52 PM, Martin Makundi wrote: >>> Strange that it is in the property summary but >>> doesn't allow setting it. 
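For the G1RSetRegionEntries / G1RSetSparseRegionEntries question above: one way to see which values the running VM is actually using is to read the flags back through the HotSpot diagnostic MXBean. A minimal sketch (the class name is made up, and it assumes a HotSpot JDK 7 or later such as the 1.7.0_55 build used here):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowG1RSetFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Print the effective value of each remembered-set related flag.
        String[] flags = { "G1HeapRegionSize",
                           "G1RSetRegionEntries",
                           "G1RSetSparseRegionEntries" };
        for (String flag : flags) {
            System.out.println(flag + " = " + hotspot.getVMOption(flag).getValue());
        }
    }
}

Run it with the same -XX options as the application (for example -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1RSetRegionEntries=1792) to confirm what the VM picked up.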
>>> >>> >>> 2014-08-01 0:39 GMT+03:00 Martin Makundi >>> >> >: >>> >>> Hi! >>> >>> UnlockDiagnosticVMOptions is on (though later >>> (on the right side) in the command line). Jvm >>> version is >>> >>> java version "1.7.0_55" >>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >>> Java HotSpot(TM) 64-Bit Server VM (build >>> 24.55-b03, mixed mode) >>> >>> >>> >>> 2014-08-01 0:37 GMT+03:00 Yu Zhang >>> >: >>> >>> Martin, >>> >>> These 2 need to run with >>> -XX:+UnlockDiagnosticVMOptions >>> >>> Thanks, >>> Jenny >>> >>> On 7/31/2014 2:33 PM, Martin Makundi wrote: >>>> Hi! >>>> >>>> G1SummarizeRSetStats does not seem to work, >>>> jvm says: >>>> >>>> Improperly specified VM option >>>> 'G1SummarizeRSetStatsPeriod=10' >>>> Error: Could not create the Java Virtual >>>> Machine. >>>> Error: A fatal exception has occurred. >>>> Program will exit. >>>> >>>> Same for both new options >>>> >>>> >>>> >>>> 2014-07-31 20:22 GMT+03:00 Yu Zhang >>>> >>> >: >>>> >>>> Martin, >>>> >>>> The ScanRS for mixed gc is extremely >>>> long, 1000-9000ms. Because it is over >>>> pause time goal, minimum old regions >>>> can be added to CSet. So mixed gc is >>>> not keeping up. >>>> >>>> Can do a run keeping 16m region size, >>>> no G1PrintRegionLivenessInfo, no >>>> PrintHeapAtGC. But >>>> -XX:+G1SummarizeRSetStats >>>> -XX:G1SummarizeRSetStatsPeriod=10 >>>> >>>> This should tell us more about RSet >>>> information. >>>> >>>> While the UpdateRS is not as bad as >>>> ScanRS, we can try to push it to the >>>> concurrent threads. Can you add >>>> -XX:G1RSetUpdatingPauseTimePercent=5. >>>> I am hoping this brings the UpdateRS >>>> down to 50ms. >>>> >>>> >>>> Thanks, >>>> Jenny >>>> >>>> On 7/28/2014 8:27 PM, Martin Makundi wrote: >>>> >>>> Hi! >>>> >>>> We suffered a couple of Full GC's >>>> using regionsize 5M (it seems to be >>>> exact looking at logged actual >>>> parameters) and we tried the 16M >>>> option and this resulted in more >>>> severe Full GC behavior. >>>> >>>> Here is the promised log for 16 M >>>> setting: >>>> http://81.22.250.165/log/gc-16m.log >>>> >>>> We switch back to 5M hoping it will >>>> behave more nicely. >>>> >>>> ** >>>> Martin >>>> >>>> >>>> >>> >>> >>> >> >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 22281 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 19368 bytes Desc: not available URL: From martin.makundi at koodaripalvelut.com Fri Aug 8 01:21:52 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Fri, 8 Aug 2014 04:21:52 +0300 Subject: G1gc compaction algorithm In-Reply-To: <53E40CC2.6020609@oracle.com> References: <53D09794.3090806@oracle.com> <1406186302.2920.4.camel@cirrus> <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> Message-ID: Is there any auto tuning parameter that could be activated something like adaptive sizing policy that would tune all these parameters automatically? It seems like there is some logic behind tuning and statistics maybe it could be automatic? ** Martin 2014-08-08 2:33 GMT+03:00 Yu Zhang : > This is related to how the remember sets are stored in jvm. Coarsening > is used to reduce memory footprint. 
But when coarsening happens, the RSet > operations could be expensive. > > Thanks, > Jenny > > On 8/6/2014 5:18 PM, Martin Makundi wrote: > > Hi! > > Meanwhile, I was wondering what G1RSetRegionEntries and G1RSetSparseRegionEntries > do, google didn't give much information about those. How do they work and > which things do they affect? > > ** > Martin > > > 2014-08-07 3:14 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > >> Hi! >> >> Thanks. We were currently running 4M with wastepercent=0 but >> unfortunately don't have any logs for that. >> >> Will try "-XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 >> -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40" and post >> results later back here. >> >> ** >> Martin >> >> >> 2014-08-07 0:53 GMT+03:00 Yu Zhang : >> >> Martin, >>> >>> Thanks for the logs and following up with us. >>> >>> In this chart, the purple line is the ScanRS time for mixed gc. At the >>> bottom there are grey circles indicating when the initial mark happens. >>> The white is the ScanRS time for mixed gc with to-space exhausted. >>> >>> >>> You can see that the first several scanRS after initial-mark is ok, then >>> they go up to 7000ms. For the 16m region size runs, you have >>> G1HeapWastePercent=0. (4m region size has G1HeapWastePercent=1). Because >>> of this, g1 will not stop mixed gc till there is no candidate regions. >>> From the space claimed by mixed gc, it claims 2-3g heap, but the price is >>> too high. Another disadvantage is it does not start marking phases: >>> "do not request concurrent cycle initiation, reason: still doing mixed >>> collections, occupancy: 2147483648 bytes, allocation request: 0 bytes, >>> threshold: 2147483640 bytes (10.00 %), source: end of GC]" >>> >>> If you look at the claimable heap in MB, 4m heap region size starts >>> mixed gc at lower reclaimable >>> >>> >>> Another thing is there are coarsening for RSet Entries. >>> >>> Can you do one with >>> -XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 >>> -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40 >>> >>> Thanks, >>> Jenny >>> >>> On 8/4/2014 9:33 AM, Martin Makundi wrote: >>> >>> Hi! >>> >>> Here are the fresh logs: >>> >>> http://81.22.250.165/log/gc-16m-2014-08-04.log >>> >>> Today we were hit by quite some number of Full GC's with quite short >>> intervals and as can be suspected, not so happy users ;) >>> >>> Any ideas? I will reduce the region size to 4M for now, because it >>> resulted in much fewer full gcs. >>> >>> ** >>> Martin >>> >>> >>> 2014-08-01 1:17 GMT+03:00 Martin Makundi < >>> martin.makundi at koodaripalvelut.com>: >>> >>>> Hmm.. ok, I copy pasted if from the mail, it works after typing >>>> manually, thanks. >>>> >>>> Problem seems to have been BOTH a whitespace typo AND >>>> UnlockDiagnosticOptions was on the right side. >>>> >>>> Thanks. >>>> >>>> Gathering logs now. >>>> >>>> ** >>>> Martin >>>> >>>> >>>> 2014-08-01 1:01 GMT+03:00 Yu Zhang : >>>> >>>> maybe some hidden text? >>>>> >>>>> Thanks, >>>>> Jenny >>>>> >>>>> On 7/31/2014 2:52 PM, Martin Makundi wrote: >>>>> >>>>> Strange that it is in the property summary but doesn't allow setting >>>>> it. >>>>> >>>>> >>>>> 2014-08-01 0:39 GMT+03:00 Martin Makundi < >>>>> martin.makundi at koodaripalvelut.com>: >>>>> >>>>>> Hi! >>>>>> >>>>>> UnlockDiagnosticVMOptions is on (though later (on the right side) >>>>>> in the command line). 
Jvm version is >>>>>> >>>>>> java version "1.7.0_55" >>>>>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) >>>>>> >>>>>> >>>>>> >>>>>> 2014-08-01 0:37 GMT+03:00 Yu Zhang : >>>>>> >>>>>> Martin, >>>>>>> >>>>>>> These 2 need to run with -XX:+UnlockDiagnosticVMOptions >>>>>>> >>>>>>> Thanks, >>>>>>> Jenny >>>>>>> >>>>>>> On 7/31/2014 2:33 PM, Martin Makundi wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> G1SummarizeRSetStats does not seem to work, jvm says: >>>>>>> >>>>>>> Improperly specified VM option 'G1SummarizeRSetStatsPeriod=10' >>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>> Error: A fatal exception has occurred. Program will exit. >>>>>>> >>>>>>> Same for both new options >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-07-31 20:22 GMT+03:00 Yu Zhang : >>>>>>> >>>>>>>> Martin, >>>>>>>> >>>>>>>> The ScanRS for mixed gc is extremely long, 1000-9000ms. Because it >>>>>>>> is over pause time goal, minimum old regions can be added to CSet. So >>>>>>>> mixed gc is not keeping up. >>>>>>>> >>>>>>>> Can do a run keeping 16m region size, no >>>>>>>> G1PrintRegionLivenessInfo, no PrintHeapAtGC. But -XX:+G1SummarizeRSetStats >>>>>>>> -XX:G1SummarizeRSetStatsPeriod=10 >>>>>>>> >>>>>>>> This should tell us more about RSet information. >>>>>>>> >>>>>>>> While the UpdateRS is not as bad as ScanRS, we can try to push it >>>>>>>> to the concurrent threads. Can you add >>>>>>>> -XX:G1RSetUpdatingPauseTimePercent=5. I am hoping this brings the UpdateRS >>>>>>>> down to 50ms. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Jenny >>>>>>>> >>>>>>>> On 7/28/2014 8:27 PM, Martin Makundi wrote: >>>>>>>> >>>>>>>>> Hi! >>>>>>>>> >>>>>>>>> We suffered a couple of Full GC's using regionsize 5M (it seems to >>>>>>>>> be exact looking at logged actual parameters) and we tried the 16M option >>>>>>>>> and this resulted in more severe Full GC behavior. >>>>>>>>> >>>>>>>>> Here is the promised log for 16 M setting: >>>>>>>>> http://81.22.250.165/log/gc-16m.log >>>>>>>>> >>>>>>>>> We switch back to 5M hoping it will behave more nicely. >>>>>>>>> >>>>>>>>> ** >>>>>>>>> Martin >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 19368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 22281 bytes Desc: not available URL: From martin.makundi at koodaripalvelut.com Mon Aug 11 17:46:45 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 11 Aug 2014 20:46:45 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <53D09794.3090806@oracle.com> <1406186302.2920.4.camel@cirrus> <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> Message-ID: Hi! Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is caused by heap full and allocation request: 144 bytes. http://81.22.250.165/log/gc-16m-2014-08-11.log Any ideas how to mitigate this kind of situation? The Full GC makes quite a difference to the situation but causes a painful pause also. 
** Martin 2014-08-08 4:21 GMT+03:00 Martin Makundi : > Is there any auto tuning parameter that could be activated something like > adaptive sizing policy that would tune all these parameters automatically? > It seems like there is some logic behind tuning and statistics maybe it > could be automatic? > > ** > Martin > > > 2014-08-08 2:33 GMT+03:00 Yu Zhang : > > This is related to how the remember sets are stored in jvm. Coarsening >> is used to reduce memory footprint. But when coarsening happens, the RSet >> operations could be expensive. >> >> Thanks, >> Jenny >> >> On 8/6/2014 5:18 PM, Martin Makundi wrote: >> >> Hi! >> >> Meanwhile, I was wondering what G1RSetRegionEntries and G1RSetSparseRegionEntries >> do, google didn't give much information about those. How do they work and >> which things do they affect? >> >> ** >> Martin >> >> >> 2014-08-07 3:14 GMT+03:00 Martin Makundi < >> martin.makundi at koodaripalvelut.com>: >> >>> Hi! >>> >>> Thanks. We were currently running 4M with wastepercent=0 but >>> unfortunately don't have any logs for that. >>> >>> Will try "-XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 >>> -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40" and post >>> results later back here. >>> >>> ** >>> Martin >>> >>> >>> 2014-08-07 0:53 GMT+03:00 Yu Zhang : >>> >>> Martin, >>>> >>>> Thanks for the logs and following up with us. >>>> >>>> In this chart, the purple line is the ScanRS time for mixed gc. At the >>>> bottom there are grey circles indicating when the initial mark happens. >>>> The white is the ScanRS time for mixed gc with to-space exhausted. >>>> >>>> >>>> You can see that the first several scanRS after initial-mark is ok, >>>> then they go up to 7000ms. For the 16m region size runs, you have >>>> G1HeapWastePercent=0. (4m region size has G1HeapWastePercent=1). Because >>>> of this, g1 will not stop mixed gc till there is no candidate regions. >>>> From the space claimed by mixed gc, it claims 2-3g heap, but the price is >>>> too high. Another disadvantage is it does not start marking phases: >>>> "do not request concurrent cycle initiation, reason: still doing mixed >>>> collections, occupancy: 2147483648 bytes, allocation request: 0 bytes, >>>> threshold: 2147483640 bytes (10.00 %), source: end of GC]" >>>> >>>> If you look at the claimable heap in MB, 4m heap region size starts >>>> mixed gc at lower reclaimable >>>> >>>> >>>> Another thing is there are coarsening for RSet Entries. >>>> >>>> Can you do one with >>>> -XX:G1HeapRegionSize=16m -XX:G1HeapWastePercent=10 >>>> -XX:G1RSetRegionEntries=1792 -XX:G1RSetSparseRegionEntries=40 >>>> >>>> Thanks, >>>> Jenny >>>> >>>> On 8/4/2014 9:33 AM, Martin Makundi wrote: >>>> >>>> Hi! >>>> >>>> Here are the fresh logs: >>>> >>>> http://81.22.250.165/log/gc-16m-2014-08-04.log >>>> >>>> Today we were hit by quite some number of Full GC's with quite short >>>> intervals and as can be suspected, not so happy users ;) >>>> >>>> Any ideas? I will reduce the region size to 4M for now, because it >>>> resulted in much fewer full gcs. >>>> >>>> ** >>>> Martin >>>> >>>> >>>> 2014-08-01 1:17 GMT+03:00 Martin Makundi < >>>> martin.makundi at koodaripalvelut.com>: >>>> >>>>> Hmm.. ok, I copy pasted if from the mail, it works after typing >>>>> manually, thanks. >>>>> >>>>> Problem seems to have been BOTH a whitespace typo AND >>>>> UnlockDiagnosticOptions was on the right side. >>>>> >>>>> Thanks. >>>>> >>>>> Gathering logs now. 
>>>>> >>>>> ** >>>>> Martin >>>>> >>>>> >>>>> 2014-08-01 1:01 GMT+03:00 Yu Zhang : >>>>> >>>>> maybe some hidden text? >>>>>> >>>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> On 7/31/2014 2:52 PM, Martin Makundi wrote: >>>>>> >>>>>> Strange that it is in the property summary but doesn't allow setting >>>>>> it. >>>>>> >>>>>> >>>>>> 2014-08-01 0:39 GMT+03:00 Martin Makundi < >>>>>> martin.makundi at koodaripalvelut.com>: >>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> UnlockDiagnosticVMOptions is on (though later (on the right side) >>>>>>> in the command line). Jvm version is >>>>>>> >>>>>>> java version "1.7.0_55" >>>>>>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-08-01 0:37 GMT+03:00 Yu Zhang : >>>>>>> >>>>>>> Martin, >>>>>>>> >>>>>>>> These 2 need to run with -XX:+UnlockDiagnosticVMOptions >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Jenny >>>>>>>> >>>>>>>> On 7/31/2014 2:33 PM, Martin Makundi wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>> G1SummarizeRSetStats does not seem to work, jvm says: >>>>>>>> >>>>>>>> Improperly specified VM option 'G1SummarizeRSetStatsPeriod=10' >>>>>>>> Error: Could not create the Java Virtual Machine. >>>>>>>> Error: A fatal exception has occurred. Program will exit. >>>>>>>> >>>>>>>> Same for both new options >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2014-07-31 20:22 GMT+03:00 Yu Zhang : >>>>>>>> >>>>>>>>> Martin, >>>>>>>>> >>>>>>>>> The ScanRS for mixed gc is extremely long, 1000-9000ms. Because >>>>>>>>> it is over pause time goal, minimum old regions can be added to CSet. So >>>>>>>>> mixed gc is not keeping up. >>>>>>>>> >>>>>>>>> Can do a run keeping 16m region size, no >>>>>>>>> G1PrintRegionLivenessInfo, no PrintHeapAtGC. But -XX:+G1SummarizeRSetStats >>>>>>>>> -XX:G1SummarizeRSetStatsPeriod=10 >>>>>>>>> >>>>>>>>> This should tell us more about RSet information. >>>>>>>>> >>>>>>>>> While the UpdateRS is not as bad as ScanRS, we can try to push it >>>>>>>>> to the concurrent threads. Can you add >>>>>>>>> -XX:G1RSetUpdatingPauseTimePercent=5. I am hoping this brings the UpdateRS >>>>>>>>> down to 50ms. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Jenny >>>>>>>>> >>>>>>>>> On 7/28/2014 8:27 PM, Martin Makundi wrote: >>>>>>>>> >>>>>>>>>> Hi! >>>>>>>>>> >>>>>>>>>> We suffered a couple of Full GC's using regionsize 5M (it seems >>>>>>>>>> to be exact looking at logged actual parameters) and we tried the 16M >>>>>>>>>> option and this resulted in more severe Full GC behavior. >>>>>>>>>> >>>>>>>>>> Here is the promised log for 16 M setting: >>>>>>>>>> http://81.22.250.165/log/gc-16m.log >>>>>>>>>> >>>>>>>>>> We switch back to 5M hoping it will behave more nicely. >>>>>>>>>> >>>>>>>>>> ** >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 22281 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 19368 bytes Desc: not available URL: From yu.zhang at oracle.com Tue Aug 12 05:17:34 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Mon, 11 Aug 2014 22:17:34 -0700 Subject: G1gc compaction algorithm In-Reply-To: References: <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> Message-ID: <53E9A36E.80600@oracle.com> Martin, Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 -XX:InitiatingHeapOccupancyPercent=20 added? The reason for G1MaxNewSizePercent(default=60) is to set an upper limit to Eden size. It seems the Eden size grows to 17g before Full gc, then a bunch of humongous allocation happened, and there is not enough old gen. The following log entry seems not right: The Eden Size is over 60% of the heap. "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 769041, predicted base time: 673.25 ms, remaining time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: 21 regions, predicted young region time: 145.63 ms] 193238.308: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 ms, target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: 563.9 ms, GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning (ms): Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update RS (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: 3016] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: 27.3, Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, Max: 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: 193238871.7, Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 0.3 ms] [Other: 191.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: 0.3 ms] [Free CSet: 0.2 ms] [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: 14.1G(28.7G)->14.1G(28.7G)] [Times: user=8.45 sys=0.04, real=0.75 secs]" The reason for increasing InitiatingHeapOccupancyPercent to 20 from 10 is we are wasting some concurrent cycles. We will see how this goes. We might increase G1ReservePercent to handle this kind of allocation if it is not enough. Thanks, Jenny Thanks, Jenny On 8/11/2014 10:46 AM, Martin Makundi wrote: > Hi! > > Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is > caused by heap full and allocation request: 144 bytes. > > http://81.22.250.165/log/gc-16m-2014-08-11.log > > Any ideas how to mitigate this kind of situation? The Full GC makes > quite a difference to the situation but causes a painful pause also. > > ** > Martin -------------- next part -------------- An HTML attachment was scrubbed... 
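To put the suggested percentages in context against the roughly 28.7 GB committed heap visible in the log entry above, a rough sketch of the arithmetic (illustrative only; the class name is invented and G1's rounding to whole regions is ignored):

public class G1PercentSketch {
    public static void main(String[] args) {
        double heapGb = 28.7;                  // committed heap size from the log
        double defaultMaxNewPercent = 60.0;    // G1MaxNewSizePercent default
        double suggestedMaxNewPercent = 30.0;  // -XX:G1MaxNewSizePercent=30
        double ihopPercent = 20.0;             // -XX:InitiatingHeapOccupancyPercent=20

        System.out.printf("young gen cap at the 60%% default: ~%.1f GB%n",
                          heapGb * defaultMaxNewPercent / 100.0);
        System.out.printf("young gen cap at the suggested 30%%: ~%.1f GB%n",
                          heapGb * suggestedMaxNewPercent / 100.0);
        System.out.printf("marking should start above: ~%.1f GB occupancy%n",
                          heapGb * ihopPercent / 100.0);
    }
}

The 60% default corresponds to the roughly 17 GB Eden mentioned above, the suggested 30% caps the young generation near 8.6 GB, and marking would be requested once occupancy passes about 5.7 GB.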
URL: From martin.makundi at koodaripalvelut.com Tue Aug 12 06:29:22 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 12 Aug 2014 09:29:22 +0300 Subject: G1gc compaction algorithm In-Reply-To: <53E9A36E.80600@oracle.com> References: <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> Message-ID: Hi! I tried the new parameters: > Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 > -XX:InitiatingHeapOccupancyPercent=20 added? > This seems to hang the whole system.... we have lots of mostly short lived (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each (data reports loaded into cache). Maybe eden should be even bigger instead of smaller? Here is the log from today, it hung up quite early, I suspect the gc: http://81.22.250.165/log/gc-16m-2014-08-12.log The process ate most of the cpu cacacity and we had to kill it and restart without -XX:G1MaxNewSizePercent=30. What you suggest? ** Martin > The reason for G1MaxNewSizePercent(default=60) is to set an upper limit to > Eden size. It seems the Eden size grows to 17g before Full gc, then a > bunch of humongous allocation happened, and there is not enough old gen. > > The following log entry seems not right: The Eden Size is over 60% of the > heap. > "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) > (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 769041, predicted base time: 673.25 ms, remaining > time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: [G1Ergonomics > (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: > 21 regions, predicted young region time: 145.63 ms] 193238.308: > [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, > survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 ms, > target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: 563.9 ms, > GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: > 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning (ms): > Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update RS > (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] > [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: > 3016] [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > 0.1] [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, > Sum: 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: > 27.3, Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, > Diff: 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, > Max: 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: > 193238871.7, Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.3 ms] > [Other: 191.7 ms] > [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: 0.3 > ms] [Free CSet: 0.2 ms] > [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: > 14.1G(28.7G)->14.1G(28.7G)] > [Times: user=8.45 sys=0.04, real=0.75 secs]" > > The reason for increasing InitiatingHeapOccupancyPercent to 20 from 10 is > we are wasting some concurrent cycles. > > We will see how this goes. We might increase G1ReservePercent to handle > this kind of allocation if it is not enough. > > Thanks, > Jenny > > Thanks, > Jenny > > On 8/11/2014 10:46 AM, Martin Makundi wrote: > > Hi! 
> > Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is > caused by heap full and allocation request: 144 bytes. > > http://81.22.250.165/log/gc-16m-2014-08-11.log > > Any ideas how to mitigate this kind of situation? The Full GC makes > quite a difference to the situation but causes a painful pause also. > > ** > Martin > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Fri Aug 15 10:20:43 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Fri, 15 Aug 2014 13:20:43 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <1406186865.2920.8.camel@cirrus> <53DA7B4D.3090000@oracle.com> <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> Message-ID: Hi! Here is our latest logs with bit more heap and suggested parameters. We tried eden max 85% with this run because 30% was unsuccessful earlier. Here is the log: http://81.22.250.165/log/gc-16m-2014-08-15.log It has a couple of full gc hits during a busy day, any new ideas? ** Martin 2014-08-13 9:39 GMT+03:00 Martin Makundi : >> >> Thanks. At the end, the system cpu is very high. I guess there are page >> faults due to the heap expansion around timestamp 12002.692. Is the memory >> tight on your system? > > > For performance sake we use compressedoops which limits the memory upper > bound, this means we could go to bit under 32g, 32255M. Going above > compressedoops will increase memory footprint and slow down processing so we > would prefer just tuning the gc while within compressedoops. > >> >> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? > > > Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch > > ** > Martin > >> >> Thanks, >> Jenny >> >> On 8/11/2014 11:29 PM, Martin Makundi wrote: >> >> Hi! >> >> I tried the new parameters: >> >>> >>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>> -XX:InitiatingHeapOccupancyPercent=20 added? >> >> >> This seems to hang the whole system.... we have lots of mostly short lived >> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each (data >> reports loaded into cache). >> >> Maybe eden should be even bigger instead of smaller? >> >> Here is the log from today, it hung up quite early, I suspect the gc: >> http://81.22.250.165/log/gc-16m-2014-08-12.log >> >> The process ate most of the cpu cacacity and we had to kill it and restart >> without -XX:G1MaxNewSizePercent=30. >> >> What you suggest? >> >> ** >> Martin >> >>> >>> The reason for G1MaxNewSizePercent(default=60) is to set an upper limit >>> to Eden size. It seems the Eden size grows to 17g before Full gc, then a >>> bunch of humongous allocation happened, and there is not enough old gen. >>> >>> The following log entry seems not right: The Eden Size is over 60% of the >>> heap. 
>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start choosing >>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, remaining >>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: [G1Ergonomics >>> (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: >>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, >>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 ms, >>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: 563.9 ms, >>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning (ms): >>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update RS >>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: 3016] >>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: >>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: 27.3, >>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, Max: >>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: 193238871.7, >>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>> [Code Root Fixup: 0.0 ms] >>> [Clear CT: 0.3 ms] >>> [Other: 191.7 ms] >>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: 0.3 >>> ms] [Free CSet: 0.2 ms] >>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: >>> 14.1G(28.7G)->14.1G(28.7G)] >>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>> >>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from 10 is >>> we are wasting some concurrent cycles. >>> >>> We will see how this goes. We might increase G1ReservePercent to handle >>> this kind of allocation if it is not enough. >>> >>> Thanks, >>> Jenny >>> >>> Thanks, >>> Jenny >>> >>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>> >>> Hi! >>> >>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is >>> caused by heap full and allocation request: 144 bytes. >>> >>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>> >>> Any ideas how to mitigate this kind of situation? The Full GC makes quite >>> a difference to the situation but causes a painful pause also. >>> >>> ** >>> Martin >>> >>> >> >> > From Kannan.Krishnamurthy at contractor.cengage.com Mon Aug 18 20:34:09 2014 From: Kannan.Krishnamurthy at contractor.cengage.com (Krishnamurthy, Kannan) Date: Mon, 18 Aug 2014 20:34:09 +0000 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Message-ID: Greetings, We are experiencing unexplained/unknown long pauses (8 seconds) during concurrent marking step of G1 collector. 
2014-08-07T13:42:30.552-0400: 92183.303: [GC concurrent-root-region-scan-start] 2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-root-region-scan-end, 0.0025230 secs] **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: 71.81 ms, target pause time: 100.00 ms `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark starts, approximately after 2 seconds of this step the application starts to pause. However the GC logs claims the application was not paused during this window. Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Any help in understanding the root cause of this issue is appreciated. Our target JVMS: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) java version "1.8.0_11" Java(TM) SE Runtime Environment (build 1.8.0_11-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) Our JVM options : -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:MarkStackSize=16M -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo -XX:+TraceGCTaskThread -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1TraceConcRefinement -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/common/logs/ocean-partition-gc.log Thanks and regards, Kannan -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Tue Aug 19 16:56:30 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Tue, 19 Aug 2014 09:56:30 -0700 Subject: G1gc compaction algorithm In-Reply-To: References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> Message-ID: <53F381BE.70803@oracle.com> Martin, Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. In 08-15, there are 4 requests allocation 109M ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 109816768 bytes]' But in 08-11, the max allocation request is 48M '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 47906832 bytes]' Does the application change the allocation when performance changes? Thanks, Jenny On 8/15/2014 3:20 AM, Martin Makundi wrote: > Hi! > > Here is our latest logs with bit more heap and suggested parameters. > We tried eden max 85% with this run because 30% was unsuccessful > earlier. > > Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log > > It has a couple of full gc hits during a busy day, any new ideas? > > ** > Martin > > > 2014-08-13 9:39 GMT+03:00 Martin Makundi: >>> Thanks. At the end, the system cpu is very high. 
I guess there are page >>> faults due to the heap expansion around timestamp 12002.692. Is the memory >>> tight on your system? >> For performance sake we use compressedoops which limits the memory upper >> bound, this means we could go to bit under 32g, 32255M. Going above >> compressedoops will increase memory footprint and slow down processing so we >> would prefer just tuning the gc while within compressedoops. >> >>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >> >> ** >> Martin >> >>> Thanks, >>> Jenny >>> >>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>> >>> Hi! >>> >>> I tried the new parameters: >>> >>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>> This seems to hang the whole system.... we have lots of mostly short lived >>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each (data >>> reports loaded into cache). >>> >>> Maybe eden should be even bigger instead of smaller? >>> >>> Here is the log from today, it hung up quite early, I suspect the gc: >>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>> >>> The process ate most of the cpu cacacity and we had to kill it and restart >>> without -XX:G1MaxNewSizePercent=30. >>> >>> What you suggest? >>> >>> ** >>> Martin >>> >>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper limit >>>> to Eden size. It seems the Eden size grows to 17g before Full gc, then a >>>> bunch of humongous allocation happened, and there is not enough old gen. >>>> >>>> The following log entry seems not right: The Eden Size is over 60% of the >>>> heap. >>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start choosing >>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, remaining >>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: [G1Ergonomics >>>> (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: >>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, >>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 ms, >>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: 563.9 ms, >>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning (ms): >>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update RS >>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: 3016] >>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: >>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: 27.3, >>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: >>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, Max: >>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: 193238871.7, >>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>> [Code Root Fixup: 0.0 ms] >>>> [Clear CT: 0.3 ms] >>>> [Other: 191.7 ms] >>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: 0.3 >>>> ms] [Free CSet: 0.2 ms] >>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: >>>> 14.1G(28.7G)->14.1G(28.7G)] >>>> [Times: user=8.45 
sys=0.04, real=0.75 secs]" >>>> >>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from 10 is >>>> we are wasting some concurrent cycles. >>>> >>>> We will see how this goes. We might increase G1ReservePercent to handle >>>> this kind of allocation if it is not enough. >>>> >>>> Thanks, >>>> Jenny >>>> >>>> Thanks, >>>> Jenny >>>> >>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>> >>>> Hi! >>>> >>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is >>>> caused by heap full and allocation request: 144 bytes. >>>> >>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>> >>>> Any ideas how to mitigate this kind of situation? The Full GC makes quite >>>> a difference to the situation but causes a painful pause also. >>>> >>>> ** >>>> Martin >>>> >>>> From martin.makundi at koodaripalvelut.com Tue Aug 19 17:11:40 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 19 Aug 2014 20:11:40 +0300 Subject: G1gc compaction algorithm In-Reply-To: <53F381BE.70803@oracle.com> References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> Message-ID: Hi! I suspect the application does not do the humongous allocations, I suspect it's the gc itself that does these. We have an allocationrecorder that never sees these humongous allocations within the application itself...assuming they are in a form that the allocation hook can detect. >From within the application we have lots of objects that are kept in ehcache, so ehcache manages the usage. I am not familiar with ehcache internals but I don't think it uses humongous objects in any way. The max allocation request varies on the objects that are loaded, it is possible that some request is 48m so yes it can vary depending on who logs in and what he/she does... not very reproducible. My main concern is that IF full gc can clean up the memory, there should be a mechanic that does just the same as full gc but without blocking for long time...concurrent full gc, do the 30-60 second operation for example 10% overhead until whole full gc is done (that would take 30-60/10% = 300-600 seconds). And if this is not feasible at the moment...what can we tune to mitigate the peaking in the garbage accumulation and thus avoid the full gc's. Help =) ** Martin 2014-08-19 19:56 GMT+03:00 Yu Zhang : > Martin, > > Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. In > 08-15, there are 4 requests allocation 109M > ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 109816768 bytes]' > But in 08-11, the max allocation request is 48M > '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 47906832 bytes]' > > Does the application change the allocation when performance changes? > > > Thanks, > Jenny > > > On 8/15/2014 3:20 AM, Martin Makundi wrote: > >> Hi! >> >> Here is our latest logs with bit more heap and suggested parameters. >> We tried eden max 85% with this run because 30% was unsuccessful >> earlier. >> >> Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log >> >> It has a couple of full gc hits during a busy day, any new ideas? >> >> ** >> Martin >> >> >> 2014-08-13 9:39 GMT+03:00 Martin Makundi> koodaripalvelut.com>: >> >>> Thanks. At the end, the system cpu is very high. 
I guess there are page >>>> faults due to the heap expansion around timestamp 12002.692. Is the >>>> memory >>>> tight on your system? >>>> >>> For performance sake we use compressedoops which limits the memory upper >>> bound, this means we could go to bit under 32g, 32255M. Going above >>> compressedoops will increase memory footprint and slow down processing >>> so we >>> would prefer just tuning the gc while within compressedoops. >>> >>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >>>> >>> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >>> >>> ** >>> Martin >>> >>> Thanks, >>>> Jenny >>>> >>>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>>> >>>> Hi! >>>> >>>> I tried the new parameters: >>>> >>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>>>> >>>> This seems to hang the whole system.... we have lots of mostly short >>>> lived >>>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each >>>> (data >>>> reports loaded into cache). >>>> >>>> Maybe eden should be even bigger instead of smaller? >>>> >>>> Here is the log from today, it hung up quite early, I suspect the gc: >>>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>>> >>>> The process ate most of the cpu cacacity and we had to kill it and >>>> restart >>>> without -XX:G1MaxNewSizePercent=30. >>>> >>>> What you suggest? >>>> >>>> ** >>>> Martin >>>> >>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper >>>>> limit >>>>> to Eden size. It seems the Eden size grows to 17g before Full gc, >>>>> then a >>>>> bunch of humongous allocation happened, and there is not enough old >>>>> gen. >>>>> >>>>> The following log entry seems not right: The Eden Size is over 60% of >>>>> the >>>>> heap. 
>>>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start >>>>> choosing >>>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, remaining >>>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: >>>>> [G1Ergonomics >>>>> (CSet Construction) add young regions to CSet, eden: 1 regions, >>>>> survivors: >>>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 >>>>> regions, >>>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 ms, >>>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: >>>>> 563.9 ms, >>>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning >>>>> (ms): >>>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update RS >>>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: >>>>> 3016] >>>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: >>>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: >>>>> 27.3, >>>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, >>>>> Diff: >>>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, Max: >>>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: >>>>> 193238871.7, >>>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>>> [Code Root Fixup: 0.0 ms] >>>>> [Clear CT: 0.3 ms] >>>>> [Other: 191.7 ms] >>>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: >>>>> 0.3 >>>>> ms] [Free CSet: 0.2 ms] >>>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: >>>>> 14.1G(28.7G)->14.1G(28.7G)] >>>>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>>>> >>>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from 10 >>>>> is >>>>> we are wasting some concurrent cycles. >>>>> >>>>> We will see how this goes. We might increase G1ReservePercent to >>>>> handle >>>>> this kind of allocation if it is not enough. >>>>> >>>>> Thanks, >>>>> Jenny >>>>> >>>>> Thanks, >>>>> Jenny >>>>> >>>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>>> >>>>> Hi! >>>>> >>>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is >>>>> caused by heap full and allocation request: 144 bytes. >>>>> >>>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>>> >>>>> Any ideas how to mitigate this kind of situation? The Full GC makes >>>>> quite >>>>> a difference to the situation but causes a painful pause also. >>>>> >>>>> ** >>>>> Martin >>>>> >>>>> >>>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Wed Aug 20 08:35:34 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 20 Aug 2014 11:35:34 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> Message-ID: Hi! Here is one more recent log: http://81.22.250.165/log/gc-2014-08-20.log with a couple of Full GC's. ** Martin 2014-08-19 20:11 GMT+03:00 Martin Makundi < martin.makundi at koodaripalvelut.com>: > Hi! 
> > I suspect the application does not do the humongous allocations, I suspect > it's the gc itself that does these. We have an allocationrecorder that > never sees these humongous allocations within the application > itself...assuming they are in a form that the allocation hook can detect. > > From within the application we have lots of objects that are kept in > ehcache, so ehcache manages the usage. I am not familiar with ehcache > internals but I don't think it uses humongous objects in any way. > > The max allocation request varies on the objects that are loaded, it is > possible that some request is 48m so yes it can vary depending on who logs > in and what he/she does... not very reproducible. > > My main concern is that IF full gc can clean up the memory, there should > be a mechanic that does just the same as full gc but without blocking for > long time...concurrent full gc, do the 30-60 second operation for example > 10% overhead until whole full gc is done (that would take 30-60/10% = > 300-600 seconds). > > And if this is not feasible at the moment...what can we tune to mitigate > the peaking in the garbage accumulation and thus avoid the full gc's. > > Help =) > > ** > Martin > > > 2014-08-19 19:56 GMT+03:00 Yu Zhang : > > Martin, >> >> Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. >> In 08-15, there are 4 requests allocation 109M >> ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> humongous allocation request failed, allocation request: 109816768 bytes]' >> But in 08-11, the max allocation request is 48M >> '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> humongous allocation request failed, allocation request: 47906832 bytes]' >> >> Does the application change the allocation when performance changes? >> >> >> Thanks, >> Jenny >> >> >> On 8/15/2014 3:20 AM, Martin Makundi wrote: >> >>> Hi! >>> >>> Here is our latest logs with bit more heap and suggested parameters. >>> We tried eden max 85% with this run because 30% was unsuccessful >>> earlier. >>> >>> Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log >>> >>> It has a couple of full gc hits during a busy day, any new ideas? >>> >>> ** >>> Martin >>> >>> >>> 2014-08-13 9:39 GMT+03:00 Martin Makundi>> koodaripalvelut.com>: >>> >>>> Thanks. At the end, the system cpu is very high. I guess there are >>>>> page >>>>> faults due to the heap expansion around timestamp 12002.692. Is the >>>>> memory >>>>> tight on your system? >>>>> >>>> For performance sake we use compressedoops which limits the memory upper >>>> bound, this means we could go to bit under 32g, 32255M. Going above >>>> compressedoops will increase memory footprint and slow down processing >>>> so we >>>> would prefer just tuning the gc while within compressedoops. >>>> >>>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >>>>> >>>> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >>>> >>>> ** >>>> Martin >>>> >>>> Thanks, >>>>> Jenny >>>>> >>>>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>>>> >>>>> Hi! >>>>> >>>>> I tried the new parameters: >>>>> >>>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>>>>> >>>>> This seems to hang the whole system.... we have lots of mostly short >>>>> lived >>>>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each >>>>> (data >>>>> reports loaded into cache). >>>>> >>>>> Maybe eden should be even bigger instead of smaller? 
>>>>> >>>>> Here is the log from today, it hung up quite early, I suspect the gc: >>>>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>>>> >>>>> The process ate most of the cpu cacacity and we had to kill it and >>>>> restart >>>>> without -XX:G1MaxNewSizePercent=30. >>>>> >>>>> What you suggest? >>>>> >>>>> ** >>>>> Martin >>>>> >>>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper >>>>>> limit >>>>>> to Eden size. It seems the Eden size grows to 17g before Full gc, >>>>>> then a >>>>>> bunch of humongous allocation happened, and there is not enough old >>>>>> gen. >>>>>> >>>>>> The following log entry seems not right: The Eden Size is over 60% of >>>>>> the >>>>>> heap. >>>>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start >>>>>> choosing >>>>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, >>>>>> remaining >>>>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: >>>>>> [G1Ergonomics >>>>>> (CSet Construction) add young regions to CSet, eden: 1 regions, >>>>>> survivors: >>>>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 >>>>>> regions, >>>>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 >>>>>> ms, >>>>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: >>>>>> 563.9 ms, >>>>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning >>>>>> (ms): >>>>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update >>>>>> RS >>>>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: >>>>>> 3016] >>>>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: >>>>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: >>>>>> 27.3, >>>>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, >>>>>> Diff: >>>>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, >>>>>> Max: >>>>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: >>>>>> 193238871.7, >>>>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>>>> [Code Root Fixup: 0.0 ms] >>>>>> [Clear CT: 0.3 ms] >>>>>> [Other: 191.7 ms] >>>>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: >>>>>> 0.3 >>>>>> ms] [Free CSet: 0.2 ms] >>>>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: >>>>>> 14.1G(28.7G)->14.1G(28.7G)] >>>>>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>>>>> >>>>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from >>>>>> 10 is >>>>>> we are wasting some concurrent cycles. >>>>>> >>>>>> We will see how this goes. We might increase G1ReservePercent to >>>>>> handle >>>>>> this kind of allocation if it is not enough. >>>>>> >>>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>>>> >>>>>> Hi! >>>>>> >>>>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is >>>>>> caused by heap full and allocation request: 144 bytes. >>>>>> >>>>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>>>> >>>>>> Any ideas how to mitigate this kind of situation? The Full GC makes >>>>>> quite >>>>>> a difference to the situation but causes a painful pause also. 
>>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> >>>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Thu Aug 21 14:21:51 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Thu, 21 Aug 2014 17:21:51 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> Message-ID: Hi! Why doesn't the g1gc operate like azul c4 is described to operate: http://www.azulsystems.com/technology/c4-garbage-collector "fully concurrent, so it never falls back to a stop-the-world compaction" ? ** Martin 2014-08-20 11:35 GMT+03:00 Martin Makundi < martin.makundi at koodaripalvelut.com>: > Hi! > > Here is one more recent log: > > http://81.22.250.165/log/gc-2014-08-20.log > > with a couple of Full GC's. > > ** > Martin > > > 2014-08-19 20:11 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > > Hi! >> >> I suspect the application does not do the humongous allocations, I >> suspect it's the gc itself that does these. We have an allocationrecorder >> that never sees these humongous allocations within the application >> itself...assuming they are in a form that the allocation hook can detect. >> >> From within the application we have lots of objects that are kept in >> ehcache, so ehcache manages the usage. I am not familiar with ehcache >> internals but I don't think it uses humongous objects in any way. >> >> The max allocation request varies on the objects that are loaded, it is >> possible that some request is 48m so yes it can vary depending on who logs >> in and what he/she does... not very reproducible. >> >> My main concern is that IF full gc can clean up the memory, there should >> be a mechanic that does just the same as full gc but without blocking for >> long time...concurrent full gc, do the 30-60 second operation for example >> 10% overhead until whole full gc is done (that would take 30-60/10% = >> 300-600 seconds). >> >> And if this is not feasible at the moment...what can we tune to mitigate >> the peaking in the garbage accumulation and thus avoid the full gc's. >> >> Help =) >> >> ** >> Martin >> >> >> 2014-08-19 19:56 GMT+03:00 Yu Zhang : >> >> Martin, >>> >>> Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. >>> In 08-15, there are 4 requests allocation 109M >>> ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, >>> reason: humongous allocation request failed, allocation request: 109816768 >>> bytes]' >>> But in 08-11, the max allocation request is 48M >>> '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >>> humongous allocation request failed, allocation request: 47906832 bytes]' >>> >>> Does the application change the allocation when performance changes? >>> >>> >>> Thanks, >>> Jenny >>> >>> >>> On 8/15/2014 3:20 AM, Martin Makundi wrote: >>> >>>> Hi! >>>> >>>> Here is our latest logs with bit more heap and suggested parameters. >>>> We tried eden max 85% with this run because 30% was unsuccessful >>>> earlier. >>>> >>>> Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log >>>> >>>> It has a couple of full gc hits during a busy day, any new ideas? >>>> >>>> ** >>>> Martin >>>> >>>> >>>> 2014-08-13 9:39 GMT+03:00 Martin Makundi>>> koodaripalvelut.com>: >>>> >>>>> Thanks. At the end, the system cpu is very high. 
I guess there are >>>>>> page >>>>>> faults due to the heap expansion around timestamp 12002.692. Is the >>>>>> memory >>>>>> tight on your system? >>>>>> >>>>> For performance sake we use compressedoops which limits the memory >>>>> upper >>>>> bound, this means we could go to bit under 32g, 32255M. Going above >>>>> compressedoops will increase memory footprint and slow down processing >>>>> so we >>>>> would prefer just tuning the gc while within compressedoops. >>>>> >>>>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >>>>>> >>>>> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >>>>> >>>>> ** >>>>> Martin >>>>> >>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>>>>> >>>>>> Hi! >>>>>> >>>>>> I tried the new parameters: >>>>>> >>>>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>>>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>>>>>> >>>>>> This seems to hang the whole system.... we have lots of mostly short >>>>>> lived >>>>>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each >>>>>> (data >>>>>> reports loaded into cache). >>>>>> >>>>>> Maybe eden should be even bigger instead of smaller? >>>>>> >>>>>> Here is the log from today, it hung up quite early, I suspect the gc: >>>>>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>>>>> >>>>>> The process ate most of the cpu cacacity and we had to kill it and >>>>>> restart >>>>>> without -XX:G1MaxNewSizePercent=30. >>>>>> >>>>>> What you suggest? >>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper >>>>>>> limit >>>>>>> to Eden size. It seems the Eden size grows to 17g before Full gc, >>>>>>> then a >>>>>>> bunch of humongous allocation happened, and there is not enough old >>>>>>> gen. >>>>>>> >>>>>>> The following log entry seems not right: The Eden Size is over 60% >>>>>>> of the >>>>>>> heap. 
>>>>>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>>>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start >>>>>>> choosing >>>>>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, >>>>>>> remaining >>>>>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: >>>>>>> [G1Ergonomics >>>>>>> (CSet Construction) add young regions to CSet, eden: 1 regions, >>>>>>> survivors: >>>>>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>>>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 >>>>>>> regions, >>>>>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 >>>>>>> ms, >>>>>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: >>>>>>> 563.9 ms, >>>>>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>>>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning >>>>>>> (ms): >>>>>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update >>>>>>> RS >>>>>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>>>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: >>>>>>> 3016] >>>>>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>>>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, >>>>>>> Sum: >>>>>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, >>>>>>> Diff: 27.3, >>>>>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: >>>>>>> 0.0, Diff: >>>>>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, >>>>>>> Max: >>>>>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: >>>>>>> 193238871.7, >>>>>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>>>>> [Code Root Fixup: 0.0 ms] >>>>>>> [Clear CT: 0.3 ms] >>>>>>> [Other: 191.7 ms] >>>>>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref >>>>>>> Enq: 0.3 >>>>>>> ms] [Free CSet: 0.2 ms] >>>>>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M >>>>>>> Heap: >>>>>>> 14.1G(28.7G)->14.1G(28.7G)] >>>>>>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>>>>>> >>>>>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from >>>>>>> 10 is >>>>>>> we are wasting some concurrent cycles. >>>>>>> >>>>>>> We will see how this goes. We might increase G1ReservePercent to >>>>>>> handle >>>>>>> this kind of allocation if it is not enough. >>>>>>> >>>>>>> Thanks, >>>>>>> Jenny >>>>>>> >>>>>>> Thanks, >>>>>>> Jenny >>>>>>> >>>>>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which >>>>>>> is >>>>>>> caused by heap full and allocation request: 144 bytes. >>>>>>> >>>>>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>>>>> >>>>>>> Any ideas how to mitigate this kind of situation? The Full GC makes >>>>>>> quite >>>>>>> a difference to the situation but causes a painful pause also. >>>>>>> >>>>>>> ** >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Thu Aug 21 23:20:01 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Thu, 21 Aug 2014 16:20:01 -0700 Subject: G1gc compaction algorithm In-Reply-To: References: <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> Message-ID: <53F67EA1.3080406@oracle.com> Adding Thomas as my understanding of how G1ReservePercent my not be accurate. 
Comments in line. Thanks, Jenny On 8/19/2014 10:11 AM, Martin Makundi wrote: > Hi! > > I suspect the application does not do the humongous allocations, I > suspect it's the gc itself that does these. We have an > allocationrecorder that never sees these humongous allocations within > the application itself...assuming they are in a form that the > allocation hook can detect. It is unlikely g1 is doing this. Is it possible that the one you recorded did not have those requests? > > From within the application we have lots of objects that are kept in > ehcache, so ehcache manages the usage. I am not familiar with ehcache > internals but I don't think it uses humongous objects in any way. > > The max allocation request varies on the objects that are loaded, it > is possible that some request is 48m so yes it can vary depending on > who logs in and what he/she does... not very reproducible. Usually we set -Xms and -Xmx the same to avoid the heap expansion and shrinkage. But maybe it does not apply to your case. The humongous objects are allocated in batch at some point of the log. With -Xms -Xmx the same, g1 can not find contiguous space for the objects and has to do a full gc. But when -Xms and -Xmx is different, g1 has more head room for expansion to handle those humongous objects. My first impression is G1ReservePercent should help if we increase from default(10). But I am not sure due to the following: The 1st Full gc happened due to humongous allocation, g1 can not find 7 consecutive regions to satisfy that allocation. If G1 leaves G1ReservePercent not touched, then it should be able to find 7 regions. In other words, G1ReservePercent=10 should be enough, unless the reserved is not kept in chunk. > > My main concern is that IF full gc can clean up the memory, there > should be a mechanic that does just the same as full gc but without > blocking for long time...concurrent full gc, do the 30-60 second > operation for example 10% overhead until whole full gc is done (that > would take 30-60/10% = 300-600 seconds). The reason for the 1st Full gc in 08-15 log: "2014-08-15T10:25:10.637+0300: 112485.906: [Full GC 20G->15G(31G), 58.5538840 secs] [Eden: 0.0B(1984.0M)->0.0B(320.0M) Survivors: 192.0M->0.0B Heap: 20.9G(31.5G)->15.9G(31.5G)]" G1 tries to meet the humongous allocation requests, but could not find continuous empty regions. Note that the heap usage is only 20.9G. But there is no consecutive regions to hold 109816768 bytes. The rest of the Full gc happened due to 'to-space exhausted'. It could be the heap usage is that high. Note after the 2nd full gc, the heap usage is 27g, and the young gcs before that can not clean at all. Another reason for full gc can clean more, is classes are not unloaded till a full gc. This is fixed in later jdk8 and jdk9 versions. > > And if this is not feasible at the moment...what can we tune to > mitigate the peaking in the garbage accumulation and thus avoid the > full gc's. I am still tempted to try lower young gen size. With that, you have more frequent young gc, more chance for mixed gc to kick in. You mentioned you can not lower MaxNewSizePercent. With lower MaxNewSizePercent and AlwaysPreTouch, you are still seeing high system activities? > > Help =) > > ** > Martin > > > 2014-08-19 19:56 GMT+03:00 Yu Zhang >: > > Martin, > > Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems > changed. 
In 08-15, there are 4 requests allocation 109M > ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 109816768 bytes]' > But in 08-11, the max allocation request is 48M > '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 47906832 bytes]' > > Does the application change the allocation when performance changes? > > > Thanks, > Jenny > > > On 8/15/2014 3:20 AM, Martin Makundi wrote: > > Hi! > > Here is our latest logs with bit more heap and suggested > parameters. > We tried eden max 85% with this run because 30% was unsuccessful > earlier. > > Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log > > It has a couple of full gc hits during a busy day, any new ideas? > > ** > Martin > > > 2014-08-13 9:39 GMT+03:00 Martin > Makundi >: > > Thanks. At the end, the system cpu is very high. I > guess there are page > faults due to the heap expansion around timestamp > 12002.692. Is the memory > tight on your system? > > For performance sake we use compressedoops which limits > the memory upper > bound, this means we could go to bit under 32g, 32255M. > Going above > compressedoops will increase memory footprint and slow > down processing so we > would prefer just tuning the gc while within compressedoops. > > can you afford to start with -Xms30g -Xmx30g > -XX:+AlwaysPreTouch? > > Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch > > ** > Martin > > Thanks, > Jenny > > On 8/11/2014 11:29 PM, Martin Makundi wrote: > > Hi! > > I tried the new parameters: > > Based on this one, can you do one with > -XX:G1MaxNewSizePercent=30 > -XX:InitiatingHeapOccupancyPercent=20 added? > > This seems to hang the whole system.... we have lots > of mostly short lived > (ehcache timeToIdleSeconds="900") large java object > trees 1M-10M each (data > reports loaded into cache). > > Maybe eden should be even bigger instead of smaller? > > Here is the log from today, it hung up quite early, I > suspect the gc: > http://81.22.250.165/log/gc-16m-2014-08-12.log > > The process ate most of the cpu cacacity and we had to > kill it and restart > without -XX:G1MaxNewSizePercent=30. > > What you suggest? > > ** > Martin > > The reason for G1MaxNewSizePercent(default=60) is > to set an upper limit > to Eden size. It seems the Eden size grows to 17g > before Full gc, then a > bunch of humongous allocation happened, and there > is not enough old gen. > > The following log entry seems not right: The Eden > Size is over 60% of the > heap. 
> "2014-08-11T11:13:05.487+0300: 193238.308: [GC > pause (young) > (initial-mark) 193238.308: [G1Ergonomics (CSet > Construction) start choosing > CSet, _pending_cards: 769041, predicted base time: > 673.25 ms, remaining > time: 326.75 ms, target pause time: 1000.00 ms] > 193238.308: [G1Ergonomics > (CSet Construction) add young regions to CSet, > eden: 1 regions, survivors: > 21 regions, predicted young region time: 145.63 > ms] 193238.308: > [G1Ergonomics (CSet Construction) finish choosing > CSet, eden: 1 regions, > survivors: 21 regions, old: 0 regions, predicted > pause time: 818.88 ms, > target pause time: 1000.00 ms], 0.7559550 secs] > [Parallel Time: 563.9 ms, > GC Workers: 13] [GC Worker Start (ms): Min: > 193238308.1, Avg: > 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext > Root Scanning (ms): > Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: > 168.4] [Update RS > (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: > 43.6, Sum: 5418.0] > [Processed Buffers: Min: 162, Avg: 232.0, Max: > 326, Diff: 164, Sum: 3016] > [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: > 0.0, Sum: 0.1] > [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: > 152.4, Diff: 72.5, Sum: > 1363.0] [Termination (ms): Min: 0.0, Avg: > 19.1, Max: 27.3, Diff: 27.3, > Sum: 248.9] [GC Worker Other (ms): Min: 0.0, > Avg: 0.0, Max: 0.0, Diff: > 0.0, Sum: 0.3] [GC Worker Total (ms): Min: > 524.1, Avg: 553.8, Max: > 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker > End (ms): Min: 193238871.7, > Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 0.3 ms] > [Other: 191.7 ms] > [Choose CSet: 0.0 ms] [Ref Proc: 190.1 > ms] [Ref Enq: 0.3 > ms] [Free CSet: 0.2 ms] > [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: > 336.0M->240.0M Heap: > 14.1G(28.7G)->14.1G(28.7G)] > [Times: user=8.45 sys=0.04, real=0.75 secs]" > > The reason for increasing > InitiatingHeapOccupancyPercent to 20 from 10 is > we are wasting some concurrent cycles. > > We will see how this goes. We might increase > G1ReservePercent to handle > this kind of allocation if it is not enough. > > Thanks, > Jenny > > Thanks, > Jenny > > On 8/11/2014 10:46 AM, Martin Makundi wrote: > > Hi! > > Here is our latest log with one Full GC @ > 2014-08-11T11:20:02 which is > caused by heap full and allocation request: 144 bytes. > > http://81.22.250.165/log/gc-16m-2014-08-11.log > > Any ideas how to mitigate this kind of situation? > The Full GC makes quite > a difference to the situation but causes a painful > pause also. > > ** > Martin > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Fri Aug 22 02:12:22 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Fri, 22 Aug 2014 05:12:22 +0300 Subject: G1gc compaction algorithm In-Reply-To: <53F67EA1.3080406@oracle.com> References: <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> <53F67EA1.3080406@oracle.com> Message-ID: > > > I suspect the application does not do the humongous allocations, I > suspect it's the gc itself that does these. We have an allocationrecorder > that never sees these humongous allocations within the application > itself...assuming they are in a form that the allocation hook can detect. > > It is unlikely g1 is doing this. Is it possible that the one you recorded > did not have those requests? 
> It is possible that the allocations are so fragmented that the jvm allocation recorder (https://code.google.com/p/java-allocation-instrumenter/) does not see them as a humongous object. > > From within the application we have lots of objects that are kept in > ehcache, so ehcache manages the usage. I am not familiar with ehcache > internals but I don't think it uses humongous objects in any way. > > The max allocation request varies on the objects that are loaded, it is > possible that some request is 48m so yes it can vary depending on who logs > in and what he/she does... not very reproducible. > > Usually we set -Xms and -Xmx the same to avoid the heap expansion and > shrinkage. But maybe it does not apply to your case. The humongous > objects are allocated in batch at some point of the log. With -Xms -Xmx > the same, g1 can not find contiguous space for the objects and has to do a > full gc. But when -Xms and -Xmx is different, g1 has more head room for > expansion to handle those humongous objects. > Ok, will try this -Xms5G -Xmx30G. Is there some option to make jvm shrink/release memory usage more aggressively? Will try options -XX:MaxHeapFreeRatio=35 -XX:MinHeapFreeRatio=10 though I am not sure if they have effect only after full gc. > My first impression is G1ReservePercent should help if we increase from > default(10). But I am not sure due to the following: The 1st Full gc > happened due to humongous allocation, g1 can not find 7 consecutive regions > to satisfy that allocation. If G1 leaves G1ReservePercent not touched, > then it should be able to find 7 regions. In other words, > G1ReservePercent=10 should be enough, unless the reserved is not kept in > chunk. > What does G1ReservePercent affect? Is it reducing fragmentation i.e., whenever new allocations are made they are attempted below the G1ReservePercent or is it a hard limit for the available memory? I.e., how is it different from simply reducing Xmx? > My main concern is that IF full gc can clean up the memory, there should > be a mechanic that does just the same as full gc but without blocking for > long time...concurrent full gc, do the 30-60 second operation for example > 10% overhead until whole full gc is done (that would take 30-60/10% = > 300-600 seconds). > > The reason for the 1st Full gc in 08-15 log: > "2014-08-15T10:25:10.637+0300: 112485.906: [Full GC 20G->15G(31G), > 58.5538840 secs] > [Eden: 0.0B(1984.0M)->0.0B(320.0M) Survivors: 192.0M->0.0B Heap: > 20.9G(31.5G)->15.9G(31.5G)]" > G1 tries to meet the humongous allocation requests, but could not find > continuous empty regions. Note that the heap usage is only 20.9G. But > there is no consecutive regions to hold 109816768 bytes. > > The rest of the Full gc happened due to 'to-space exhausted'. It could be > the heap usage is that high. Note after the 2nd full gc, the heap usage is > 27g, and the young gcs before that can not clean at all. > > Another reason for full gc can clean more, is classes are not unloaded > till a full gc. This is fixed in later jdk8 and jdk9 versions. > Class unloading is disabled in our setup so this should not affect. I still think incremental full gc should be happening concurrently all the time instead of an intermittent long pause ( http://www.azulsystems.com/technology/c4-garbage-collector). > > And if this is not feasible at the moment...what can we tune to mitigate > the peaking in the garbage accumulation and thus avoid the full gc's. > > I am still tempted to try lower young gen size. 
With that, you have more > frequent young gc, more chance for mixed gc to kick in. You mentioned you > can not lower MaxNewSizePercent. With lower MaxNewSizePercent and > AlwaysPreTouch, you are still seeing high system activities? > Ok, I haven't tried them together.. will try that today. ** Martin > > > > Help =) > > ** > Martin > > > 2014-08-19 19:56 GMT+03:00 Yu Zhang : > >> Martin, >> >> Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. >> In 08-15, there are 4 requests allocation 109M >> ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> humongous allocation request failed, allocation request: 109816768 bytes]' >> But in 08-11, the max allocation request is 48M >> '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> humongous allocation request failed, allocation request: 47906832 bytes]' >> >> Does the application change the allocation when performance changes? >> >> >> Thanks, >> Jenny >> >> >> On 8/15/2014 3:20 AM, Martin Makundi wrote: >> >>> Hi! >>> >>> Here is our latest logs with bit more heap and suggested parameters. >>> We tried eden max 85% with this run because 30% was unsuccessful >>> earlier. >>> >>> Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log >>> >>> It has a couple of full gc hits during a busy day, any new ideas? >>> >>> ** >>> Martin >>> >>> >>> 2014-08-13 9:39 GMT+03:00 Martin Makundi< >>> martin.makundi at koodaripalvelut.com>: >>> >>>> Thanks. At the end, the system cpu is very high. I guess there are >>>>> page >>>>> faults due to the heap expansion around timestamp 12002.692. Is the >>>>> memory >>>>> tight on your system? >>>>> >>>> For performance sake we use compressedoops which limits the memory upper >>>> bound, this means we could go to bit under 32g, 32255M. Going above >>>> compressedoops will increase memory footprint and slow down processing >>>> so we >>>> would prefer just tuning the gc while within compressedoops. >>>> >>>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >>>>> >>>> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >>>> >>>> ** >>>> Martin >>>> >>>> Thanks, >>>>> Jenny >>>>> >>>>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>>>> >>>>> Hi! >>>>> >>>>> I tried the new parameters: >>>>> >>>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>>>>> >>>>> This seems to hang the whole system.... we have lots of mostly short >>>>> lived >>>>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M each >>>>> (data >>>>> reports loaded into cache). >>>>> >>>>> Maybe eden should be even bigger instead of smaller? >>>>> >>>>> Here is the log from today, it hung up quite early, I suspect the gc: >>>>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>>>> >>>>> The process ate most of the cpu cacacity and we had to kill it and >>>>> restart >>>>> without -XX:G1MaxNewSizePercent=30. >>>>> >>>>> What you suggest? >>>>> >>>>> ** >>>>> Martin >>>>> >>>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper >>>>>> limit >>>>>> to Eden size. It seems the Eden size grows to 17g before Full gc, >>>>>> then a >>>>>> bunch of humongous allocation happened, and there is not enough old >>>>>> gen. >>>>>> >>>>>> The following log entry seems not right: The Eden Size is over 60% of >>>>>> the >>>>>> heap. 
>>>>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start >>>>>> choosing >>>>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, >>>>>> remaining >>>>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: >>>>>> [G1Ergonomics >>>>>> (CSet Construction) add young regions to CSet, eden: 1 regions, >>>>>> survivors: >>>>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 >>>>>> regions, >>>>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 >>>>>> ms, >>>>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: >>>>>> 563.9 ms, >>>>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning >>>>>> (ms): >>>>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] [Update >>>>>> RS >>>>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: >>>>>> 3016] >>>>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, Sum: >>>>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, Diff: >>>>>> 27.3, >>>>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, >>>>>> Diff: >>>>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, >>>>>> Max: >>>>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: >>>>>> 193238871.7, >>>>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>>>> [Code Root Fixup: 0.0 ms] >>>>>> [Clear CT: 0.3 ms] >>>>>> [Other: 191.7 ms] >>>>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref Enq: >>>>>> 0.3 >>>>>> ms] [Free CSet: 0.2 ms] >>>>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M Heap: >>>>>> 14.1G(28.7G)->14.1G(28.7G)] >>>>>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>>>>> >>>>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from >>>>>> 10 is >>>>>> we are wasting some concurrent cycles. >>>>>> >>>>>> We will see how this goes. We might increase G1ReservePercent to >>>>>> handle >>>>>> this kind of allocation if it is not enough. >>>>>> >>>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> Thanks, >>>>>> Jenny >>>>>> >>>>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>>>> >>>>>> Hi! >>>>>> >>>>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which is >>>>>> caused by heap full and allocation request: 144 bytes. >>>>>> >>>>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>>>> >>>>>> Any ideas how to mitigate this kind of situation? The Full GC makes >>>>>> quite >>>>>> a difference to the situation but causes a painful pause also. >>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> >>>>>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Fri Aug 22 15:26:05 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Fri, 22 Aug 2014 18:26:05 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> <53F67EA1.3080406@oracle.com> Message-ID: > > >> Another reason for full gc can clean more, is classes are not unloaded >> till a full gc. This is fixed in later jdk8 and jdk9 versions. 
>> > > Class unloading is disabled in our setup so this should not affect. I > still think incremental full gc should be happening concurrently all the > time instead of an intermittent long pause ( > http://www.azulsystems.com/technology/c4-garbage-collector). > > And here's today's log, couple of Full GC:s: http://81.22.250.165/log/gc-2014-08-22.log ** Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Mon Aug 25 09:06:17 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 25 Aug 2014 11:06:17 +0200 Subject: G1gc compaction algorithm In-Reply-To: References: <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> <53F67EA1.3080406@oracle.com> Message-ID: <1408957577.2684.50.camel@cirrus> Hi all, On Fri, 2014-08-22 at 05:12 +0300, Martin Makundi wrote: > > I suspect the application does not do the humongous > > allocations, I suspect it's the gc itself that does these. > > We have an allocationrecorder that never sees these > > humongous allocations within the application > > itself...assuming they are in a form that the allocation > > hook can detect. > > It is unlikely g1 is doing this. Is it possible that the one > you recorded did not have those requests? The collector is never a source for large object allocations in the Java heap (unless you enable some testing/debug options). > > From within the application we have lots of objects that are > > kept in ehcache, so ehcache manages the usage. I am not > > familiar with ehcache internals but I don't think it uses > > humongous objects in any way. We have never had an issue about the GC misreporting large object allocations. The likelihood for that to be a problem is very low. > Ok, will try this -Xms5G -Xmx30G. Is there some option to make jvm > shrink/release memory usage more aggressively? Will try > options -XX:MaxHeapFreeRatio=35 -XX:MinHeapFreeRatio=10 though I am > not sure if they have effect only after full gc. > > My first impression is G1ReservePercent should help if we > increase from default(10). But I am not sure due to the > following: The 1st Full gc happened due to humongous > allocation, g1 can not find 7 consecutive regions to satisfy > that allocation. If G1 leaves G1ReservePercent not touched, > then it should be able to find 7 regions. In other words, > G1ReservePercent=10 should be enough, unless the reserved is > not kept in chunk. > > What does G1ReservePercent affect? Is it reducing fragmentation i.e., > whenever new allocations are made they are attempted below the > G1ReservePercent or is it a hard limit for the available memory? I.e., > how is it different from simply reducing Xmx? The allocation reserve is some memory kept back to be used by evacuation so that no to-space exhaustion can occur. I.e. the gc is started earlier than required to avoid that it cannot find space for objects that are evacuated. This is not necessarily a contiguous amount of space, so the impact might be minimal here. > > My main concern is that IF full gc can clean up the memory, > > there should be a mechanic that does just the same as full > > gc but without blocking for long time...concurrent full gc, > > do the 30-60 second operation for example 10% overhead until > > whole full gc is done (that would take 30-60/10% = 300-600 > > seconds). 
> The reason for the 1st Full gc in 08-15 log:
> "2014-08-15T10:25:10.637+0300: 112485.906: [Full GC
> 20G->15G(31G), 58.5538840 secs]
> [Eden: 0.0B(1984.0M)->0.0B(320.0M) Survivors: 192.0M->0.0B
> Heap: 20.9G(31.5G)->15.9G(31.5G)]"
> G1 tries to meet the humongous allocation requests, but could
> not find continuous empty regions. Note that the heap usage
> is only 20.9G. But there are no consecutive regions to hold
> 109816768 bytes.
>
> The rest of the Full gc happened due to 'to-space exhausted'.
> It could be the heap usage is that high. Note after the 2nd
> full gc, the heap usage is 27g, and the young gcs before that
> can not clean at all.
>
> Another reason for full gc can clean more, is classes are not
> unloaded till a full gc. This is fixed in later jdk8 and jdk9
> versions.
>
>
> Class unloading is disabled in our setup so this should not affect. I
> still think incremental full gc should be happening concurrently all
> the time instead of an intermittent long pause
> (http://www.azulsystems.com/technology/c4-garbage-collector).

The intermittent long pauses are due to g1 not handling the workload you
have well. Please try the recently made available 8u40-b02 EA
(https://jdk8.java.net/download.html) which contains a few fixes which
will help your application.

Some of the tuning suggested so far will have bad consequences, so it is
probably best to more or less start from scratch with g1 options.

Thanks,
  Thomas

From srini_was at yahoo.com Mon Aug 25 17:03:08 2014
From: srini_was at yahoo.com (Srini Padman)
Date: Mon, 25 Aug 2014 10:03:08 -0700
Subject: Seeking help regarding Full GCs with G1
In-Reply-To: <1407423255.31358.YahooMailNeo@web140701.mail.bf1.yahoo.com>
References: <1407365331.22707.YahooMailNeo@web140705.mail.bf1.yahoo.com> <53E378C3.6040503@finkzeit.at> <1407423255.31358.YahooMailNeo@web140701.mail.bf1.yahoo.com>
Message-ID: <1408986188.89400.YahooMailNeo@web140703.mail.bf1.yahoo.com>

Hello Jenny/Wolfgang,

A quick note back to inform you that we started another round of tests with the following parameters, and things are looking very good so far (9 days into the test) - the patterns are very smooth, the footprint stabilized fairly early on, and there have been no Full GCs.

-server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1HeapWastePercent=5 -XX:InitiatingHeapOccupancyPercent=65 -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo

We are hoping that the same trends will continue once we go to production with these settings.

Thanks very much for the help!

Regards,
Srini.

On Thursday, August 7, 2014 10:54 AM, Srini Padman wrote:
Thanks for both the suggestions, Wolfgang.
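The arithmetic behind the parameter set above (spelled out in the expectations quoted below) can be checked with a short calculation; a rough sketch, where the 2.2 GB live-set figure is the long-run application footprint mentioned elsewhere in this thread:

public class MixedGcBudget {
    public static void main(String[] args) {
        double heapGb = 4.0;            // -Xmx4096m
        double liveSetGb = 2.2;         // assumed long-run application footprint
        double wastePercent = 5.0;      // -XX:G1HeapWastePercent=5
        double liveThreshold = 75.0;    // -XX:G1MixedGCLiveThresholdPercent=75

        // Live data plus the space G1 is allowed to leave unreclaimed:
        double effectiveGb = liveSetGb + heapGb * wastePercent / 100.0;   // 2.4 GB
        double occupancyPct = effectiveGb / heapGb * 100.0;               // 60%

        // Even if every region were uniformly ~60% live, that is still below
        // the 75% liveness threshold, so old regions stay eligible for mixed GCs.
        System.out.printf("effective footprint: %.1f GB (%.0f%% of heap)%n",
                effectiveGb, occupancyPct);
        System.out.println("regions stay eligible for mixed GC: "
                + (occupancyPct < liveThreshold));
    }
}
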
We are going with the following parameters for the next test run: -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1HeapWastePercent=5 -XX:InitiatingHeapOccupancyPercent=65 -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo The expectations being: 1\ with a total heap size of 4 GB, an application memory footprint of 2.2 GB, and an acceptable heap waste of 5%, the "effective" footprint is 2.2 + 0.05 * 4 GB = 2.4 GB, which is slightly smaller than 60% of the heap 2\ setting the initiating occupancy percent to 65% gives us a little bit of operating room over the effective heap footprint of 60% 3\ even if the heap is "perfectly" fragmented, that is, even if this means *all* regions are 60% occupied, all of them will still be eligible for mixed GCs since the threshold is now 75%. Regards, Srini. On Thursday, August 7, 2014 9:02 AM, Wolfgang Pedot wrote: Hi again, it might also help to to look at how the regions are occupied. G1PrintRegionLivenessInfo will print the regions during the marking-phase so you can see how many are OLD or possibly HUMS and how they are occupied. This information has helped me quite a bit while tweaking G1 and our application for optimal performance. regards Wolfgang Am 07.08.2014 00:48, schrieb Srini Padman: > Hello, > > I am currently evaluating the use of the G1 Collector for our > application, to combat the fragmentation issues we ran into while > using the CMS collector (several cases of failed promotions, followed > by *really* long pauses). However, I am also having trouble with > tuning the G1 collector, and am seeing behavior that I can't fully > understand. I will appreciate any help/insight that you guys can offer. > > What I find puzzling from looking at the G1 GC logs from our tests is > that the concurrent marking phase does not really seem to identify > many old regions to clean up at all, and the heap usage keeps growing. > At some point, there is no further room to expand ("heap expansion > operation failed") and this is followed by a Full GC that lasts about > 10 seconds. But the Full GC actually brings the memory down by almost > 50%, from 4095M to 2235M. > > If the Full GC can collect this much of the heap, I don't fully > understand why the concurrent mark phase does not identify these > (old?) regions for (mixed?) collection subsequently. > > On the assumption that we should let the GC ergonomics do its thing > freely, I initially did not set any parameter other than -Xmx, -Xms, > and the PermGen sizes. I added the G1HeapRegionSize and > G1MixedGCLiveThresholdPercent settings (see below) because, when I saw > the Full GCs with the default settings, I wondered whether we might be > getting into a situation where all (or most?) regions are roughly 65% > live so the concurrent marking phase does not identify them for > collection but a subsequent Full GC is able to. That is, I wondered > whether our application's heap footprint being 65% of the max heap led > to these full GCs coincidentally (since G1MixedGCLiveThresholdPercent > is 65% by default). But I don't know why the same thing happens when I > set G1MixedGCLiveThresholdPercent down to 40% - even if all regions > are 40% full, we will only be at about 1.6 GB, and that is far below > what I think our heap footprint is in the long run (2.2 GB). 
So I > don't understand how to ensure that old regions are cleaned up > regularly so a Full GC is not required. > > GC Settings in use: > > -server -Xms4096m -Xmx4096m -Xss512k -XX:PermSize=128m > -XX:MaxPermSize=128m -XX:+UseG1GC -XX:+ParallelRefProcEnabled > -XX:+DisableExplicitGC -verbose:gc -XX:+PrintAdaptiveSizePolicy > -XX:+UnlockExperimentalVMOptions -XX:G1HeapRegionSize=2m > -XX:G1MixedGCLiveThresholdPercent=40 > > This is using JRE 1.7.0_55. > > I am including a short(ish) GC log snippet for the time leading up to > the Full GC. I can send the full GC log (about 8 MB, zipped) if necessary. > > Any help will be greatly appreciated! > > Regards, > Srini. > > --------------------------- > > 2014-08-06T04:46:00.067-0700: 1124501.033: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC > concurrent-root-region-scan-end, 0.0139487 secs] > 2014-08-06T04:46:00.081-0700: 1124501.047: [GC concurrent-mark-start] > 2014-08-06T04:46:10.531-0700: 1124511.514: [GC concurrent-mark-end, > 10.4675249 secs] > 2014-08-06T04:46:10.532-0700: 1124511.515: [GC remark > 2014-08-06T04:46:10.532-0700: 1124511.516: [GC ref-proc, 0.0018819 > secs], 0.0225253 secs] >? [Times: user=0.01 sys=0.00, real=0.02 secs] > 2014-08-06T04:46:10.555-0700: 1124511.539: [GC cleanup > 3922M->3922M(4096M), 0.0098209 secs] >? [Times: user=0.03 sys=0.03, real=0.01 secs] > 2014-08-06T04:46:49.603-0700: 1124550.652: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1531592 bytes,? ? 1531592 total > - age? 2:? ? 1087648 bytes,? ? 2619240 total > - age? 3:? ? 259480 bytes,? ? 2878720 total > - age? 4:? ? 493976 bytes,? ? 3372696 total > - age? 5:? ? 213472 bytes,? ? 3586168 total > - age? 6:? ? 186104 bytes,? ? 3772272 total > - age? 7:? ? 169832 bytes,? ? 3942104 total > - age? 8:? ? 201968 bytes,? ? 4144072 total > - age? 9:? ? 183752 bytes,? ? 4327824 total > - age? 10:? ? 136480 bytes,? ? 4464304 total > - age? 11:? ? 366208 bytes,? ? 4830512 total > - age? 12:? ? 137296 bytes,? ? 4967808 total > - age? 13:? ? 133592 bytes,? ? 5101400 total > - age? 14:? ? 162232 bytes,? ? 5263632 total > - age? 15:? ? 139984 bytes,? ? 5403616 total >? 1124550.652: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 21647, predicted base time: 37.96 ms, remaining time: > 162.04 ms, target pause time: 200.00 ms] >? 1124550.652: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.46 ms] >? 1124550.652: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 0 regions, predicted > pause time: 42.42 ms, target pause time: 200.00 ms] >? 1124550.701: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4064280576 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] >? 1124550.701: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: > candidate old regions available, candidate old regions: 285 regions, > reclaimable: 430117688 bytes (10.01 %), threshold: 10.00 %] > , 0.0494015 secs] >? ? [Parallel Time: 43.7 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124550651.8, Avg: 1124550668.7, > Max: 1124550674.3, Diff: 22.6] >? ? ? [Ext Root Scanning (ms): Min: 0.1, Avg: 6.7, Max: 22.3, Diff: > 22.2, Sum: 26.8] >? ? ? [Update RS (ms): Min: 9.9, Avg: 11.0, Max: 12.3, Diff: 2.5, Sum: > 44.0] >? ? ? ? 
? [Processed Buffers: Min: 39, Avg: 40.3, Max: 41, Diff: 2, > Sum: 161] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2] >? ? ? [Object Copy (ms): Min: 8.6, Avg: 8.9, Max: 9.6, Diff: 1.0, Sum: > 35.6] >? ? ? [Termination (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: > 0.3] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, > Sum: 0.2] >? ? ? [GC Worker Total (ms): Min: 21.1, Avg: 26.8, Max: 43.7, Diff: > 22.6, Sum: 107.1] >? ? ? [GC Worker End (ms): Min: 1124550695.4, Avg: 1124550695.5, Max: > 1124550695.5, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 5.6 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 4.5 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.3 ms] >? ? [Eden: 190.0M(190.0M)->0.0B(190.0M) Survivors: 14.0M->14.0M Heap: > 4077.0M(4096.0M)->3887.1M(4096.0M)] >? [Times: user=0.11 sys=0.00, real=0.05 secs] > 2014-08-06T04:47:45.545-0700: 1124606.686: [GC pause (mixed) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1323232 bytes,? ? 1323232 total > - age? 2:? ? 716576 bytes,? ? 2039808 total > - age? 3:? ? 1058584 bytes,? ? 3098392 total > - age? 4:? ? 225208 bytes,? ? 3323600 total > - age? 5:? ? 447688 bytes,? ? 3771288 total > - age? 6:? ? 195112 bytes,? ? 3966400 total > - age? 7:? ? 178000 bytes,? ? 4144400 total > - age? 8:? ? 156904 bytes,? ? 4301304 total > - age? 9:? ? 193424 bytes,? ? 4494728 total > - age? 10:? ? 176272 bytes,? ? 4671000 total > - age? 11:? ? 134768 bytes,? ? 4805768 total > - age? 12:? ? 138896 bytes,? ? 4944664 total > - age? 13:? ? 132272 bytes,? ? 5076936 total > - age? 14:? ? 132856 bytes,? ? 5209792 total > - age? 15:? ? 161912 bytes,? ? 5371704 total >? 1124606.686: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 20335, predicted base time: 38.61 ms, remaining time: > 161.39 ms, target pause time: 200.00 ms] >? 1124606.686: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 95 regions, survivors: 7 regions, predicted young region > time: 4.53 ms] >? 1124606.686: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: reclaimable percentage not over threshold, > old: 1 regions, max: 205 regions, reclaimable: 428818280 bytes (9.98 > %), threshold: 10.00 %] >? 1124606.686: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 95 regions, survivors: 7 regions, old: 1 regions, predicted > pause time: 45.72 ms, target pause time: 200.00 ms] >? 1124606.731: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] >? 1124606.731: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] >? 1124606.731: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] >? 1124606.743: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 4095737856 bytes, allocation request: 0 bytes, threshold: > 1932735240 bytes (45.00 %), source: end of GC] >? 1124606.743: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, > reason: reclaimable percentage not over threshold, candidate old > regions: 284 regions, reclaimable: 428818280 bytes (9.98 %), > threshold: 10.00 %] >? (to-space exhausted), 0.0568178 secs] >? ? [Parallel Time: 40.4 ms, GC Workers: 4] >? ? ? 
[GC Worker Start (ms): Min: 1124606686.1, Avg: 1124606701.7, > Max: 1124606723.8, Diff: 37.7] >? ? ? [Ext Root Scanning (ms): Min: 0.1, Avg: 6.3, Max: 16.1, Diff: > 16.1, Sum: 25.4] >? ? ? [Update RS (ms): Min: 0.0, Avg: 9.6, Max: 13.3, Diff: 13.3, Sum: > 38.6] >? ? ? ? ? [Processed Buffers: Min: 0, Avg: 37.5, Max: 52, Diff: 52, > Sum: 150] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.1] >? ? ? [Object Copy (ms): Min: 2.6, Avg: 8.4, Max: 11.1, Diff: 8.5, > Sum: 33.7] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: > 0.0] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 2.7, Avg: 24.7, Max: 40.4, Diff: > 37.7, Sum: 98.9] >? ? ? [GC Worker End (ms): Min: 1124606726.5, Avg: 1124606726.5, Max: > 1124606726.5, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 16.3 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 7.7 ms] >? ? ? [Ref Enq: 0.2 ms] >? ? ? [Free CSet: 0.3 ms] >? ? [Eden: 190.0M(190.0M)->0.0B(188.0M) Survivors: 14.0M->16.0M Heap: > 4077.1M(4096.0M)->3921.6M(4096.0M)] >? [Times: user=0.11 sys=0.00, real=0.06 secs] > 2014-08-06T04:49:57.698-0700: 1124739.058: [GC pause (young) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) > - age? 1:? ? 1130192 bytes,? ? 1130192 total > - age? 2:? ? 492816 bytes,? ? 1623008 total > - age? 3:? ? 675240 bytes,? ? 2298248 total > - age? 4:? ? 1038536 bytes,? ? 3336784 total > - age? 5:? ? 208048 bytes,? ? 3544832 total > - age? 6:? ? 436520 bytes,? ? 3981352 total > - age? 7:? ? 184528 bytes,? ? 4165880 total > - age? 8:? ? 165376 bytes,? ? 4331256 total > - age? 9:? ? 154872 bytes,? ? 4486128 total > - age? 10:? ? 179016 bytes,? ? 4665144 total > - age? 11:? ? 167760 bytes,? ? 4832904 total > - age? 12:? ? 132056 bytes,? ? 4964960 total > - age? 13:? ? 138736 bytes,? ? 5103696 total > - age? 14:? ? 132080 bytes,? ? 5235776 total > - age? 15:? ? 132856 bytes,? ? 5368632 total >? 1124739.058: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 44501, predicted base time: 51.94 ms, remaining time: > 148.06 ms, target pause time: 200.00 ms] >? 1124739.058: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 87 regions, survivors: 8 regions, predicted young region > time: 4.37 ms] >? 1124739.058: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 87 regions, survivors: 8 regions, old: 0 regions, predicted > pause time: 56.32 ms, target pause time: 200.00 ms] >? 1124739.060: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: region allocation request failed, allocation request: 1048576 > bytes] >? 1124739.060: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 1048576 bytes, attempted expansion amount: 2097152 > bytes] >? 1124739.060: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] >? 1124739.252: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, occupancy: > 4294967296 bytes, allocation request: 0 bytes, threshold: 1932735240 > bytes (45.00 %), source: end of GC] >? (to-space exhausted), 0.1936102 secs] >? ? [Parallel Time: 146.6 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124739058.5, Avg: 1124739061.6, > Max: 1124739063.0, Diff: 4.4] >? ? ? [Ext Root Scanning (ms): Min: 0.2, Avg: 7.0, Max: 14.3, Diff: > 14.0, Sum: 28.2] >? ? ? 
[Update RS (ms): Min: 4.8, Avg: 10.7, Max: 17.6, Diff: 12.8, > Sum: 42.7] >? ? ? ? ? [Processed Buffers: Min: 47, Avg: 56.3, Max: 69, Diff: 22, > Sum: 225] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2] >? ? ? [Object Copy (ms): Min: 113.1, Avg: 125.6, Max: 137.6, Diff: > 24.5, Sum: 502.5] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.2] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 142.1, Avg: 143.5, Max: 146.5, Diff: > 4.4, Sum: 573.8] >? ? ? [GC Worker End (ms): Min: 1124739205.1, Avg: 1124739205.1, Max: > 1124739205.1, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 46.9 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 1.0 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.2 ms] >? ? [Eden: 174.0M(188.0M)->0.0B(204.0M) Survivors: 16.0M->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] >? [Times: user=0.36 sys=0.00, real=0.19 secs] >? 1124739.259: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2014-08-06T04:49:57.898-0700: 1124739.259: [GC pause (young) > (initial-mark) > Desired survivor size 13631488 bytes, new threshold 15 (max 15) >? 1124739.259: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 322560, predicted base time: 205.33 ms, remaining > time: 0.00 ms, target pause time: 200.00 ms] >? 1124739.259: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 0 regions, survivors: 0 regions, predicted young region > time: 0.00 ms] >? 1124739.259: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause > time: 205.33 ms, target pause time: 200.00 ms] > , 0.0347198 secs] >? ? [Parallel Time: 33.1 ms, GC Workers: 4] >? ? ? [GC Worker Start (ms): Min: 1124739259.3, Avg: 1124739259.3, > Max: 1124739259.3, Diff: 0.0] >? ? ? [Ext Root Scanning (ms): Min: 5.5, Avg: 7.7, Max: 11.0, Diff: > 5.4, Sum: 30.6] >? ? ? [Update RS (ms): Min: 18.4, Avg: 19.8, Max: 20.6, Diff: 2.2, > Sum: 79.4] >? ? ? ? ? [Processed Buffers: Min: 293, Avg: 315.3, Max: 350, Diff: 57, > Sum: 1261] >? ? ? [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] >? ? ? [Object Copy (ms): Min: 1.6, Avg: 5.4, Max: 6.9, Diff: 5.3, Sum: > 21.7] >? ? ? [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 0.4] >? ? ? [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, > Sum: 0.1] >? ? ? [GC Worker Total (ms): Min: 33.0, Avg: 33.0, Max: 33.1, Diff: > 0.1, Sum: 132.1] >? ? ? [GC Worker End (ms): Min: 1124739292.3, Avg: 1124739292.3, Max: > 1124739292.3, Diff: 0.0] >? ? [Code Root Fixup: 0.0 ms] >? ? [Clear CT: 0.1 ms] >? ? [Other: 1.5 ms] >? ? ? [Choose CSet: 0.0 ms] >? ? ? [Ref Proc: 1.0 ms] >? ? ? [Ref Enq: 0.1 ms] >? ? ? [Free CSet: 0.0 ms] >? ? [Eden: 0.0B(204.0M)->0.0B(204.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->4095.6M(4096.0M)] >? [Times: user=0.12 sys=0.00, real=0.04 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-start] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC > concurrent-root-region-scan-end, 0.0000157 secs] > 2014-08-06T04:49:57.933-0700: 1124739.294: [GC concurrent-mark-start] >? 1124739.295: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 80 bytes] >? 
1124739.295: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 2097152 bytes, attempted expansion amount: 2097152 > bytes] >? 1124739.295: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 2014-08-06T04:49:57.934-0700: 1124739.295: [Full GC > 4095M->2235M(4096M), 10.5341003 secs] >? ? [Eden: 0.0B(204.0M)->0.0B(1436.0M) Survivors: 0.0B->0.0B Heap: > 4095.6M(4096.0M)->2235.4M(4096.0M)] >? [Times: user=13.20 sys=0.03, real=10.52 secs] > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Tue Aug 26 01:48:43 2014 From: yu.zhang at oracle.com (Yu Zhang) Date: Mon, 25 Aug 2014 18:48:43 -0700 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: References: Message-ID: <53FBE77B.6060806@oracle.com> Kannan, The concurrent marking is concurrent, meaning it runs concurrently with the application. You may see the time between[GC concurrent-mark-start] and [GC concurrent-mark-stop] very long, maybe even some young gc happened during this time period. This is because the marking threads can be suspended. >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds This means the application has run for 9.0448670 seconds. And only being stopped for 0.0029740 seconds. From this log, gc did not stop the application. >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Maybe something in the application or OS is running on 1 cpu and blocking other threads. Thanks, Jenny On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: > Greetings, > > We are experiencing unexplained/unknown long pauses (8 seconds) during > concurrent marking step of G1 collector. > > 2014-08-07T13:42:30.552-0400: 92183.303: [GC > concurrent-root-region-scan-start] > 2014-08-07T13:42:30.555-0400: 92183.305: [GC > concurrent-root-region-scan-end, 0.0025230 secs] > **2014-08-07T13:42:30.555-0400: 92183.305: [GC > concurrent-mark-start]** > **2014-08-07T13:42:39.598-0400: 92192.348: Application time: > 9.0448670 seconds** > 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which > application threads were stopped: 0.0029740 seconds > 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation > Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start > choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, > remaining time: 71.81 ms, target pause time: 100.00 ms > > > `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark > starts, approximately after 2 seconds of this step the application > starts to pause. However the GC logs claims the application was not > paused during this window. > Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during > the pause. > > Any help in understanding the root cause of this issue is appreciated. 
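The Application time / stopped time lines discussed above come from -XX:+PrintGCApplicationConcurrentTime and -XX:+PrintGCApplicationStoppedTime, so one quick cross-check is to total them over the whole log and see how much wall time the JVM itself accounts for. A rough sketch, assuming the log format shown in this thread:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch: sum the "Application time" and "application threads were
// stopped" figures from a GC log to see what fraction of wall time was
// actually spent in JVM-induced pauses.
public class StoppedTimeSummary {
    public static void main(String[] args) throws IOException {
        Pattern run  = Pattern.compile("Application time: ([0-9.]+) seconds");
        Pattern stop = Pattern.compile("threads were stopped: ([0-9.]+) seconds");
        double running = 0, stopped = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = run.matcher(line);
                if (m.find()) running += Double.parseDouble(m.group(1));
                m = stop.matcher(line);
                if (m.find()) stopped += Double.parseDouble(m.group(1));
            }
        }
        System.out.printf("running %.1fs, stopped %.1fs (%.2f%% stopped)%n",
                running, stopped, 100.0 * stopped / (running + stopped));
    }
}

Run it as: java StoppedTimeSummary <gc log file>. If the stopped fraction stays tiny while users still see multi-second stalls, the stall is most likely outside the GC pauses, which matches the reading of the log above.
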
> > Our target JVMS: > > java version "1.7.0_04" > Java(TM) SE Runtime Environment (build 1.7.0_04-b20) > Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) > > java version "1.8.0_11" > Java(TM) SE Runtime Environment (build 1.8.0_11-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) > > > Our JVM options : > > -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M > -XX:MarkStackSize=16M > -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo > -XX:+TraceGCTaskThread > -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats > -XX:+G1TraceConcRefinement > -XX:+UseG1GC -XX:MaxGCPauseMillis=100 > -XX:InitiatingHeapOccupancyPercent=65 > -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintGCApplicationConcurrentTime > -Xloggc:/common/logs/ocean-partition-gc.log > > > Thanks and regards, > Kannan > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Tue Aug 26 02:07:25 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 26 Aug 2014 05:07:25 +0300 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: <53FBE77B.6060806@oracle.com> References: <53FBE77B.6060806@oracle.com> Message-ID: Try running command kill -3 to see what's going on with the application threads. ** Martin 2014-08-26 4:48 GMT+03:00 Yu Zhang : > Kannan, > > The concurrent marking is concurrent, meaning it runs concurrently with > the application. > You may see the time between [GC concurrent-mark-start] and > [GC concurrent-mark-stop] very long, maybe even some young gc happened > during this time period. This is because the marking threads can be > suspended. > > >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 > seconds** > 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which > application threads were stopped: 0.0029740 seconds > This means the application has run for 9.0448670 seconds. And only being > stopped for > 0.0029740 seconds. > > From this log, gc did not stop the application. > > >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during > the pause. > Maybe something in the application or OS is running on 1 cpu and blocking > other threads. > > Thanks, > Jenny > > On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: > > Greetings, > > We are experiencing unexplained/unknown long pauses (8 seconds) during > concurrent marking step of G1 collector. 
> > 2014-08-07T13:42:30.552-0400: 92183.303: [GC > concurrent-root-region-scan-start] > 2014-08-07T13:42:30.555-0400: 92183.305: [GC > concurrent-root-region-scan-end, 0.0025230 secs] > **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** > **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 > seconds** > 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which > application threads were stopped: 0.0029740 seconds > 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation > Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: > 71.81 ms, target pause time: 100.00 ms > > > `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark > starts, approximately after 2 seconds of this step the application starts > to pause. However the GC logs claims the application was not paused during > this window. > Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during > the pause. > > Any help in understanding the root cause of this issue is appreciated. > > Our target JVMS: > > java version "1.7.0_04" > Java(TM) SE Runtime Environment (build 1.7.0_04-b20) > Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) > > java version "1.8.0_11" > Java(TM) SE Runtime Environment (build 1.8.0_11-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) > > > Our JVM options : > > -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M > -XX:MarkStackSize=16M > -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo > -XX:+TraceGCTaskThread > -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats > -XX:+G1TraceConcRefinement > -XX:+UseG1GC -XX:MaxGCPauseMillis=100 > -XX:InitiatingHeapOccupancyPercent=65 > -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintGCApplicationConcurrentTime > -Xloggc:/common/logs/ocean-partition-gc.log > > > Thanks and regards, > Kannan > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kannan.Krishnamurthy at contractor.cengage.com Tue Aug 26 18:09:32 2014 From: Kannan.Krishnamurthy at contractor.cengage.com (Krishnamurthy, Kannan) Date: Tue, 26 Aug 2014 18:09:32 +0000 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: References: <53FBE77B.6060806@oracle.com>, Message-ID: Thanks for the responses. I agree with Jenny that the gc logs doesn't seem to indicate that the pause was from a GC event. Profiling the application and trying to get some native stack traces/ core dumps. ________________________________ From: Martin Makundi [martin.makundi at koodaripalvelut.com] Sent: Monday, August 25, 2014 10:07 PM To: Yu Zhang Cc: Krishnamurthy, Kannan; hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: Re: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Try running command kill -3 to see what's going on with the application threads. 
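An external kill -3 (or jstack) only shows the stall if it fires while the stall is in progress; a small in-process watchdog can at least record when stalls happen and dump stacks right afterwards. A rough sketch (class name and thresholds are made up for illustration):

import java.util.Map;

// Rough sketch of a pause detector: a daemon thread sleeps in short steps
// and, whenever the wall-clock gap is much larger than the sleep (i.e. the
// whole process was stalled), dumps all thread stacks to stderr.
public class PauseWatchdog implements Runnable {
    public static void install() {
        Thread t = new Thread(new PauseWatchdog(), "pause-watchdog");
        t.setDaemon(true);
        t.start();
    }

    @Override
    public void run() {
        final long stepMs = 100, reportAboveMs = 2000;  // illustrative thresholds
        long last = System.nanoTime();
        while (true) {
            try { Thread.sleep(stepMs); } catch (InterruptedException e) { return; }
            long now = System.nanoTime();
            long gapMs = (now - last) / 1_000_000;
            last = now;
            if (gapMs > reportAboveMs) {
                System.err.println("Process stalled for ~" + gapMs + " ms, dumping stacks:");
                for (Map.Entry<Thread, StackTraceElement[]> e
                        : Thread.getAllStackTraces().entrySet()) {
                    System.err.println(e.getKey());
                    for (StackTraceElement frame : e.getValue()) {
                        System.err.println("\tat " + frame);
                    }
                }
            }
        }
    }
}

Note that during a safepoint the watchdog thread is stopped along with everything else, so it can only report after the stall ends; catching a pause live still needs the external thread dump.
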
** Martin 2014-08-26 4:48 GMT+03:00 Yu Zhang >: Kannan, The concurrent marking is concurrent, meaning it runs concurrently with the application. You may see the time between [GC concurrent-mark-start] and [GC concurrent-mark-stop] very long, maybe even some young gc happened during this time period. This is because the marking threads can be suspended. >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds This means the application has run for 9.0448670 seconds. And only being stopped for 0.0029740 seconds. >From this log, gc did not stop the application. >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Maybe something in the application or OS is running on 1 cpu and blocking other threads. Thanks, Jenny On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: Greetings, We are experiencing unexplained/unknown long pauses (8 seconds) during concurrent marking step of G1 collector. 2014-08-07T13:42:30.552-0400: 92183.303: [GC concurrent-root-region-scan-start] 2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-root-region-scan-end, 0.0025230 secs] **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: 71.81 ms, target pause time: 100.00 ms `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark starts, approximately after 2 seconds of this step the application starts to pause. However the GC logs claims the application was not paused during this window. Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Any help in understanding the root cause of this issue is appreciated. Our target JVMS: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) java version "1.8.0_11" Java(TM) SE Runtime Environment (build 1.8.0_11-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) Our JVM options : -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:MarkStackSize=16M -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo -XX:+TraceGCTaskThread -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1TraceConcRefinement -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/common/logs/ocean-partition-gc.log Thanks and regards, Kannan _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From martin.makundi at koodaripalvelut.com Wed Aug 27 03:38:47 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 27 Aug 2014 06:38:47 +0300 Subject: G1gc compaction algorithm In-Reply-To: References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> Message-ID: Good article: http://www.azulsystems.com/sites/default/files/images/c4_paper_acm.pdf 2014-08-21 17:21 GMT+03:00 Martin Makundi < martin.makundi at koodaripalvelut.com>: > Hi! > > Why doesn't the g1gc operate like azul c4 is described to operate: > > http://www.azulsystems.com/technology/c4-garbage-collector > > "fully concurrent, so it never falls back to a stop-the-world compaction" > > ? > > ** > Martin > > > 2014-08-20 11:35 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > > Hi! >> >> Here is one more recent log: >> >> http://81.22.250.165/log/gc-2014-08-20.log >> >> with a couple of Full GC's. >> >> ** >> Martin >> >> >> 2014-08-19 20:11 GMT+03:00 Martin Makundi < >> martin.makundi at koodaripalvelut.com>: >> >> Hi! >>> >>> I suspect the application does not do the humongous allocations, I >>> suspect it's the gc itself that does these. We have an allocationrecorder >>> that never sees these humongous allocations within the application >>> itself...assuming they are in a form that the allocation hook can detect. >>> >>> From within the application we have lots of objects that are kept in >>> ehcache, so ehcache manages the usage. I am not familiar with ehcache >>> internals but I don't think it uses humongous objects in any way. >>> >>> The max allocation request varies on the objects that are loaded, it is >>> possible that some request is 48m so yes it can vary depending on who logs >>> in and what he/she does... not very reproducible. >>> >>> My main concern is that IF full gc can clean up the memory, there should >>> be a mechanic that does just the same as full gc but without blocking for >>> long time...concurrent full gc, do the 30-60 second operation for example >>> 10% overhead until whole full gc is done (that would take 30-60/10% = >>> 300-600 seconds). >>> >>> And if this is not feasible at the moment...what can we tune to mitigate >>> the peaking in the garbage accumulation and thus avoid the full gc's. >>> >>> Help =) >>> >>> ** >>> Martin >>> >>> >>> 2014-08-19 19:56 GMT+03:00 Yu Zhang : >>> >>> Martin, >>>> >>>> Comparing 2 logs 08-15 vs 08-11, the allocation pattern seems changed. >>>> In 08-15, there are 4 requests allocation 109M >>>> ' 112484.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, >>>> reason: humongous allocation request failed, allocation request: 109816768 >>>> bytes]' >>>> But in 08-11, the max allocation request is 48M >>>> '117595.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, >>>> reason: humongous allocation request failed, allocation request: 47906832 >>>> bytes]' >>>> >>>> Does the application change the allocation when performance changes? >>>> >>>> >>>> Thanks, >>>> Jenny >>>> >>>> >>>> On 8/15/2014 3:20 AM, Martin Makundi wrote: >>>> >>>>> Hi! >>>>> >>>>> Here is our latest logs with bit more heap and suggested parameters. >>>>> We tried eden max 85% with this run because 30% was unsuccessful >>>>> earlier. >>>>> >>>>> Here is the log:http://81.22.250.165/log/gc-16m-2014-08-15.log >>>>> >>>>> It has a couple of full gc hits during a busy day, any new ideas? 
>>>>> >>>>> ** >>>>> Martin >>>>> >>>>> >>>>> 2014-08-13 9:39 GMT+03:00 Martin Makundi>>>> koodaripalvelut.com>: >>>>> >>>>>> Thanks. At the end, the system cpu is very high. I guess there are >>>>>>> page >>>>>>> faults due to the heap expansion around timestamp 12002.692. Is the >>>>>>> memory >>>>>>> tight on your system? >>>>>>> >>>>>> For performance sake we use compressedoops which limits the memory >>>>>> upper >>>>>> bound, this means we could go to bit under 32g, 32255M. Going above >>>>>> compressedoops will increase memory footprint and slow down >>>>>> processing so we >>>>>> would prefer just tuning the gc while within compressedoops. >>>>>> >>>>>> can you afford to start with -Xms30g -Xmx30g -XX:+AlwaysPreTouch? >>>>>>> >>>>>> Thanks, I'll try -Xms32255M -Xmx32255M -XX:+AlwaysPreTouch >>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> Thanks, >>>>>>> Jenny >>>>>>> >>>>>>> On 8/11/2014 11:29 PM, Martin Makundi wrote: >>>>>>> >>>>>>> Hi! >>>>>>> >>>>>>> I tried the new parameters: >>>>>>> >>>>>>> Based on this one, can you do one with -XX:G1MaxNewSizePercent=30 >>>>>>>> -XX:InitiatingHeapOccupancyPercent=20 added? >>>>>>>> >>>>>>> This seems to hang the whole system.... we have lots of mostly short >>>>>>> lived >>>>>>> (ehcache timeToIdleSeconds="900") large java object trees 1M-10M >>>>>>> each (data >>>>>>> reports loaded into cache). >>>>>>> >>>>>>> Maybe eden should be even bigger instead of smaller? >>>>>>> >>>>>>> Here is the log from today, it hung up quite early, I suspect the gc: >>>>>>> http://81.22.250.165/log/gc-16m-2014-08-12.log >>>>>>> >>>>>>> The process ate most of the cpu cacacity and we had to kill it and >>>>>>> restart >>>>>>> without -XX:G1MaxNewSizePercent=30. >>>>>>> >>>>>>> What you suggest? >>>>>>> >>>>>>> ** >>>>>>> Martin >>>>>>> >>>>>>> The reason for G1MaxNewSizePercent(default=60) is to set an upper >>>>>>>> limit >>>>>>>> to Eden size. It seems the Eden size grows to 17g before Full gc, >>>>>>>> then a >>>>>>>> bunch of humongous allocation happened, and there is not enough old >>>>>>>> gen. >>>>>>>> >>>>>>>> The following log entry seems not right: The Eden Size is over 60% >>>>>>>> of the >>>>>>>> heap. 
>>>>>>>> "2014-08-11T11:13:05.487+0300: 193238.308: [GC pause (young) >>>>>>>> (initial-mark) 193238.308: [G1Ergonomics (CSet Construction) start >>>>>>>> choosing >>>>>>>> CSet, _pending_cards: 769041, predicted base time: 673.25 ms, >>>>>>>> remaining >>>>>>>> time: 326.75 ms, target pause time: 1000.00 ms] 193238.308: >>>>>>>> [G1Ergonomics >>>>>>>> (CSet Construction) add young regions to CSet, eden: 1 regions, >>>>>>>> survivors: >>>>>>>> 21 regions, predicted young region time: 145.63 ms] 193238.308: >>>>>>>> [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 >>>>>>>> regions, >>>>>>>> survivors: 21 regions, old: 0 regions, predicted pause time: 818.88 >>>>>>>> ms, >>>>>>>> target pause time: 1000.00 ms], 0.7559550 secs] [Parallel Time: >>>>>>>> 563.9 ms, >>>>>>>> GC Workers: 13] [GC Worker Start (ms): Min: 193238308.1, Avg: >>>>>>>> 193238318.0, Max: 193238347.6, Diff: 39.5] [Ext Root Scanning >>>>>>>> (ms): >>>>>>>> Min: 0.0, Avg: 13.0, Max: 35.8, Diff: 35.8, Sum: 168.4] >>>>>>>> [Update RS >>>>>>>> (ms): Min: 399.2, Avg: 416.8, Max: 442.8, Diff: 43.6, Sum: 5418.0] >>>>>>>> [Processed Buffers: Min: 162, Avg: 232.0, Max: 326, Diff: 164, Sum: >>>>>>>> 3016] >>>>>>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] >>>>>>>> [Object Copy (ms): Min: 79.9, Avg: 104.8, Max: 152.4, Diff: 72.5, >>>>>>>> Sum: >>>>>>>> 1363.0] [Termination (ms): Min: 0.0, Avg: 19.1, Max: 27.3, >>>>>>>> Diff: 27.3, >>>>>>>> Sum: 248.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: >>>>>>>> 0.0, Diff: >>>>>>>> 0.0, Sum: 0.3] [GC Worker Total (ms): Min: 524.1, Avg: 553.8, >>>>>>>> Max: >>>>>>>> 563.7, Diff: 39.6, Sum: 7198.8] [GC Worker End (ms): Min: >>>>>>>> 193238871.7, >>>>>>>> Avg: 193238871.8, Max: 193238871.8, Diff: 0.1] >>>>>>>> [Code Root Fixup: 0.0 ms] >>>>>>>> [Clear CT: 0.3 ms] >>>>>>>> [Other: 191.7 ms] >>>>>>>> [Choose CSet: 0.0 ms] [Ref Proc: 190.1 ms] [Ref >>>>>>>> Enq: 0.3 >>>>>>>> ms] [Free CSet: 0.2 ms] >>>>>>>> [Eden: 16.0M(2464.0M)->0.0B(22.9G) Survivors: 336.0M->240.0M >>>>>>>> Heap: >>>>>>>> 14.1G(28.7G)->14.1G(28.7G)] >>>>>>>> [Times: user=8.45 sys=0.04, real=0.75 secs]" >>>>>>>> >>>>>>>> The reason for increasing InitiatingHeapOccupancyPercent to 20 from >>>>>>>> 10 is >>>>>>>> we are wasting some concurrent cycles. >>>>>>>> >>>>>>>> We will see how this goes. We might increase G1ReservePercent to >>>>>>>> handle >>>>>>>> this kind of allocation if it is not enough. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Jenny >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Jenny >>>>>>>> >>>>>>>> On 8/11/2014 10:46 AM, Martin Makundi wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>> Here is our latest log with one Full GC @ 2014-08-11T11:20:02 which >>>>>>>> is >>>>>>>> caused by heap full and allocation request: 144 bytes. >>>>>>>> >>>>>>>> http://81.22.250.165/log/gc-16m-2014-08-11.log >>>>>>>> >>>>>>>> Any ideas how to mitigate this kind of situation? The Full GC makes >>>>>>>> quite >>>>>>>> a difference to the situation but causes a painful pause also. >>>>>>>> >>>>>>>> ** >>>>>>>> Martin >>>>>>>> >>>>>>>> >>>>>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From thomas.schatzl at oracle.com Wed Aug 27 14:51:39 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 27 Aug 2014 16:51:39 +0200
Subject: G1gc compaction algorithm
In-Reply-To: 
References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com>
Message-ID: <1409151099.2765.46.camel@cirrus>

Hi Martin,

On Wed, 2014-08-27 at 06:38 +0300, Martin Makundi wrote:
> Good article:
> http://www.azulsystems.com/sites/default/files/images/c4_paper_acm.pdf

we are aware of this product and a few other papers or implementations for a fully concurrent compacting collector.

> 2014-08-21 17:21 GMT+03:00 Martin Makundi
> :
> Hi!
>
> Why doesn't the g1gc operate like azul c4 is described to
> operate:
> http://www.azulsystems.com/technology/c4-garbage-collector
> "fully concurrent, so it never falls back to a stop-the-world
> compaction"?

Fully concurrent vs. STW is a trade-off. Years ago it was simply decided that for G1 the advantages of the STW mechanism outweigh its drawbacks.

One of the problems of fully concurrent collectors is usually the impact on throughput (~15-20% less), another one is typically highly increased complexity of the code.

Iirc C4 does not give a guarantee about always being fully concurrent; actually the paper mentions at least one case where it "stops the world" too. Not sure if those have been fixed (or are considered worth fixing). There may be more of these corner cases, like that the application needs to be at least somewhat cooperative (i.e. in terms of live set, max/avg allocation rate) just like in G1 to avoid the equivalent of full gcs.

I.e. most likely "never" with some fine print.

I may be wrong, maybe somebody with experience with C4 can chime in.

At this time we will continue to work on making G1 even more useful in more applications (larger heaps, providing more consistent lower pause time "guarantees"). :)

Thanks,
  Thomas

From martin.makundi at koodaripalvelut.com Wed Aug 27 15:02:37 2014
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Wed, 27 Aug 2014 18:02:37 +0300
Subject: G1gc compaction algorithm
In-Reply-To: <1409151099.2765.46.camel@cirrus>
References: <53DAB6FD.1050501@oracle.com> <53DABCAE.9060901@oracle.com> <53E2A3CE.6090803@oracle.com> <53E40CC2.6020609@oracle.com> <53E9A36E.80600@oracle.com> <53F381BE.70803@oracle.com> <1409151099.2765.46.camel@cirrus>
Message-ID: 

Hi!

Thanks for the comment. Bit biased? ;)

I would like to see an AI GC... what I mean by that is artificial intelligence in the sense that WHATEVER you throw at it, it will tune itself as well as it can, better and faster than any human could. It could do C4 mode and G1 mode and anything in between.

Will that mean increased complexity of code? Yes. But it is justified if you take into account what it costs not to meet latency targets (imagine financial services, imagine security services, etc. where there is practically no tolerance...).

I would like to see OpenJDK win the race ;)

**
Martin

2014-08-27 17:51 GMT+03:00 Thomas Schatzl :
> Hi Martin,
>
> On Wed, 2014-08-27 at 06:38 +0300, Martin Makundi wrote:
> > Good article:
> > http://www.azulsystems.com/sites/default/files/images/c4_paper_acm.pdf
>
>
> we are aware of this product and a few other papers or implementations
> for a fully concurrent compacting collector.
>
>
> 2014-08-21 17:21 GMT+03:00 Martin Makundi
> > :
> > Hi!
> >
> > Why doesn't the g1gc operate like azul c4 is described to
> > operate:
> > http://www.azulsystems.com/technology/c4-garbage-collector
> > "fully concurrent, so it never falls back to a stop-the-world
> > compaction"?
>
> Fully concurrent vs. STW is a trade-off. Years ago it was simply decided
> that for G1 the advantages of the STW mechanism outweigh its drawbacks.
>
> One of the problems of fully concurrent collectors is usually the impact
> on throughput (~15-20% less), another one is typically highly increased
> complexity of the code.
>
> Iirc C4 does not give a guarantee about always being fully concurrent;
> actually the paper mentions at least one case where it "stops the world"
> too. Not sure if those have been fixed (or are considered worth fixing).
> There may be more of these corner cases, like that the application needs
> to be at least somewhat cooperative (i.e. in terms of live set, max/avg
> allocation rate) just like in G1 to avoid the equivalent of full gcs.
>
> I.e. most likely "never" with some fine print.
>
> I may be wrong, maybe somebody with experience with C4 can chime in.
>
> At this time we will continue to work on making G1 even more useful in
> more applications (larger heaps, providing more consistent lower pause
> time "guarantees"). :)
>
> Thanks,
> Thomas
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From Kannan.Krishnamurthy at contractor.cengage.com Thu Aug 28 19:38:27 2014
From: Kannan.Krishnamurthy at contractor.cengage.com (Krishnamurthy, Kannan)
Date: Thu, 28 Aug 2014 19:38:27 +0000
Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector
In-Reply-To: 
References: <53FBE77B.6060806@oracle.com>, ,
Message-ID: 

We managed to get a couple of native stack traces when the pause was happening.
From the thread dump, we don't see any application thread blocking the JVM. We are wondering whether the JVM Safepoint and a single concurrent marking thread are blocking each other.

Links to the complete stack traces (~5.5MB each):
First : https://drive.google.com/file/d/0B95RUv-vjsAfcE95MlVmMVhKZTQ/edit?usp=sharing
Second : https://drive.google.com/file/d/0B95RUv-vjsAfNklxMnZVaEI2UTg/edit?usp=sharing

Most of the threads are in some kind of wait (pthread_cond_wait@@GLIBC_2.3.2, epoll_wait, or pthread_cond_timedwait). We see that only one thread was doing any work across two successive stack traces 2 seconds apart.
For that thread 1820, in the first stack trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc55f604 in Monitor::ILock(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc55f9bf in Monitor::lock_without_safepoint_check() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc229779 in CMTask::move_entries_to_global_stack() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc230974 in CMTask::push(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc230b39 in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #10 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #11 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #12 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () ---------------------- For the same thread 1820 in the second trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc208080 in oopDesc::size() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc230ace in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () *************************************************************************************** The other thread interesting thread VMThread is trying to start a Safepoint Synchronize. 
Thread 1816 (Thread 0x7f266ec26700 (LWP 17406)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc2249b5 in ConcurrentGCThread::safepoint_synchronize() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc61d4fb in SafepointSynchronize::begin() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc700227 in VMThread::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc7008d0 in VMThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () ********************************************** The complete break down of rest of threads as below: (1632) Threads with stack Thread 1818 (Thread 0x7f266f0f3700 (LWP 17404)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (156) Threads with stack Thread 10 (Thread 0x7f1cf6e31700 (LWP 973)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598427 in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (24) Threads with stack #0 0x00007f26bcc522e3 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f2669eb97ba in Java_sun_nio_ch_EPollArrayWrapper_epollWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/libnio.so (18) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598c27 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc589880 in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc3febd1 in JVM_MonitorWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (36) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc59847f in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (1) Thread with stack #0 0x00007f26bd31da00 in sem_wait () from /lib64/libpthread.so.0 #1 0x00007f26bc59881a in check_pending_signals(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc592945 in signal_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc6b8228 in JavaThread::thread_main_inner() () 
from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc6b8378 in JavaThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #7 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () Thanks, Kannan ________________________________________ From: Krishnamurthy, Kannan Sent: Tuesday, August 26, 2014 2:09 PM To: Martin Makundi; Yu Zhang Cc: hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: RE: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Thanks for the responses. I agree with Jenny that the gc logs doesn't seem to indicate that the pause was from a GC event. Profiling the application and trying to get some native stack traces/ core dumps. ________________________________ From: Martin Makundi [martin.makundi at koodaripalvelut.com] Sent: Monday, August 25, 2014 10:07 PM To: Yu Zhang Cc: Krishnamurthy, Kannan; hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: Re: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Try running command kill -3 to see what's going on with the application threads. ** Martin 2014-08-26 4:48 GMT+03:00 Yu Zhang >: Kannan, The concurrent marking is concurrent, meaning it runs concurrently with the application. You may see the time between [GC concurrent-mark-start] and [GC concurrent-mark-stop] very long, maybe even some young gc happened during this time period. This is because the marking threads can be suspended. >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds This means the application has run for 9.0448670 seconds. And only being stopped for 0.0029740 seconds. >From this log, gc did not stop the application. >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Maybe something in the application or OS is running on 1 cpu and blocking other threads. Thanks, Jenny On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: Greetings, We are experiencing unexplained/unknown long pauses (8 seconds) during concurrent marking step of G1 collector. 2014-08-07T13:42:30.552-0400: 92183.303: [GC concurrent-root-region-scan-start] 2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-root-region-scan-end, 0.0025230 secs] **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: 71.81 ms, target pause time: 100.00 ms `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark starts, approximately after 2 seconds of this step the application starts to pause. However the GC logs claims the application was not paused during this window. Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Any help in understanding the root cause of this issue is appreciated. 
Our target JVMS: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) java version "1.8.0_11" Java(TM) SE Runtime Environment (build 1.8.0_11-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) Our JVM options : -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:MarkStackSize=16M -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo -XX:+TraceGCTaskThread -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1TraceConcRefinement -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/common/logs/ocean-partition-gc.log Thanks and regards, Kannan _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Kannan.Krishnamurthy at contractor.cengage.com Thu Aug 28 19:40:37 2014 From: Kannan.Krishnamurthy at contractor.cengage.com (Krishnamurthy, Kannan) Date: Thu, 28 Aug 2014 19:40:37 +0000 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: References: <53FBE77B.6060806@oracle.com>, , , Message-ID: We managed to get a couple of native stack traces when the pause was happening. From the thread dump, we don't see any application thread blocking the JVM. We are wondering whether the JVM Safepoint and a single concurrent marking thread are blocking each other. Links to the complete stack traces (~5.5MB each): First: https://drive.google.com/file/d/0B95RUv-vjsAfcE95MlVmMVhKZTQ/edit?usp=sharing Second: https://drive.google.com/file/d/0B95RUv-vjsAfNklxMnZVaEI2UTg/edit?usp=sharing Most of the threads are in some kind of wait (pthread_cond_wait@@GLIBC_2.3.2, epoll_wait, or pthread_cond_timedwait). We see that only one thread was doing any work across two successive stack traces taken 2 seconds apart.
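One cheap way to corroborate stalls that the GC log does not account for is an in-process jitter watchdog: a thread that sleeps for a short, fixed interval and reports whenever the observed gap is much larger than the sleep it asked for. The sketch below is only illustrative and is not part of the original report; the class name and thresholds are made up, but the code is self-contained Java.

// JitterWatchdog.java -- hypothetical helper, not from the original thread.
// Sleeps for a fixed interval and logs whenever the wall-clock gap is much
// larger than requested, which indicates the whole process (not just GC)
// was stalled, e.g. by a safepoint that could not be reached promptly.
public class JitterWatchdog implements Runnable {
    private static final long INTERVAL_MS = 100;   // requested sleep per tick
    private static final long THRESHOLD_MS = 500;  // report gaps above this

    @Override
    public void run() {
        long last = System.nanoTime();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(INTERVAL_MS);
            } catch (InterruptedException e) {
                return;
            }
            long now = System.nanoTime();
            long gapMs = (now - last) / 1_000_000L;
            if (gapMs > THRESHOLD_MS) {
                System.err.println("Watchdog: expected ~" + INTERVAL_MS
                        + " ms between ticks, observed " + gapMs + " ms");
            }
            last = now;
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(new JitterWatchdog(), "jitter-watchdog");
        t.setDaemon(true);
        t.start();
        Thread.sleep(Long.MAX_VALUE); // keep the demo alive
    }
}

Gaps reported by a watchdog like this cover whole-process stalls, safepoints included, so its timestamps can be lined up against the GC log entries and the native stack traces that follow.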
For that thread 1820, in the first stack trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc55f604 in Monitor::ILock(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc55f9bf in Monitor::lock_without_safepoint_check() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc229779 in CMTask::move_entries_to_global_stack() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc230974 in CMTask::push(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc230b39 in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #10 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #11 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #12 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () ---------------------- For the same thread 1820 in the second trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc208080 in oopDesc::size() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc230ace in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () *************************************************************************************** The other interesting thread is the VMThread, which is trying to start a safepoint synchronization.
Thread 1816 (Thread 0x7f266ec26700 (LWP 17406)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc2249b5 in ConcurrentGCThread::safepoint_synchronize() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc61d4fb in SafepointSynchronize::begin() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc700227 in VMThread::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc7008d0 in VMThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () ********************************************** The complete break down of rest of threads as below: (1632) Threads with stack Thread 1818 (Thread 0x7f266f0f3700 (LWP 17404)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (156) Threads with stack Thread 10 (Thread 0x7f1cf6e31700 (LWP 973)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598427 in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (24) Threads with stack #0 0x00007f26bcc522e3 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f2669eb97ba in Java_sun_nio_ch_EPollArrayWrapper_epollWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/libnio.so (18) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598c27 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc589880 in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc3febd1 in JVM_MonitorWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (36) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc59847f in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (1) Thread with stack #0 0x00007f26bd31da00 in sem_wait () from /lib64/libpthread.so.0 #1 0x00007f26bc59881a in check_pending_signals(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc592945 in signal_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc6b8228 in JavaThread::thread_main_inner() () 
from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc6b8378 in JavaThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #7 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () Thanks, Kannan ________________________________________ From: Krishnamurthy, Kannan Sent: Tuesday, August 26, 2014 2:09 PM To: Martin Makundi; Yu Zhang Cc: hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: RE: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Thanks for the responses. I agree with Jenny that the gc logs doesn't seem to indicate that the pause was from a GC event. Profiling the application and trying to get some native stack traces/ core dumps. ________________________________ From: Martin Makundi [martin.makundi at koodaripalvelut.com] Sent: Monday, August 25, 2014 10:07 PM To: Yu Zhang Cc: Krishnamurthy, Kannan; hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: Re: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Try running command kill -3 to see what's going on with the application threads. ** Martin 2014-08-26 4:48 GMT+03:00 Yu Zhang >: Kannan, The concurrent marking is concurrent, meaning it runs concurrently with the application. You may see the time between [GC concurrent-mark-start] and [GC concurrent-mark-stop] very long, maybe even some young gc happened during this time period. This is because the marking threads can be suspended. >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds This means the application has run for 9.0448670 seconds. And only being stopped for 0.0029740 seconds. >From this log, gc did not stop the application. >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Maybe something in the application or OS is running on 1 cpu and blocking other threads. Thanks, Jenny On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: Greetings, We are experiencing unexplained/unknown long pauses (8 seconds) during concurrent marking step of G1 collector. 2014-08-07T13:42:30.552-0400: 92183.303: [GC concurrent-root-region-scan-start] 2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-root-region-scan-end, 0.0025230 secs] **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: 71.81 ms, target pause time: 100.00 ms `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark starts, approximately after 2 seconds of this step the application starts to pause. However the GC logs claims the application was not paused during this window. Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Any help in understanding the root cause of this issue is appreciated. 
Our target JVMS: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) java version "1.8.0_11" Java(TM) SE Runtime Environment (build 1.8.0_11-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) Our JVM options : -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:MarkStackSize=16M -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo -XX:+TraceGCTaskThread -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1TraceConcRefinement -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/common/logs/ocean-partition-gc.log Thanks and regards, Kannan _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ysr1729 at gmail.com Thu Aug 28 23:25:34 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Thu, 28 Aug 2014 16:25:34 -0700 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: References: <53FBE77B.6060806@oracle.com> Message-ID: It's been a while since I looked at G1 code and I'm sure it's evolved a bunch since then... Hi Kannan -- As you surmised, it's likely that the marking step isn't checking at a sufficiently fine granularity whether a safepoint has been requested. Or, equivalently, the marking step is doing too much work in one "step", thus preventing a safepoint while the marking step is in progress. If you have GC logs from the application, you could look at the allocation rates that you observe and compare the rates during the marking phase and outside of the marking phase. I am guessing that because of this, the marking phase must be slowing down allocation, and we can get a measure of that from your GC logs. It is clear from your stack traces that the mutators are all blocked for allocation, while a safepoint is waiting for the marking step to yield. It could be (from the stack trace) that we are scanning from a gigantic obj array and perhaps the marking step can yield only after the entire array has been scanned. In which case, the use of large object arrays (or hash tables) could be a performance anti-pattern for G1. Perhaps we should allow for partial scanning of arrays -- I can't recall if CMS does that for marking -- save the state of the partial scan and resume from that point after the yield (which occurs at a sufficiently fine granularity). This used to be an issue with CMS as well in the early days and we had to refine the granularity of the marking steps (or the so-called "concurrent work yield points" -- points at which the marking will stop to allow a scavenge to proceed). I am guessing we'll need to refine the granularity at which G1 does these yields to allow a young collection to proceed in a timely fashion. -- ramki On Thu, Aug 28, 2014 at 12:38 PM, Krishnamurthy, Kannan < Kannan.Krishnamurthy at contractor.cengage.com> wrote: > We managed to get couple of native stack traces when the pause was > happening.
> > From the thread dump, we don't see any application thread blocking the > JVM. We are wondering whether the JVM Safepoint and a single concurrent > marking thread are blocking each other. > > Links to the complete stack traces (~5.5MB each): > First : > https://drive.google.com/file/d/0B95RUv-vjsAfcE95MlVmMVhKZTQ/edit?usp=sharing > Second : > https://drive.google.com/file/d/0B95RUv-vjsAfNklxMnZVaEI2UTg/edit?usp=sharing > > Most of the threads are in some kinds of waiting (pthread_cond_wait@@GLIBC_2.3.2, > epoll_wait, or pthread_cond_timedwait). > > We see that only one thread was doing any work across two successive stack > traces 2 seconds aparts. For that thread 1820, in the first stack trace, > we see: > > Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): > #0 0x00007f26bc55f604 in Monitor::ILock(Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #1 0x00007f26bc55f9bf in Monitor::lock_without_safepoint_check() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc229779 in CMTask::move_entries_to_global_stack() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc230974 in CMTask::push(oopDesc*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #4 0x00007f26bc230b39 in CMTask::deal_with_reference(oopDesc*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #5 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, > G1CMOopClosure*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #6 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #7 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #8 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #9 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #10 0x00007f26bc70eaff in GangWorker::loop() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #11 0x00007f26bc599ff0 in java_start(Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #12 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 > #13 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 > #14 0x0000000000000000 in ?? 
() > ---------------------- > > For the same thread 1820 in the second trace, we see: > > Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): > #0 0x00007f26bc208080 in oopDesc::size() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #1 0x00007f26bc230ace in CMTask::deal_with_reference(oopDesc*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, > G1CMOopClosure*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #4 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #5 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #6 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #7 0x00007f26bc70eaff in GangWorker::loop() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #8 0x00007f26bc599ff0 in java_start(Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 > #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 > #11 0x0000000000000000 in ?? () > > > > *************************************************************************************** > > The other thread interesting thread VMThread is trying to start a > Safepoint Synchronize. > > Thread 1816 (Thread 0x7f266ec26700 (LWP 17406)): > #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f26bc594103 in os::PlatformEvent::park() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #4 0x00007f26bc2249b5 in ConcurrentGCThread::safepoint_synchronize() () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #5 0x00007f26bc61d4fb in SafepointSynchronize::begin() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #6 0x00007f26bc700227 in VMThread::loop() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #7 0x00007f26bc7008d0 in VMThread::run() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #8 0x00007f26bc599ff0 in java_start(Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 > #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 > #11 0x0000000000000000 in ?? 
() > > ********************************************** > The complete break down of rest of threads as below: > > (1632) Threads with stack > Thread 1818 (Thread 0x7f266f0f3700 (LWP 17404)): > #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f26bc594103 in os::PlatformEvent::park() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > > > (156) Threads with stack > Thread 10 (Thread 0x7f1cf6e31700 (LWP 973)): > #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f26bc598427 in Parker::park(bool, long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc6d84ad in Unsafe_Park () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > > > > (24) Threads with stack > #0 0x00007f26bcc522e3 in epoll_wait () from /lib64/libc.so.6 > #1 0x00007f2669eb97ba in Java_sun_nio_ch_EPollArrayWrapper_epollWait () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/libnio.so > > > (18) Threads with stack > #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f26bc598c27 in os::PlatformEvent::park(long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc589880 in ObjectMonitor::wait(long, bool, Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc3febd1 in JVM_MonitorWait () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > > (36) Threads with stack > #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f26bc59847f in Parker::park(bool, long) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc6d84ad in Unsafe_Park () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > > (1) Thread with stack > > #0 0x00007f26bd31da00 in sem_wait () from /lib64/libpthread.so.0 > #1 0x00007f26bc59881a in check_pending_signals(bool) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #2 0x00007f26bc592945 in signal_thread_entry(JavaThread*, Thread*) () > from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #3 0x00007f26bc6b8228 in JavaThread::thread_main_inner() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #4 0x00007f26bc6b8378 in JavaThread::run() () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #5 0x00007f26bc599ff0 in java_start(Thread*) () from > /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so > #6 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 > #7 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 > #8 0x0000000000000000 in ?? () > > > Thanks, > Kannan > > ________________________________________ > From: Krishnamurthy, Kannan > Sent: Tuesday, August 26, 2014 2:09 PM > To: Martin Makundi; Yu Zhang > Cc: hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com > Subject: RE: Unexplained long stop the world pauses during concurrent > marking step in G1 Collector > > Thanks for the responses. I agree with Jenny that the gc logs doesn't seem > to indicate that the pause was from a GC event. > > Profiling the application and trying to get some native stack traces/ core > dumps. 
> ________________________________ > From: Martin Makundi [martin.makundi at koodaripalvelut.com] > Sent: Monday, August 25, 2014 10:07 PM > To: Yu Zhang > Cc: Krishnamurthy, Kannan; hotspot-gc-use at openjdk.java.net; > kndkannan at gmail.com > Subject: Re: Unexplained long stop the world pauses during concurrent > marking step in G1 Collector > > Try running command kill -3 to see what's going on with the > application threads. > > ** > Martin > > > 2014-08-26 4:48 GMT+03:00 Yu Zhang yu.zhang at oracle.com>>: > Kannan, > > The concurrent marking is concurrent, meaning it runs concurrently with > the application. > You may see the time between [GC concurrent-mark-start] and > [GC concurrent-mark-stop] very long, maybe even some young gc happened > during this time period. This is because the marking threads can be > suspended. > > >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 > seconds** > 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which > application threads were stopped: 0.0029740 seconds > This means the application has run for 9.0448670 seconds. And only being > stopped for > 0.0029740 seconds. > > From this log, gc did not stop the application. > > >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during > the pause. > Maybe something in the application or OS is running on 1 cpu and blocking > other threads. > > Thanks, > Jenny > > On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: > Greetings, > > We are experiencing unexplained/unknown long pauses (8 seconds) during > concurrent marking step of G1 collector. > > 2014-08-07T13:42:30.552-0400: 92183.303: [GC > concurrent-root-region-scan-start] > 2014-08-07T13:42:30.555-0400: 92183.305: [GC > concurrent-root-region-scan-end, 0.0025230 secs] > **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** > **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 > seconds** > 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which > application threads were stopped: 0.0029740 seconds > 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation > Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: > 71.81 ms, target pause time: 100.00 ms > > > `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark > starts, approximately after 2 seconds of this step the application starts > to pause. However the GC logs claims the application was not paused during > this window. > Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the > pause. > > Any help in understanding the root cause of this issue is appreciated. 
> Our target JVMS: > java version "1.7.0_04" > Java(TM) SE Runtime Environment (build 1.7.0_04-b20) > Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) > > java version "1.8.0_11" > Java(TM) SE Runtime Environment (build 1.8.0_11-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) > > > Our JVM options : > > -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M > -XX:MarkStackSize=16M > -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo > -XX:+TraceGCTaskThread > -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats > -XX:+G1TraceConcRefinement > -XX:+UseG1GC -XX:MaxGCPauseMillis=100 > -XX:InitiatingHeapOccupancyPercent=65 > -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintGCApplicationConcurrentTime > -Xloggc:/common/logs/ocean-partition-gc.log > > > Thanks and regards, > Kannan > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > From Kannan.Krishnamurthy at contractor.cengage.com Fri Aug 29 16:07:30 2014 From: Kannan.Krishnamurthy at contractor.cengage.com (Krishnamurthy, Kannan) Date: Fri, 29 Aug 2014 16:07:30 +0000 Subject: Unexplained long stop the world pauses during concurrent marking step in G1 Collector In-Reply-To: References: <53FBE77B.6060806@oracle.com> , Message-ID: Ramki, Thanks for the detailed explanation. Will continue to profile further and share the findings. Excuse my naivety, but shouldn't the default value of 10 ms for G1ConcMarkStepDurationMillis still help in this case? Will G1RefProcDrainInterval be of any use? Thanks, Kannan. ________________________________________ From: Srinivas Ramakrishna [ysr1729 at gmail.com] Sent: Thursday, August 28, 2014 7:25 PM To: Krishnamurthy, Kannan Cc: Martin Makundi; Yu Zhang; hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com; Zhou, Jerry Subject: Re: Unexplained long stop the world pauses during concurrent marking step in G1 Collector It's been a while since I looked at G1 code and I'm sure it's evolved a bunch sine then... Hi Kannan -- As you surmised, it's likely that the marking step isn't checking at a sufficiently fine granularity whether a safepoint has been requested. Or, equivalently, the marking step is doing too much work in one "step", thus preventing a safepoint while the marking step is in progress. If you have GC logs from the application, you could look at the allocation rates that you observe and compare the rates during the marking phase and outside of the marking phase. I am guessing that because of this, the making phase must be slowing down allocation, and we can get a measure of that from your GC logs. It is clear from your stack traces that the mutators are all blocked for allocation, while a safepoint is waiting for the marking step to yield. It could be (from the stack retrace) that we are scanning from a gigantic obj array and perhaps the marking step can yield only after the entire array has been scanned.
In which case, the use of large object arrays (or hash tables) could be a performance anti-pattern for G1. Perhaps we should allow for partial scanning of arrays -- i can't recall if CMS does that for marking -- save the state of the partial scan and resume from that point after the yield (which occurs at a sufficiently fine granularity). This used to be an issue with CMS as well in the early days and we had to refine the granularity of the marking steps (or the so-called "concurrent work yield points" -- points at which the marking will stop to allow a scavenge to proceed). I am guessing we'll need to refine the granularity at which G1 does these yields to allow a young collection to proceed in a timely fashion. -- ramki On Thu, Aug 28, 2014 at 12:38 PM, Krishnamurthy, Kannan > wrote: We managed to get couple of native stack traces when the pause was happening. >From the thread dump, we don't see any application thread blocking the JVM. We are wondering whether the JVM Safepoint and a single concurrent marking thread are blocking each other. Links to the complete stack traces (~5.5MB each): First : https://drive.google.com/file/d/0B95RUv-vjsAfcE95MlVmMVhKZTQ/edit?usp=sharing Second : https://drive.google.com/file/d/0B95RUv-vjsAfNklxMnZVaEI2UTg/edit?usp=sharing Most of the threads are in some kinds of waiting (pthread_cond_wait@@GLIBC_2.3.2, epoll_wait, or pthread_cond_timedwait). We see that only one thread was doing any work across two successive stack traces 2 seconds aparts. For that thread 1820, in the first stack trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc55f604 in Monitor::ILock(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc55f9bf in Monitor::lock_without_safepoint_check() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc229779 in CMTask::move_entries_to_global_stack() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc230974 in CMTask::push(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc230b39 in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #10 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #11 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #12 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? 
() ---------------------- For the same thread 1820 in the second trace, we see: Thread 1820 (Thread 0x7f266f2f5700 (LWP 17402)): #0 0x00007f26bc208080 in oopDesc::size() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #1 0x00007f26bc230ace in CMTask::deal_with_reference(oopDesc*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc582ef7 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, G1CMOopClosure*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc22abc5 in CMTask::drain_local_queue(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc233562 in CMBitMapClosure::do_bit(unsigned long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc22b955 in CMTask::do_marking_step(double, bool, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc232d8b in CMConcurrentMarkingTask::work(unsigned int) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc70eaff in GangWorker::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? () *************************************************************************************** The other thread interesting thread VMThread is trying to start a Safepoint Synchronize. Thread 1816 (Thread 0x7f266ec26700 (LWP 17406)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc2249b5 in ConcurrentGCThread::safepoint_synchronize() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc61d4fb in SafepointSynchronize::begin() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bc700227 in VMThread::loop() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #7 0x00007f26bc7008d0 in VMThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #8 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #9 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #11 0x0000000000000000 in ?? 
() ********************************************** The complete break down of rest of threads as below: (1632) Threads with stack Thread 1818 (Thread 0x7f266f0f3700 (LWP 17404)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc594103 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc55ff0f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc56069e in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (156) Threads with stack Thread 10 (Thread 0x7f1cf6e31700 (LWP 973)): #0 0x00007f26bd31b61c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598427 in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (24) Threads with stack #0 0x00007f26bcc522e3 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f2669eb97ba in Java_sun_nio_ch_EPollArrayWrapper_epollWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/libnio.so (18) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc598c27 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc589880 in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc3febd1 in JVM_MonitorWait () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (36) Threads with stack #0 0x00007f26bd31b989 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f26bc59847f in Parker::park(bool, long) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc6d84ad in Unsafe_Park () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so (1) Thread with stack #0 0x00007f26bd31da00 in sem_wait () from /lib64/libpthread.so.0 #1 0x00007f26bc59881a in check_pending_signals(bool) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #2 0x00007f26bc592945 in signal_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #3 0x00007f26bc6b8228 in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #4 0x00007f26bc6b8378 in JavaThread::run() () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #5 0x00007f26bc599ff0 in java_start(Thread*) () from /usr/java/jdk1.7.0_04/jre/lib/amd64/server/libjvm.so #6 0x00007f26bd3177b6 in start_thread () from /lib64/libpthread.so.0 #7 0x00007f26bcc51c5d in clone () from /lib64/libc.so.6 #8 0x0000000000000000 in ?? () Thanks, Kannan ________________________________________ From: Krishnamurthy, Kannan Sent: Tuesday, August 26, 2014 2:09 PM To: Martin Makundi; Yu Zhang Cc: hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: RE: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Thanks for the responses. I agree with Jenny that the gc logs doesn't seem to indicate that the pause was from a GC event. Profiling the application and trying to get some native stack traces/ core dumps. 
________________________________ From: Martin Makundi [martin.makundi at koodaripalvelut.com] Sent: Monday, August 25, 2014 10:07 PM To: Yu Zhang Cc: Krishnamurthy, Kannan; hotspot-gc-use at openjdk.java.net; kndkannan at gmail.com Subject: Re: Unexplained long stop the world pauses during concurrent marking step in G1 Collector Try running command kill -3 to see what's going on with the application threads. ** Martin 2014-08-26 4:48 GMT+03:00 Yu Zhang >>: Kannan, The concurrent marking is concurrent, meaning it runs concurrently with the application. You may see the time between [GC concurrent-mark-start] and [GC concurrent-mark-stop] very long, maybe even some young gc happened during this time period. This is because the marking threads can be suspended. >*2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds This means the application has run for 9.0448670 seconds. And only being stopped for 0.0029740 seconds. >From this log, gc did not stop the application. >Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Maybe something in the application or OS is running on 1 cpu and blocking other threads. Thanks, Jenny On 8/18/2014 1:34 PM, Krishnamurthy, Kannan wrote: Greetings, We are experiencing unexplained/unknown long pauses (8 seconds) during concurrent marking step of G1 collector. 2014-08-07T13:42:30.552-0400: 92183.303: [GC concurrent-root-region-scan-start] 2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-root-region-scan-end, 0.0025230 secs] **2014-08-07T13:42:30.555-0400: 92183.305: [GC concurrent-mark-start]** **2014-08-07T13:42:39.598-0400: 92192.348: Application time: 9.0448670 seconds** 2014-08-07T13:42:39.601-0400: 92192.351: Total time for which application threads were stopped: 0.0029740 seconds 2014-08-07T13:42:39.603-0400: 92192.353: [GC pause (G1 Evacuation Pause) (young) 92192.354: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7980, predicted base time: 28.19 ms, remaining time: 71.81 ms, target pause time: 100.00 ms `2014-08-07T13:42:30.555-0400: 92183.305` is when the concurrent mark starts, approximately after 2 seconds of this step the application starts to pause. However the GC logs claims the application was not paused during this window. Linux "top" shows single CPU at 100% and rest of the CPUs at 0% during the pause. Any help in understanding the root cause of this issue is appreciated. 
Our target JVMS: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) java version "1.8.0_11" Java(TM) SE Runtime Environment (build 1.8.0_11-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode) Our JVM options : -Xms20G -Xmx20G -Xss10M -XX:PermSize=128M -XX:MaxPermSize=128M -XX:MarkStackSize=16M -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo -XX:+TraceGCTaskThread -XX:+G1SummarizeConcMark -XX:+G1SummarizeRSetStats -XX:+G1TraceConcRefinement -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=24 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/common/logs/ocean-partition-gc.log Thanks and regards, Kannan _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
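Ramki's observation above, that the concurrent marker apparently yields only between objects, so one gigantic object array can hold off a safepoint for as long as it takes to scan that single object, also suggests an application-side mitigation while the JVM-side yield granularity is being discussed: keep very large collections in bounded-size segments so that no individual array is huge. The following sketch is purely illustrative (the class name and segment size are not from this thread), and it only helps if the large-array hypothesis is actually what is happening here.

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: instead of one gigantic Object[] (for example the
// backing array of a very large ArrayList or HashMap), store elements in
// fixed-size segments so the concurrent marker never has to scan a single
// multi-megabyte array before it can yield to a safepoint.
public class SegmentedList<T> {
    private static final int SEGMENT_SIZE = 4096;   // elements per segment
    private final List<Object[]> segments = new ArrayList<>();
    private int size = 0;

    public void add(T value) {
        int offset = size % SEGMENT_SIZE;
        if (offset == 0) {
            segments.add(new Object[SEGMENT_SIZE]);  // grow one small array at a time
        }
        segments.get(segments.size() - 1)[offset] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (T) segments.get(index / SEGMENT_SIZE)[index % SEGMENT_SIZE];
    }

    public int size() {
        return size;
    }
}

A structure like this trades a little indirection on each access for the guarantee that the largest reference array the marker ever has to scan in one go is tens of kilobytes rather than potentially hundreds of megabytes on a 20G heap.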