CMS large objects vs G1 humongous allocations

Vitaly Davidovich vitalyd at gmail.com
Mon Jan 30 13:06:34 UTC 2017


On Mon, Jan 30, 2017 at 7:41 AM, Amit Balode <amit.balode at gmail.com> wrote:

> Hi Vitaly,
>
> JRE: 1.8u40
> Settings: -XX:+AlwaysPreTouch -XX:ConcGCThreads=5
> -XX:G1HeapRegionSize=33554432 -XX:InitialHeapSize=17179869184
> -XX:MaxGCPauseMillis=40 -XX:MaxHeapSize=17179869184
> -XX:MaxNewSize=10301210624
>
> full GC reclaim: What I meant is that after the heap was exhausted, a full
> pause was run to actively reclaim available space.
>
Right, but how much of the heap was reclaimed? If very little, then your
max heap size is too low for your working set.  If it reclaimed a bunch,
then it's a matter of concurrent GC not keeping up.  If you can paste the
output of the Full GC phase, it would be helpful.

>
>
> Yeah, the next thing I was going to try was increasing ConcGCThreads to
> at least 16 (given the core count is 24). Is it possible that ConcGCThreads
> and ParallelGCThreads might be running at the same time?
>
Yes, they can run concurrently.  The ConcGCThreads service the concurrent
phases of G1's concurrent cycle (e.g. concurrent marking), while the
ParallelGCThreads perform the parallel work within the stop-the-world young
collection pauses.
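
If you do try bumping them, it's just a launcher change.  A rough sketch
(the 16 is simply the value you mentioned trying, and the 18 matches the
"GC Workers: 18" already shown in your logs, i.e. what appears to be your
current ParallelGCThreads value - treat both as placeholders, not
recommendations):
{code}
# illustrative only; substitute your real launch command for "-jar yourapp.jar"
java -XX:+UseG1GC \
     -XX:ConcGCThreads=16 \
     -XX:ParallelGCThreads=18 \
     <your existing flags> -jar yourapp.jar
{code}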

>
> Could you elaborate why you think reducing InitiatingHeapOccupancyPercent
> might help? Reducing InitiatingHeapOccupancyPercent means more frequent
> concurrent cycles, so the problem might happen sooner, no? Although a
> combination of increasing ConcGCThreads and reducing
> InitiatingHeapOccupancyPercent might help.
>
Reduced IHOP makes concurrent cycles kick in sooner, and if the issue is
that concurrent marking cannot keep up, then making it start earlier could
help.  If you start concurrent marking earlier, it may be able to complete
(and thus clean up) before your Java threads completely fill the heap and
necessitate a Full GC.  That's the thinking behind that suggestion, but it
needs to be weighed against your actual situation.
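
As a purely illustrative sketch of that suggestion (the 35 below is an
arbitrary number picked just to show the flag; the right value depends on
your allocation rate and live set, so it needs measurement):
{code}
# illustrative only; substitute your real launch command for "-jar yourapp.jar"
java -XX:+UseG1GC \
     -XX:InitiatingHeapOccupancyPercent=35 \
     <your existing flags> -jar yourapp.jar
{code}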

>
> Pasting more logs as requested.
> {code}
> 2017-01-29T14:52:41.570+0000: 189073.759: [GC pause (G1 Humongous
> Allocation) (young) (initial-mark)
> Desired survivor size 603979776 bytes, new threshold 15 (max 15)
> - age   1:  302507344 bytes,  302507344 total
> - age   2:    4153408 bytes,  306660752 total
> - age   3:    3325224 bytes,  309985976 total
> - age   4:     482672 bytes,  310468648 total
> - age   5:     659992 bytes,  311128640 total
> - age   6:    1189240 bytes,  312317880 total
> - age   7:     645256 bytes,  312963136 total
> - age   8:     438800 bytes,  313401936 total
> - age   9:    1603768 bytes,  315005704 total
>  189073.759: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 24579, predicted base time: 18.93 ms, remaining time: 21.07
> ms, target pause time: 40.00 ms]
>  189073.759: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 170 regions, survivors: 26 regions, predicted young region time:
> 10.94 ms]
>  189073.759: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 170 regions, survivors: 26 regions, old: 0 regions, predicted pause time:
> 29.87 ms, target pause time: 40.00 ms]
> , 0.0690000 secs]
>    [Parallel Time: 60.2 ms, GC Workers: 18]
>       [GC Worker Start (ms): Min: 189073760.5, Avg: 189073760.7, Max:
> 189073761.0, Diff: 0.5]
>       [Ext Root Scanning (ms): Min: 3.7, Avg: 4.2, Max: 4.5, Diff: 0.8,
> Sum: 75.1]
>       [Update RS (ms): Min: 6.9, Avg: 7.2, Max: 7.6, Diff: 0.7, Sum: 129.3]
>          [Processed Buffers: Min: 42, Avg: 62.8, Max: 93, Diff: 51, Sum:
> 1131]
>       [Scan RS (ms): Min: 6.4, Avg: 6.8, Max: 6.9, Diff: 0.5, Sum: 122.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>       [Object Copy (ms): Min: 41.4, Avg: 41.5, Max: 41.8, Diff: 0.5, Sum:
> 746.6]
>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.3]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
> 0.7]
>       [GC Worker Total (ms): Min: 59.5, Avg: 59.8, Max: 60.0, Diff: 0.5,
> Sum: 1075.7]
>       [GC Worker End (ms): Min: 189073820.5, Avg: 189073820.5, Max:
> 189073820.5, Diff: 0.1]
>    [Code Root Fixup: 0.2 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 0.6 ms]
>    [Other: 8.0 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 3.2 ms]
>       [Ref Enq: 0.2 ms]
>       [Redirty Cards: 0.1 ms]
>       [Humongous Reclaim: 1.7 ms]
>       [Free CSet: 0.2 ms]
>    [Eden: 5440.0M(8288.0M)->0.0B(8960.0M) Survivors: 832.0M->864.0M Heap:
> 10.9G(16.0G)->2359.8M(16.0G)]
>  [Times: user=1.07 sys=0.00, real=0.07 secs]
> 2017-01-29T14:52:41.639+0000: 189073.828: [GC concurrent-root-region-scan-
> start]
> 2017-01-29T14:52:41.640+0000: 189073.828: Total time for which application
> threads were stopped: 0.0742949 seconds, Stopping threads took: 0.0004049
> seconds
> 2017-01-29T14:52:41.680+0000: 189073.868: [GC concurrent-root-region-scan-end,
> 0.0402986 secs]
> 2017-01-29T14:52:41.680+0000: 189073.868: [GC concurrent-mark-start]
> 2017-01-29T14:52:41.892+0000: 189074.081: [GC concurrent-mark-end,
> 0.2123457 secs]
> 2017-01-29T14:52:41.896+0000: 189074.085: [GC remark 189074.085: [Finalize
> Marking, 0.0012962 secs] 189074.087: [GC ref-proc, 0.0043022 secs]
> 189074.091: [Unloading, 0.0226266 secs], 0.0300066 secs]
>  [Times: user=0.34 sys=0.01, real=0.03 secs]
> 2017-01-29T14:52:41.927+0000: 189074.116: Total time for which application
> threads were stopped: 0.0351083 seconds, Stopping threads took: 0.0004268
> seconds
> 2017-01-29T14:52:41.931+0000: 189074.120: [GC cleanup 2630M->2630M(16G),
> 0.0020884 secs]
>  [Times: user=0.02 sys=0.00, real=0.00 secs]
>
Let's note a few things here:
1) Concurrent marking completes fairly quickly.
2) No space is reclaimed (the cleanup line shows 2630M->2630M).

Another thing to consider, in light of this, is to tweak
G1MixedGCLiveThresholdPercent (it's an experimental flag, so it needs to be
unlocked via -XX:+UnlockExperimentalVMOptions).  This value (85 by default)
is the liveness cutoff used when deciding whether an old region is a
candidate for mixed collections: regions whose live data is above X% of the
region, where X is the flag's value, are skipped.  So with the default, a
region has to be at most 85% live (i.e. at least 15% garbage) to be
considered for mixed GCs.  You could experiment with a higher value so that
more old regions become candidates and see if mixed GCs can clean up a bit
more, at the cost of more copying work per mixed pause.
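
For reference, a rough sketch of how that experimental flag would be passed
(the 90 below is only there to show the syntax, not a tuned value - measure
before and after):
{code}
# illustrative only; substitute your real launch command for "-jar yourapp.jar"
java -XX:+UseG1GC \
     -XX:+UnlockExperimentalVMOptions \
     -XX:G1MixedGCLiveThresholdPercent=90 \
     <your existing flags> -jar yourapp.jar
{code}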


> 2017-01-29T14:52:41.933+0000: 189074.122: Total time for which application
> threads were stopped: 0.0062040 seconds, Stopping threads took: 0.0002670
> seconds
> 2017-01-29T14:52:42.939+0000: 189075.127: Total time for which application
> threads were stopped: 0.0051245 seconds, Stopping threads took: 0.0004442
> seconds
> 2017-01-29T14:52:43.944+0000: 189076.133: Total time for which application
> threads were stopped: 0.0052855 seconds, Stopping threads took: 0.0004542
> seconds
> 2017-01-29T14:52:46.950+0000: 189079.139: Total time for which application
> threads were stopped: 0.0055019 seconds, Stopping threads took: 0.0006084
> seconds
> 2017-01-29T14:52:47.955+0000: 189080.144: Total time for which application
> threads were stopped: 0.0050989 seconds, Stopping threads took: 0.0004534
> seconds
> 2017-01-29T14:52:49.961+0000: 189082.149: Total time for which application
> threads were stopped: 0.0053802 seconds, Stopping threads took: 0.0006599
> seconds
> 2017-01-29T14:52:51.966+0000: 189084.154: Total time for which application
> threads were stopped: 0.0047913 seconds, Stopping threads took: 0.0004446
> seconds
> 2017-01-29T14:52:53.970+0000: 189086.158: Total time for which application
> threads were stopped: 0.0037714 seconds, Stopping threads took: 0.0002900
> seconds
> 2017-01-29T14:52:55.974+0000: 189088.163: Total time for which application
> threads were stopped: 0.0044301 seconds, Stopping threads took: 0.0003547
> seconds
> 2017-01-29T14:52:56.979+0000: 189089.167: Total time for which application
> threads were stopped: 0.0043758 seconds, Stopping threads took: 0.0003496
> seconds
> 2017-01-29T14:52:57.983+0000: 189090.172: Total time for which application
> threads were stopped: 0.0043504 seconds, Stopping threads took: 0.0003511
> seconds
> 2017-01-29T14:53:06.989+0000: 189099.177: Total time for which application
> threads were stopped: 0.0045634 seconds, Stopping threads took: 0.0004279
> seconds
> 2017-01-29T14:53:08.993+0000: 189101.182: Total time for which application
> threads were stopped: 0.0043401 seconds, Stopping threads took: 0.0003958
> seconds
> 2017-01-29T14:53:11.998+0000: 189104.186: Total time for which application
> threads were stopped: 0.0044074 seconds, Stopping threads took: 0.0003907
> seconds
> 2017-01-29T14:53:14.770+0000: 189106.959: [GC pause (G1 Evacuation Pause)
> (young)
> Desired survivor size 654311424 bytes, new threshold 15 (max 15)
> - age   1:  240262896 bytes,  240262896 total
> - age   2:    3476760 bytes,  243739656 total
> - age   3:    3293240 bytes,  247032896 total
> - age   4:    3147072 bytes,  250179968 total
> - age   5:     420832 bytes,  250600800 total
> - age   6:     614688 bytes,  251215488 total
> - age   7:    1139960 bytes,  252355448 total
> - age   8:     632088 bytes,  252987536 total
> - age   9:     425488 bytes,  253413024 total
> - age  10:    1592608 bytes,  255005632 total
>  189106.960: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 29363, predicted base time: 20.87 ms, remaining time: 19.13
> ms, target pause time: 40.00 ms]
>  189106.960: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 276 regions, survivors: 27 regions, predicted young region time:
> 12.11 ms]
>  189106.960: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 276 regions, survivors: 27 regions, old: 0 regions, predicted pause time:
> 32.98 ms, target pause time: 40.00 ms]
>  189106.961: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
> region allocation request failed, allocation request: 5549208 bytes]
>  189106.961: [G1Ergonomics (Heap Sizing) expand the heap, requested
> expansion amount: 5549208 bytes, attempted expansion amount: 33554432 bytes]
>  189106.961: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
> heap already fully expanded]
>  189114.730: [G1Ergonomics (Concurrent Cycles) do not request concurrent
> cycle initiation, reason: still doing mixed collections, occupancy:
> 13119782912 bytes, allocation request: 0 bytes, threshold: 7730941095 bytes
> (45.00 %), source: end of GC]
>  189114.730: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate
> old regions available, candidate old regions: 45 regions, reclaimable:
> 1026676456 bytes (5.98 %), threshold: 5.00 %]
>
So here it found ~1G reclaimable in old regions (1026676456 bytes, i.e. the
5.98 % of the 16 GB heap reported above), but it's too late - it's hitting
to-space exhaustion right below.  This is what leads me to believe that G1's
concurrent cleanup is "losing the race" against the Java threads (which are
allocating concurrently).  Or it could be that your max heap is simply too
small for your allocation rate and current GC parameters.  This is where
it's useful to see how much memory a Full GC reclaims.

>  (to-space exhausted), 7.7714626 secs]
>    [Parallel Time: 7182.7 ms, GC Workers: 18]
>       [GC Worker Start (ms): Min: 189106960.8, Avg: 189106961.0, Max:
> 189106961.2, Diff: 0.4]
>       [Ext Root Scanning (ms): Min: 1.0, Avg: 1.7, Max: 2.5, Diff: 1.5,
> Sum: 30.9]
>       [Update RS (ms): Min: 2.1, Avg: 8.2, Max: 11.5, Diff: 9.4, Sum:
> 147.0]
>          [Processed Buffers: Min: 15, Avg: 62.9, Max: 130, Diff: 115, Sum:
> 1132]
>       [Scan RS (ms): Min: 4.0, Avg: 6.1, Max: 6.4, Diff: 2.4, Sum: 109.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>       [Object Copy (ms): Min: 7164.5, Avg: 7166.1, Max: 7171.6, Diff: 7.1,
> Sum: 128989.0]
>       [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 3.9]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
> 0.4]
>       [GC Worker Total (ms): Min: 7182.1, Avg: 7182.3, Max: 7182.5, Diff:
> 0.4, Sum: 129280.9]
>       [GC Worker End (ms): Min: 189114143.2, Avg: 189114143.3, Max:
> 189114143.3, Diff: 0.1]
>    [Code Root Fixup: 0.3 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 0.7 ms]
>    [Other: 587.7 ms]
>       [Evacuation Failure: 579.6 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 2.8 ms]
>       [Ref Enq: 0.5 ms]
>       [Redirty Cards: 1.4 ms]
>       [Humongous Reclaim: 0.8 ms]
>       [Free CSet: 0.2 ms]
>    [Eden: 8832.0M(8960.0M)->0.0B(800.0M) Survivors: 864.0M->0.0B Heap:
> 13.9G(16.0G)->11.6G(16.0G)]
>  [Times: user=16.27 sys=0.35, real=7.77 secs]
> 2017-01-29T14:53:22.542+0000: 189114.731: Total time for which application
> threads were stopped: 7.7764915 seconds, Stopping threads took: 0.0003511
> seconds
> {code}
>
>
> On Mon, Jan 30, 2017 at 5:14 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> I don't believe region sizes >32MB are planned, at least that wasn't the
>> case before - maybe that's changed though.
>>
>> Some tuning options to consider:
>> 1) Increase max heap size (this is always a blunt but frequently
>> effective instrument)
>> 2) Turn InitiatingHeapOccupancyPercent down from its default 45 value.
>> 3) Increase ConcGCThreads
>>
>> What version of the JRE are you running? From the short GC log snippet
>> you pasted, it looks like G1 concurrent cycles are losing the race to your
>> Java threads (i.e. you exhaust the heap before the concurrent phase can
>> clean up).
>>
>> How much does a full GC reclaim? You may want to provide more of your GC
>> log - someone is going to ask for that sooner or later :).
>>
>>
>> On Mon, Jan 30, 2017 at 3:24 AM Amit Balode <amit.balode at gmail.com>
>> wrote:
>>
>>> Hello, so the only reason we decided to move to G1 from CMS was the
>>> fragmentation issues of CMS. After moving to G1, we have started seeing
>>> humongous allocations resulting in Full GCs. We have some large objects
>>> which are allocated in quick succession, resulting in this issue, but the
>>> use case for the continuous allocation on the application side seems
>>> genuine. The heap is getting full, so it's explainable why a Full GC is
>>> happening, although this issue did not happen with CMS.
>>>
>>> a) We are currently running with the maximum 32MB region size, and
>>> humongous allocations as large as 21MB are happening. For such large
>>> allocations 32MB seems small; perhaps 64MB would have been more
>>> appropriate. Is that an option available in future G1 releases?
>>> b) Given that the application behaviour cannot be changed much to stop
>>> the continuous large allocations, what are some G1-specific settings to
>>> tune to make it more resilient?
>>>
>>> Below is a snippet from the application running with a 16GB heap, a 40ms
>>> pause-time target and a 32MB region size.
>>> {code}
>>> 2017-01-29T14:53:14.770+0000: 189106.959: [GC pause (G1 Evacuation
>>> Pause) (young)
>>> Desired survivor size 654311424 bytes, new threshold 15 (max 15)
>>> - age   1:  240262896 bytes,  240262896 total
>>> - age   2:    3476760 bytes,  243739656 total
>>> - age   3:    3293240 bytes,  247032896 total
>>> - age   4:    3147072 bytes,  250179968 total
>>> - age   5:     420832 bytes,  250600800 total
>>> - age   6:     614688 bytes,  251215488 total
>>> - age   7:    1139960 bytes,  252355448 total
>>> - age   8:     632088 bytes,  252987536 total
>>> - age   9:     425488 bytes,  253413024 total
>>> - age  10:    1592608 bytes,  255005632 total
>>>  189106.960: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 29363, predicted base time: 20.87 ms, remaining time: 19.13
>>> ms, target pause time: 40.00 ms]
>>>  189106.960: [G1Ergonomics (CSet Construction) add young regions to
>>> CSet, eden: 276 regions, survivors: 27 regions, predicted young region
>>> time: 12.11 ms]
>>>  189106.960: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 276 regions, survivors: 27 regions, old: 0 regions, predicted pause
>>> time: 32.98 ms, target pause time:
>>> 40.00 ms]
>>>  189106.961: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
>>> region allocation request failed, allocation request: 5549208 bytes]
>>>  189106.961: [G1Ergonomics (Heap Sizing) expand the heap, requested
>>> expansion amount: 5549208 bytes, attempted expansion amount: 33554432 bytes]
>>>  189106.961: [G1Ergonomics (Heap Sizing) did not expand the heap,
>>> reason: heap already fully expanded]
>>>  189114.730: [G1Ergonomics (Concurrent Cycles) do not request concurrent
>>> cycle initiation, reason: still doing mixed collections, occupancy:
>>> 13119782912 bytes, allocation request
>>> : 0 bytes, threshold: 7730941095 bytes (45.00 %), source: end of GC]
>>> 189114.730: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate
>>> old regions available, candidate old regions: 45 regions, reclaimable:
>>> 1026676456 bytes (5.98 %), threshold: 5.00 %]
>>>  (to-space exhausted), *7.7714626 secs*]
>>>    [Parallel Time: 7182.7 ms, GC Workers: 18]
>>> {code}
>>>
>>> --
>>> Thanks & Regards,
>>> Amit
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>> --
>> Sent from my phone
>>
>
>
>
> --
> Thanks & Regards,
> Amit.Balode
>
>