CMS large objects vs G1 humongous allocations
Vitaly Davidovich
vitalyd at gmail.com
Mon Jan 30 13:38:41 UTC 2017
On Mon, Jan 30, 2017 at 8:19 AM, Amit Balode <amit.balode at gmail.com> wrote:
> Vitaly, thank you.
>
> There is no explicit line that says Full GC. I thought the line printed as
> "(Heap Sizing) did not expand the heap, reason: heap already fully
> expanded" was an implicit indication that G1 had taken all the actions a
> full GC would take, and that it led to the 7.7 sec pause.
>
Oh really, there's no Full GC that ensues after those lines?
The (Heap Sizing) output is G1's ergonomic output. It's saying that it
would like to expand the heap because it failed to allocate a new region
out of the existing heap. The reason for expansion is that G1 may start out
with a heap smaller than capacity (i.e. max heap size), and will try to
expand the heap (if under capacity/max) as needed. Here, it cannot expand
because you're already at max heap size.
The 7.7s pause is due to to-space exhaustion, which basically means G1 ran
out of space to copy survivors (and you can see that this 7s is all in the
Object Copy phase of the pause). In my experience, when you start seeing
to-space exhaustion, you pretty soon see a Full GC.
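One knob commonly suggested for to-space exhaustion is G1's evacuation reserve; a sketch (the value here is illustrative, not a recommendation for this workload):

```
-XX:G1ReservePercent=15   # keep more of the heap free as to-space headroom (default is 10)
```

A larger reserve leaves G1 more room to copy survivors at the cost of less usable heap, so it only helps if the working set otherwise fits.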
>
> Sorry, a bit confused about how 'G1's concurrent cycles' & 'parallel
> phases of young collections' can run concurrently. Does that
> mean ConcGCThreads + ParallelGCThreads cannot be greater than CPU cores?
>
G1 has concurrent GC cycles, such as concurrent marking - this runs in the
background, on the ConcGCThreads, and marks old regions to gather their
liveness information (so that these regions can either be cleaned up in the
background, if fully empty, or they'll be added to young GCs to make them
mixed GCs). At the same time, while that concurrent work is ongoing, your
Java threads continue to run and allocate. You can then hit a young pause,
where ParallelGCThreads will perform some of the parallel work associated
with a young GC. Young GCs look at eden regions (and do some processing of
them in parallel), while concurrent GC threads can be processing old
regions at the same time. So yes, you can theoretically hit a case where
ConcGCThreads + ParallelGCThreads threads are runnable, and you
may oversubscribe the machine (assuming each of those threads doesn't stall
due to internal issues, such as lock contention, etc).
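To see what the JVM actually chose for both thread pools on a given machine, the final flag values can be dumped with standard HotSpot options, e.g.:

```
java -XX:+PrintFlagsFinal -version | grep -E 'ConcGCThreads|ParallelGCThreads'
```

The defaults are derived from the core count, so this is worth checking before tuning either flag by hand.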
>
> Good point about tuning G1MixedGCLiveThresholdPercent.
>
> On Mon, Jan 30, 2017 at 6:36 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>>
>>
>> On Mon, Jan 30, 2017 at 7:41 AM, Amit Balode <amit.balode at gmail.com>
>> wrote:
>>
>>> Hi Vitaly,
>>>
>>> JRE: 1.8u40
>>> Settings: -XX:+AlwaysPreTouch -XX:ConcGCThreads=5
>>> -XX:G1HeapRegionSize=33554432 -XX:InitialHeapSize=17179869184
>>> -XX:MaxGCPauseMillis=40 -XX:MaxHeapSize=17179869184
>>> -XX:MaxNewSize=10301210624
>>>
>>> full GC reclaim: What I meant is that after the heap was exhausted, a full
>>> pause was done to actively reclaim available space.
>>>
>> Right, but how much of the heap was reclaimed? If very little, then your
>> max heap size is too low for your working set. If it reclaimed a bunch,
>> then it's a matter of concurrent GC not keeping up. If you can paste the
>> output of the Full GC phase, it would be helpful.
>>
>>>
>>>
>>> Yeah, the next thing I was going to try was increasing ConcGCThreads to
>>> at least 16 (given core count is 24). Is it possible that at any given
>>> time ConcGCThreads and ParallelGCThreads might be running at the same time?
>>>
>> Yes, they can run concurrently. ConcGCThreads service the concurrent
>> phases of G1's concurrent cycles (e.g. concurrent marking), and the
>> ParallelGCThreads are used during young collections to perform the parallel
>> phases of young collections.
>>
>>>
>>> Could you elaborate why you think an InitiatingHeapOccupancyPercent
>>> reduction might help? Reducing InitiatingHeapOccupancyPercent means
>>> more frequent concurrent cycles, so the problem might happen sooner, no?
>>> Although a combination of increasing ConcGCThreads and reducing
>>> InitiatingHeapOccupancyPercent might help.
>>>
>> Reduced IHOP makes concurrent cycles kick in sooner, and if the issue is
>> that concurrent marking cannot keep up, then making it start sooner could
>> help. If you start concurrent marking sooner/earlier, it may be able to
>> complete (and thus clean up) before your Java threads completely fill the
>> heap and necessitate a Full GC. That's the thinking behind that
>> suggestion, but it needs to be weighed against your actual
>> situation.
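Concretely, that combination would look something like this (values are illustrative starting points, not a prescription):

```
-XX:InitiatingHeapOccupancyPercent=30   # start concurrent cycles earlier (default 45)
-XX:ConcGCThreads=12                    # more marking threads so the cycle finishes sooner
```

Both changes trade CPU and earlier cycle starts for a better chance of marking completing before the heap fills.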
>>
>>>
>>> Pasting more logs as requested.
>>> {code}
>>> 2017-01-29T14:52:41.570+0000: 189073.759: [GC pause (G1 Humongous
>>> Allocation) (young) (initial-mark)
>>> Desired survivor size 603979776 bytes, new threshold 15 (max 15)
>>> - age 1: 302507344 bytes, 302507344 total
>>> - age 2: 4153408 bytes, 306660752 total
>>> - age 3: 3325224 bytes, 309985976 total
>>> - age 4: 482672 bytes, 310468648 total
>>> - age 5: 659992 bytes, 311128640 total
>>> - age 6: 1189240 bytes, 312317880 total
>>> - age 7: 645256 bytes, 312963136 total
>>> - age 8: 438800 bytes, 313401936 total
>>> - age 9: 1603768 bytes, 315005704 total
>>> 189073.759: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 24579, predicted base time: 18.93 ms, remaining time: 21.07
>>> ms, target pause time: 40.00 ms]
>>> 189073.759: [G1Ergonomics (CSet Construction) add young regions to
>>> CSet, eden: 170 regions, survivors: 26 regions, predicted young region
>>> time: 10.94 ms]
>>> 189073.759: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 170 regions, survivors: 26 regions, old: 0 regions, predicted pause
>>> time: 29.87 ms, target pause time: 40.00 ms]
>>> , 0.0690000 secs]
>>> [Parallel Time: 60.2 ms, GC Workers: 18]
>>> [GC Worker Start (ms): Min: 189073760.5, Avg: 189073760.7, Max:
>>> 189073761.0, Diff: 0.5]
>>> [Ext Root Scanning (ms): Min: 3.7, Avg: 4.2, Max: 4.5, Diff: 0.8,
>>> Sum: 75.1]
>>> [Update RS (ms): Min: 6.9, Avg: 7.2, Max: 7.6, Diff: 0.7, Sum:
>>> 129.3]
>>> [Processed Buffers: Min: 42, Avg: 62.8, Max: 93, Diff: 51, Sum:
>>> 1131]
>>> [Scan RS (ms): Min: 6.4, Avg: 6.8, Max: 6.9, Diff: 0.5, Sum: 122.6]
>>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.0]
>>> [Object Copy (ms): Min: 41.4, Avg: 41.5, Max: 41.8, Diff: 0.5,
>>> Sum: 746.6]
>>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>>> 1.3]
>>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>>> Sum: 0.7]
>>> [GC Worker Total (ms): Min: 59.5, Avg: 59.8, Max: 60.0, Diff: 0.5,
>>> Sum: 1075.7]
>>> [GC Worker End (ms): Min: 189073820.5, Avg: 189073820.5, Max:
>>> 189073820.5, Diff: 0.1]
>>> [Code Root Fixup: 0.2 ms]
>>> [Code Root Purge: 0.0 ms]
>>> [Clear CT: 0.6 ms]
>>> [Other: 8.0 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 3.2 ms]
>>> [Ref Enq: 0.2 ms]
>>> [Redirty Cards: 0.1 ms]
>>> [Humongous Reclaim: 1.7 ms]
>>> [Free CSet: 0.2 ms]
>>> [Eden: 5440.0M(8288.0M)->0.0B(8960.0M) Survivors: 832.0M->864.0M
>>> Heap: 10.9G(16.0G)->2359.8M(16.0G)]
>>> [Times: user=1.07 sys=0.00, real=0.07 secs]
>>> 2017-01-29T14:52:41.639+0000: 189073.828: [GC
>>> concurrent-root-region-scan-start]
>>> 2017-01-29T14:52:41.640+0000: 189073.828: Total time for which
>>> application threads were stopped: 0.0742949 seconds, Stopping threads took:
>>> 0.0004049 seconds
>>> 2017-01-29T14:52:41.680+0000: 189073.868: [GC
>>> concurrent-root-region-scan-end, 0.0402986 secs]
>>> 2017-01-29T14:52:41.680+0000: 189073.868: [GC concurrent-mark-start]
>>> 2017-01-29T14:52:41.892+0000: 189074.081: [GC concurrent-mark-end,
>>> 0.2123457 secs]
>>> 2017-01-29T14:52:41.896+0000: 189074.085: [GC remark 189074.085:
>>> [Finalize Marking, 0.0012962 secs] 189074.087: [GC ref-proc, 0.0043022
>>> secs] 189074.091: [Unloading, 0.0226266 secs], 0.0300066 secs]
>>> [Times: user=0.34 sys=0.01, real=0.03 secs]
>>> 2017-01-29T14:52:41.927+0000: 189074.116: Total time for which
>>> application threads were stopped: 0.0351083 seconds, Stopping threads took:
>>> 0.0004268 seconds
>>> 2017-01-29T14:52:41.931+0000: 189074.120: [GC cleanup 2630M->2630M(16G),
>>> 0.0020884 secs]
>>> [Times: user=0.02 sys=0.00, real=0.00 secs]
>>>
>> Let's note a few things here:
>> 1) concurrent marking completes fairly quickly
>> 2) No space is reclaimed.
>>
>> Another thing to consider, in light of this, is to
>> tweak G1MixedGCLiveThresholdPercent (it's an experimental flag, so it
>> needs to be unlocked via -XX:+UnlockExperimentalVMOptions). This value
>> (85 by default) controls how live a region may be and still be considered
>> for mixed collections. Any region that's at most X% live, where X is the
>> flag's value, is a candidate for mixed collections. So with the default, a
>> region has to be no more than 85% live (i.e. at least 15% garbage) to be
>> considered for mixed GCs. You could experiment with that threshold to see
>> if mixed GCs can clean up a bit more.
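The flag is set like this (the value 90 only illustrates the syntax; which direction to move it depends on how the liveness cutoff interacts with your regions, and the unlock flag must precede the experimental one):

```
-XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=90
```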
>>
>>
>>> 2017-01-29T14:52:41.933+0000: 189074.122: Total time for which
>>> application threads were stopped: 0.0062040 seconds, Stopping threads took:
>>> 0.0002670 seconds
>>> 2017-01-29T14:52:42.939+0000: 189075.127: Total time for which
>>> application threads were stopped: 0.0051245 seconds, Stopping threads took:
>>> 0.0004442 seconds
>>> 2017-01-29T14:52:43.944+0000: 189076.133: Total time for which
>>> application threads were stopped: 0.0052855 seconds, Stopping threads took:
>>> 0.0004542 seconds
>>> 2017-01-29T14:52:46.950+0000: 189079.139: Total time for which
>>> application threads were stopped: 0.0055019 seconds, Stopping threads took:
>>> 0.0006084 seconds
>>> 2017-01-29T14:52:47.955+0000: 189080.144: Total time for which
>>> application threads were stopped: 0.0050989 seconds, Stopping threads took:
>>> 0.0004534 seconds
>>> 2017-01-29T14:52:49.961+0000: 189082.149: Total time for which
>>> application threads were stopped: 0.0053802 seconds, Stopping threads took:
>>> 0.0006599 seconds
>>> 2017-01-29T14:52:51.966+0000: 189084.154: Total time for which
>>> application threads were stopped: 0.0047913 seconds, Stopping threads took:
>>> 0.0004446 seconds
>>> 2017-01-29T14:52:53.970+0000: 189086.158: Total time for which
>>> application threads were stopped: 0.0037714 seconds, Stopping threads took:
>>> 0.0002900 seconds
>>> 2017-01-29T14:52:55.974+0000: 189088.163: Total time for which
>>> application threads were stopped: 0.0044301 seconds, Stopping threads took:
>>> 0.0003547 seconds
>>> 2017-01-29T14:52:56.979+0000: 189089.167: Total time for which
>>> application threads were stopped: 0.0043758 seconds, Stopping threads took:
>>> 0.0003496 seconds
>>> 2017-01-29T14:52:57.983+0000: 189090.172: Total time for which
>>> application threads were stopped: 0.0043504 seconds, Stopping threads took:
>>> 0.0003511 seconds
>>> 2017-01-29T14:53:06.989+0000: 189099.177: Total time for which
>>> application threads were stopped: 0.0045634 seconds, Stopping threads took:
>>> 0.0004279 seconds
>>> 2017-01-29T14:53:08.993+0000: 189101.182: Total time for which
>>> application threads were stopped: 0.0043401 seconds, Stopping threads took:
>>> 0.0003958 seconds
>>> 2017-01-29T14:53:11.998+0000: 189104.186: Total time for which
>>> application threads were stopped: 0.0044074 seconds, Stopping threads took:
>>> 0.0003907 seconds
>>> 2017-01-29T14:53:14.770+0000: 189106.959: [GC pause (G1 Evacuation
>>> Pause) (young)
>>> Desired survivor size 654311424 bytes, new threshold 15 (max 15)
>>> - age 1: 240262896 bytes, 240262896 total
>>> - age 2: 3476760 bytes, 243739656 total
>>> - age 3: 3293240 bytes, 247032896 total
>>> - age 4: 3147072 bytes, 250179968 total
>>> - age 5: 420832 bytes, 250600800 total
>>> - age 6: 614688 bytes, 251215488 total
>>> - age 7: 1139960 bytes, 252355448 total
>>> - age 8: 632088 bytes, 252987536 total
>>> - age 9: 425488 bytes, 253413024 total
>>> - age 10: 1592608 bytes, 255005632 total
>>> 189106.960: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 29363, predicted base time: 20.87 ms, remaining time: 19.13
>>> ms, target pause time: 40.00 ms]
>>> 189106.960: [G1Ergonomics (CSet Construction) add young regions to
>>> CSet, eden: 276 regions, survivors: 27 regions, predicted young region
>>> time: 12.11 ms]
>>> 189106.960: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 276 regions, survivors: 27 regions, old: 0 regions, predicted pause
>>> time: 32.98 ms, target pause time: 40.00 ms]
>>> 189106.961: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
>>> region allocation request failed, allocation request: 5549208 bytes]
>>> 189106.961: [G1Ergonomics (Heap Sizing) expand the heap, requested
>>> expansion amount: 5549208 bytes, attempted expansion amount: 33554432 bytes]
>>> 189106.961: [G1Ergonomics (Heap Sizing) did not expand the heap,
>>> reason: heap already fully expanded]
>>> 189114.730: [G1Ergonomics (Concurrent Cycles) do not request concurrent
>>> cycle initiation, reason: still doing mixed collections, occupancy:
>>> 13119782912 bytes, allocation request: 0 bytes, threshold: 7730941095 bytes
>>> (45.00 %), source: end of GC]
>>> 189114.730: [G1Ergonomics (Mixed GCs) start mixed GCs, reason:
>>> candidate old regions available, candidate old regions: 45 regions,
>>> reclaimable: 1026676456 bytes (5.98 %), threshold: 5.00 %]
>>>
>> So here it found ~1G reclaimable in old regions, but it's too late - it's
>> hitting to-space exhaustion right below. This is what leads me to believe
>> that G1's concurrent cleanup is "losing the race" against the Java threads
>> (that are concurrently allocating). Or, it could be that your max heap is
>> too small for your allocation rate and current GC params. This is where
>> it's useful to see how much memory a Full GC reclaims.
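To put a number on "losing the race", the eden fill rate can be read straight off the two pauses above: eden refilled 8832 MB between the end of the previous young pause (189073.828) and the start of this one (189106.959). A small sketch of that arithmetic (numbers taken from the log; the helper name is mine):

```java
public class EdenFillRate {
    // Rough eden allocation rate between two young pauses, using the
    // timestamps and eden sizes printed in the GC log.
    static double mbPerSecond(double edenFilledMb, double prevPauseEndSec, double pauseStartSec) {
        return edenFilledMb / (pauseStartSec - prevPauseEndSec);
    }

    public static void main(String[] args) {
        double rate = mbPerSecond(8832.0, 189073.828, 189106.959);
        System.out.printf("~%.0f MB/s eden allocation rate%n", rate); // roughly 267 MB/s
    }
}
```

At that rate the concurrent cycle has only tens of seconds to finish before eden is full again, which is consistent with marking finishing but cleanup arriving too late.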
>>
>>> (to-space exhausted), 7.7714626 secs]
>>> [Parallel Time: 7182.7 ms, GC Workers: 18]
>>> [GC Worker Start (ms): Min: 189106960.8, Avg: 189106961.0, Max:
>>> 189106961.2, Diff: 0.4]
>>> [Ext Root Scanning (ms): Min: 1.0, Avg: 1.7, Max: 2.5, Diff: 1.5,
>>> Sum: 30.9]
>>> [Update RS (ms): Min: 2.1, Avg: 8.2, Max: 11.5, Diff: 9.4, Sum:
>>> 147.0]
>>> [Processed Buffers: Min: 15, Avg: 62.9, Max: 130, Diff: 115,
>>> Sum: 1132]
>>> [Scan RS (ms): Min: 4.0, Avg: 6.1, Max: 6.4, Diff: 2.4, Sum: 109.6]
>>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.0]
>>> [Object Copy (ms): Min: 7164.5, Avg: 7166.1, Max: 7171.6, Diff:
>>> 7.1, Sum: 128989.0]
>>> [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum:
>>> 3.9]
>>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.4]
>>> [GC Worker Total (ms): Min: 7182.1, Avg: 7182.3, Max: 7182.5,
>>> Diff: 0.4, Sum: 129280.9]
>>> [GC Worker End (ms): Min: 189114143.2, Avg: 189114143.3, Max:
>>> 189114143.3, Diff: 0.1]
>>> [Code Root Fixup: 0.3 ms]
>>> [Code Root Purge: 0.0 ms]
>>> [Clear CT: 0.7 ms]
>>> [Other: 587.7 ms]
>>> [Evacuation Failure: 579.6 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 2.8 ms]
>>> [Ref Enq: 0.5 ms]
>>> [Redirty Cards: 1.4 ms]
>>> [Humongous Reclaim: 0.8 ms]
>>> [Free CSet: 0.2 ms]
>>> [Eden: 8832.0M(8960.0M)->0.0B(800.0M) Survivors: 864.0M->0.0B Heap:
>>> 13.9G(16.0G)->11.6G(16.0G)]
>>> [Times: user=16.27 sys=0.35, real=7.77 secs]
>>> 2017-01-29T14:53:22.542+0000: 189114.731: Total time for which
>>> application threads were stopped: 7.7764915 seconds, Stopping threads took:
>>> 0.0003511 seconds
>>> {code}
>>>
>>>
>>> On Mon, Jan 30, 2017 at 5:14 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>> wrote:
>>>
>>>> I don't believe region sizes >32MB are planned, at least that wasn't
>>>> the case before - maybe that's changed though.
>>>>
>>>> Some tuning options to consider:
>>>> 1) Increase max heap size (this is always a blunt but frequently
>>>> effective instrument)
>>>> 2) Turn InitiatingHeapOccupancyPercent down from its default 45 value.
>>>> 3) Increase ConcGCThreads
>>>>
>>>> What version of the JRE are you running? From the short GC log snippet
>>>> you pasted, it looks like G1 concurrent cycles are losing the race to your
>>>> Java threads (i.e. you exhaust the heap before concurrent phase can clean
>>>> up).
>>>>
>>>> How much does a full GC reclaim? You may want to provide more of your
>>>> GC log - someone is going to ask for that sooner or later :).
>>>>
>>>>
>>>> On Mon, Jan 30, 2017 at 3:24 AM Amit Balode <amit.balode at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello, so the only reason we decided to move to G1 from CMS was the
>>>>> fragmentation issues of CMS. After moving to G1, we have started seeing
>>>>> humongous allocations resulting in Full GCs. We have some large objects
>>>>> which are allocated in successive order, resulting in this issue, but the
>>>>> use case for continuous allocation from the application side seems genuine.
>>>>> The heap is getting full, so it's explainable why a Full GC is happening,
>>>>> although this Full GC issue did not happen with CMS.
>>>>>
>>>>> a) Currently running with the max 32MB region size, humongous allocations
>>>>> as high as 21MB are happening. For such large allocations, 32MB seems
>>>>> small; maybe 64MB would be appropriate. Is that an option
>>>>> available in future G1 releases?
>>>>> b) Given that application behaviour cannot be changed much to
>>>>> stop the continuous large allocations, what are some G1-specific settings
>>>>> to tune to make it more resilient?
>>>>>
>>>>> Below is the snippet from application running with 16GB heap, 40ms
>>>>> pause time and 32MB region size.
>>>>> {code}
>>>>> 2017-01-29T14:53:14.770+0000: 189106.959: [GC pause (G1 Evacuation
>>>>> Pause) (young)
>>>>> Desired survivor size 654311424 bytes, new threshold 15 (max 15)
>>>>> - age 1: 240262896 bytes, 240262896 total
>>>>> - age 2: 3476760 bytes, 243739656 total
>>>>> - age 3: 3293240 bytes, 247032896 total
>>>>> - age 4: 3147072 bytes, 250179968 total
>>>>> - age 5: 420832 bytes, 250600800 total
>>>>> - age 6: 614688 bytes, 251215488 total
>>>>> - age 7: 1139960 bytes, 252355448 total
>>>>> - age 8: 632088 bytes, 252987536 total
>>>>> - age 9: 425488 bytes, 253413024 total
>>>>> - age 10: 1592608 bytes, 255005632 total
>>>>> 189106.960: [G1Ergonomics (CSet Construction) start choosing CSet,
>>>>> _pending_cards: 29363, predicted base time: 20.87 ms, remaining time: 19.13
>>>>> ms, target pause time: 40.00 ms]
>>>>> 189106.960: [G1Ergonomics (CSet Construction) add young regions to
>>>>> CSet, eden: 276 regions, survivors: 27 regions, predicted young region
>>>>> time: 12.11 ms]
>>>>> 189106.960: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>>>> eden: 276 regions, survivors: 27 regions, old: 0 regions, predicted pause
>>>>> time: 32.98 ms, target pause time:
>>>>> 40.00 ms]
>>>>> 189106.961: [G1Ergonomics (Heap Sizing) attempt heap expansion,
>>>>> reason: region allocation request failed, allocation request: 5549208 bytes]
>>>>> 189106.961: [G1Ergonomics (Heap Sizing) expand the heap, requested
>>>>> expansion amount: 5549208 bytes, attempted expansion amount: 33554432 bytes]
>>>>> 189106.961: [G1Ergonomics (Heap Sizing) did not expand the heap,
>>>>> reason: heap already fully expanded]
>>>>> 189114.730: [G1Ergonomics (Concurrent Cycles) do not request
>>>>> concurrent cycle initiation, reason: still doing mixed collections,
>>>>> occupancy: 13119782912 bytes, allocation request
>>>>> : 0 bytes, threshold: 7730941095 bytes (45.00 %), source: end of GC]
>>>>> 189114.730: [G1Ergonomics (Mixed GCs) start mixed GCs, reason:
>>>>> candidate old regions available, candidate old regions: 45 regions,
>>>>> reclaimable: 1026676456 bytes (5.98 %), threshold: 5.00 %]
>>>>> (to-space exhausted), 7.7714626 secs]
>>>>> [Parallel Time: 7182.7 ms, GC Workers: 18]
>>>>> {code}
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Amit
>>>>>
>>>>> _______________________________________________
>>>>> hotspot-gc-use mailing list
>>>>> hotspot-gc-use at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>
>>>> --
>>>> Sent from my phone
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Amit.Balode
>>>
>>>
>>
>
>
> --
> Thanks & Regards,
> Amit.Balode
>
>