Odd G1GC behavior on 8u91
Vitaly Davidovich
vitalyd at gmail.com
Thu Aug 25 11:15:11 UTC 2016
Hi Jenny,
Thanks very much for your analysis. A few comments inline below.
On Wednesday, August 24, 2016, yu.zhang at oracle.com
<yu.zhang at oracle.com> wrote:
> Vitaly,
>
> Thanks for the gc logs.
>
> Before reaching full gc and the entry at
>
> 2016-08-24T15:29:12.302+0000: 17776.029: [GC pause (G1 Evacuation Pause)
> (young)
>
> ...
>
> [Eden: 0.0B(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
> 95.2G(96.0G)->95.2G(96.0G)]
> There are several young gcs with 'to-space exhausted'. For example
>
> "2016-08-24T15:29:03.936+0000: 17767.663: [GC pause (G1 Evacuation Pause)
> (young)
> Desired survivor size 1795162112 bytes, new threshold 2 (max 2)
> 17767.663: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 185708, predicted base time: 46.71 ms, remaining time:
> 253.29 ms, target pause time: 300.00 ms]
> 17767.663: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 159 regions, survivors: 0 regions, predicted young region time:
> 173.05 ms]
> 17767.663: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 159 regions, survivors: 0 regions, old: 0 regions, predicted pause time:
> 219.76 ms, target pause time: 300.00 ms]
> 17767.664: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
> region allocation request failed, allocation request: 16340088 bytes]
> 17767.664: [G1Ergonomics (Heap Sizing) expand the heap, requested
> expansion amount: 16340088 bytes, attempted expansion amount: 33554432
> bytes]
> 17767.664: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
> heap already fully expanded]
> ...
>
> (to-space exhausted), 8.0293588 secs]
> [Parallel Time: 6920.2 ms, GC Workers: 23]
> ...
> [Other: 1106.9 ms]
> [Evacuation Failure: 1095.6 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 4.4 ms]
> [Ref Enq: 0.4 ms]
> [Redirty Cards: 4.8 ms]
> [Humongous Register: 0.2 ms]
> [Humongous Reclaim: 0.2 ms]
> [Free CSet: 0.3 ms]
> [Eden: 5088.0M(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
> 95.2G(96.0G)->95.2G(96.0G)]
> [Times: user=68.75 sys=1.66, real=8.03 secs]
>
> Please note the high "Evacuation Failure" time. Some of those young gcs
> might reclaim some space; some (like this one) do not reclaim any heap at all.
>
> (to-space exhausted) means it cannot find enough old regions to evacuate
> into when evacuating from the young regions.
>
The above log also says that Eden occupancy went from ~5G to 0. Given the
to-space exhaustion and Survivors being 0, that implies it was all
garbage. If that's the case, what exactly did it fail to evacuate? And if it
did fail to evacuate objects, where did they go, given that Eden + Survivors
are both 0 afterwards? Also, Survivors was apparently 0 to start with, so
it's unclear to me what exactly happened here.
>
>
> There are only 4 marking cycles, 3 of which are at the beginning due to
> metadata space. Mixed gc is not triggered at that time since the heap is not
> full enough.
>
> There is only 1 marking cycle requested due to heap occupancy, but it is
> too late. The heap is very full already. No mixed gc found in the log.
> "request concurrent cycle initiation, reason: occupancy higher than
> threshold, occupancy: 88080384000 bytes, allocation request: 0 bytes,
> threshold: 77309411325 bytes (75.00 %), source: end of GC]"
>
Any ideas why the concurrent cycle is initiated when occupancy is already
well above the 75% threshold? Why wouldn't it trigger at very close to the
threshold? It's "late" by about 10G here.
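For reference, the ~10G estimate is just the difference between the two numbers in that ergonomics line; a quick sketch of the arithmetic, with both values copied verbatim from the log above:

```java
// Overshoot past the IHOP threshold at the moment the concurrent cycle was
// requested. Both constants come directly from the ergonomics log line.
public class IhopGap {
    public static void main(String[] args) {
        long occupancy = 88_080_384_000L;  // occupancy at cycle request (bytes)
        long threshold = 77_309_411_325L;  // 75.00% IHOP threshold (bytes)
        double gapGiB = (occupancy - threshold) / (1024.0 * 1024 * 1024);
        // prints "overshoot past IHOP: 10.03 GiB"
        System.out.printf("overshoot past IHOP: %.2f GiB%n", gapGiB);
    }
}
```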
>
>
> To avoid this, as others suggested, you can try
> 1. set a reasonable pause time goal and let G1 decide the young gen size
>
Is there a rule of thumb for picking reasonable pause time goals based on
heap size/alloc rate/etc?
>
> 2. if you have to fix the young gen size, reduce NewSize and
> increase the max tenuring threshold to let objects die in the young gen.
> 3. Reduce InitiatingHeapOccupancyPercent
> The goal is to trigger more mixed gcs and avoid to-space exhaustion.
>
Right, we're trying with a lower IHOP value (55%).
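Concretely, the flag combination those suggestions translate to looks something like the sketch below; the 55% IHOP and the 300ms pause target come from this thread, but the rest (and app.jar) are illustrative placeholders, not validated settings:

```shell
# Illustrative sketch only -- 55% IHOP and the 300ms pause target are from
# this thread; heap sizes match our setup; app.jar is a placeholder name.
java \
  -XX:+UseG1GC \
  -Xms96g -Xmx96g \
  -XX:MaxGCPauseMillis=300 \
  -XX:InitiatingHeapOccupancyPercent=55 \
  -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy \
  -jar app.jar
```

Note the absence of -Xmn/-XX:NewSize: per suggestion 1, fixing the young gen size would stop G1 from adapting it to the pause-time goal.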
>
>
> The message after 'to-space exhausted' might be confusing. I need to
> discuss this with the dev team. For example, at time stamp
> 2016-08-24T15:28:05.905+0000: 17709.633: [GC pause (G1 Evacuation Pause)
> (young)
> ...
> (to-space exhausted), 2.6149566 secs]
> ...
>
> [Eden: 28.4G(28.4G)->0.0B(28.4G) Survivors: 1664.0M->1664.0M Heap:
> 93.5G(96.0G)->73.9G(96.0G)]
>
> the Eden used after gc might not be accurate. I will do some investigation
> and get back to you.
>
Thanks. Yeah, it's confusing. I'm still not sure I understand the log
snippet I pasted in my initial email of the young evac immediately
preceding the Full GC: it showed 0 regions in the CSet, so nothing was
evacuated, but it also showed Eden occupancy of 0. It's then unclear why
the Full GC triggers immediately afterwards due to a 32 byte alloc request.
Do you think that may be a bogus log line as well?
>
>
> Thanks
> Jenny
>
Thanks again Jenny, this is very helpful.
>
>
> On 08/24/2016 03:50 PM, Vitaly Davidovich wrote:
>
>
>
> On Wed, Aug 24, 2016 at 6:18 PM, Jenny Zhang <yu.zhang at oracle.com> wrote:
>
>> More comments about the questions
>>
>> Thanks
>> Jenny
>>
>> On 8/24/2016 11:43 AM, Vitaly Davidovich wrote:
>>
>>> Right before the Full GC, ergonomics report a failure to expand the heap
>>> due to an allocation request of 32 bytes. Is this implying that a mutator
>>> tried to allocate 32 bytes but couldn't? How do I reconcile that with
>>> Eden+Survivor occupancy reported right above that?
>>>
>> Yes, it means the mutator tried to allocate 32 bytes but could not get it.
>> The heap won't expand as it has already reached max heap.
>>
>> Do you see any humongous object allocations?
>
> As mentioned in my previous email, I don't see any humongous allocations
> recorded. The Humongous Register/Reclaim output does show non-zero timing,
> so not sure if the log is simply missing them or something else is going on.
>
>>
>>
>>> Young gen is sized to 30GB, total heap is 96GB. Allocation rate of the
>>> application is roughly 1GB/s. Am I correct in assuming that allocation is
>>> outpacing concurrent marking, based on the above? What tunable(s) would you
>>> advise to tweak to get G1 to keep up with the allocation rate? I'm ok
>>> taking some throughput hit to mitigate 90s+ pauses.
>>>
>> The entire log might give a better picture, especially whether the marking
>> cycle is triggered and how well the mixed gcs clean up the heap.
>>
> Are there particular gc lines you're looking for? I can grep for them
> quickly and provide that for you.
>
>
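In case it helps, these are the lines I'd be grepping for. gc.log is a stand-in filename, and the two sample lines are condensed from the snippets quoted earlier in this thread just so the command can be tried as-is:

```shell
# gc.log is a stand-in name; the sample lines below are condensed from the
# log snippets quoted earlier in this thread, so the grep is runnable.
cat > gc.log <<'EOF'
request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 88080384000 bytes, threshold: 77309411325 bytes (75.00 %), source: end of GC]
2016-08-24T15:29:03.936+0000: 17767.663: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 8.0293588 secs]
EOF
# Marking requests, evacuation failures, and mixed pauses in one pass:
grep -cE "request concurrent cycle initiation|to-space exhausted|\(mixed\)" gc.log
# prints: 2
```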
--
Sent from my phone