Odd G1GC behavior on 8u91
Vitaly Davidovich
vitalyd at gmail.com
Thu Aug 25 13:43:08 UTC 2016
So, tried to run with IHOP=55 today, and things don't look better: lengthy
Object Copy times and lengthy Finalize Marking phases. E.g.:
2016-08-25T11:23:43.191+0000: 3509.335: [GC pause (GCLocker Initiated GC)
(young)
Desired survivor size 1795162112 bytes, new threshold 2 (max 2)
- age 1: 1793677792 bytes, 1793677792 total
- age 2: 641809160 bytes, 2435486952 total
3509.335: [G1Ergonomics (CSet Construction) start choosing CSet,
_pending_cards: 152524, predicted base time: 45.68 ms, remaining time:
254.32 ms, target pause time: 300.00 ms]
3509.335: [G1Ergonomics (CSet Construction) add young regions to CSet,
eden: 872 regions, survivors: 89 regions, predicted young region time:
20434.47 ms]
3509.335: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
872 regions, survivors: 89 regions, old: 0 regions, predicted pause time:
20480.15 ms, target pause time: 300.00 ms]
2016-08-25T11:24:09.459+0000: 3535.603: [SoftReference, 0 refs, 0.0011022
secs]2016-08-25T11:24:09.460+0000: 3535.604: [WeakReference, 1 refs,
0.0006259 secs]2016-08-25T11:24:09.461+0000: 3535.605: [FinalReference,
2131 refs, 0.0008182 secs]2016-08-25T11:24:09.462+0000: 3535.606:
[PhantomReference, 0 refs, 0 refs, 0.0013352
secs]2016-08-25T11:24:09.463+0000: 3535.607: [JNI Weak Reference, 0.0000145
secs] 3535.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
recent GC overhead higher than threshold after GC, recent GC overhead:
47.64 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion
amount: 0 bytes (20.00 %)]
, 26.2863292 secs]
[Parallel Time: 26267.1 ms, GC Workers: 23]
[GC Worker Start (ms): Min: 3509335.4, Avg: 3509335.5, Max:
3509335.7, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 2.6, Diff: 1.7,
Sum: 26.5]
[Update RS (ms): Min: 22.7, Avg: 23.7, Max: 24.4, Diff: 1.7, Sum:
546.2]
[Processed Buffers: Min: 22, Avg: 34.3, Max: 54, Diff: 32, Sum:
788]
[Scan RS (ms): Min: 58.0, Avg: 59.5, Max: 60.3, Diff: 2.3, Sum:
1368.7]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.2]
[Object Copy (ms): Min: 22450.6, Avg: 22615.0, Max: 22768.2, Diff:
317.5, Sum: 520143.9]
[Termination (ms): Min: 3413.9, Avg: 3567.2, Max: 3732.4, Diff:
318.5, Sum: 82045.7]
[Termination Attempts: Min: 601032, Avg: 630093.2, Max: 639709,
Diff: 38677, Sum: 14492144]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum:
3.3]
[GC Worker Total (ms): Min: 26266.6, Avg: 26266.8, Max: 26266.9,
Diff: 0.3, Sum: 604136.0]
[GC Worker End (ms): Min: 3535602.2, Avg: 3535602.3, Max: 3535602.4,
Diff: 0.2]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 5.3 ms]
[Other: 13.8 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.3 ms]
[Ref Enq: 0.2 ms]
[Redirty Cards: 4.5 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 3.4 ms]
[Eden: 27.2G(27.2G)->0.0B(27.2G) Survivors: 2848.0M->2880.0M Heap:
83.0G(96.0G)->56.5G(96.0G)]
[Times: user=603.54 sys=0.02, real=26.29 secs]
2016-08-25T11:24:09.477+0000: 3535.621: Total time for which application
threads were stopped: 26.2879077 seconds, Stopping threads took: 0.0002478
seconds
2016-08-25T11:24:10.100+0000: 3536.244: [GC concurrent-mark-end, 43.5578776
secs]
2016-08-25T11:24:10.103+0000: 3536.247: [GC remark
2016-08-25T11:24:10.103+0000: 3536.247: [Finalize Marking, 85.4564628 secs]
2016-08-25T11:25:35.559+0000: 3621.703: [GC
ref-proc2016-08-25T11:25:35.559+0000: 3621.703: [SoftReference, 0 refs,
0.0011470 secs]2016-08-25T11:25:35.561+0000: 3621.705: [WeakReference, 445
refs, 0.0008184 secs]2016-08-25T11:25:35.561+0000: 3621.705:
[FinalReference, 206 refs, 0.0008055 secs]2016-08-25T11:25:35.562+0000:
3621.706: [PhantomReference, 0 refs, 48 refs, 0.0015741
secs]2016-08-25T11:25:35.564+0000: 3621.708: [JNI Weak Reference, 0.0000350
secs], 0.0047317 secs] 2016-08-25T11:25:35.564+0000: 3621.708: [Unloading,
0.0163349 secs], 85.5620546 secs]
[Times: user=1373.18 sys=588.01, real=85.57 secs]
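As a sanity check on the numbers in the first pause above: the CSet region counts are consistent with 32 MB regions, which is what G1 should pick for a 96 GB heap (region size is roughly heap/2048, rounded to a power of two and capped at 32 MB). The region size is an assumption here; the rest is just arithmetic:

```shell
# Assumed: 32 MB G1 regions (heap/2048 for 96 GB, capped at the 32 MB max).
region_mb=32
eden_mb=$((872 * region_mb))      # 872 eden regions from the CSet lines
surv_mb=$((89 * region_mb))       # 89 survivor regions
echo "eden:      ${eden_mb} MB"   # 27904 MB, i.e. the logged ~27.2G
echo "survivors: ${surv_mb} MB"   # 2848 MB, matching "Survivors: 2848.0M"
```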
Jenny, I'll send you the full log offline.
Thanks
On Thu, Aug 25, 2016 at 7:15 AM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:
> Hi Jenny,
>
> Thanks very much for your analysis. A few comments inline below.
>
>
> On Wednesday, August 24, 2016, yu.zhang at oracle.com <yu.zhang at oracle.com>
> wrote:
>
>> Vitaly,
>>
>> Thanks for the gc logs.
>>
>> Before reaching full gc and the entry at
>>
>> 2016-08-24T15:29:12.302+0000: 17776.029: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>> ...
>>
>> [Eden: 0.0B(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
>> 95.2G(96.0G)->95.2G(96.0G)]
>> There are several young gcs with 'to-space exhausted'. For example
>>
>> "2016-08-24T15:29:03.936+0000: 17767.663: [GC pause (G1 Evacuation Pause)
>> (young)
>> Desired survivor size 1795162112 bytes, new threshold 2 (max 2)
>> 17767.663: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 185708, predicted base time: 46.71 ms, remaining time:
>> 253.29 ms, target pause time: 300.00 ms]
>> 17767.663: [G1Ergonomics (CSet Construction) add young regions to CSet,
>> eden: 159 regions, survivors: 0 regions, predicted young region time:
>> 173.05 ms]
>> 17767.663: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
>> 159 regions, survivors: 0 regions, old: 0 regions, predicted pause time:
>> 219.76 ms, target pause time: 300.00 ms]
>> 17767.664: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
>> region allocation request failed, allocation request: 16340088 bytes]
>> 17767.664: [G1Ergonomics (Heap Sizing) expand the heap, requested
>> expansion amount: 16340088 bytes, attempted expansion amount: 33554432
>> bytes]
>> 17767.664: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
>> heap already fully expanded]
>> ...
>>
>> (to-space exhausted), 8.0293588 secs]
>> [Parallel Time: 6920.2 ms, GC Workers: 23]
>> ...
>> [Other: 1106.9 ms]
>> [Evacuation Failure: 1095.6 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 4.4 ms]
>> [Ref Enq: 0.4 ms]
>> [Redirty Cards: 4.8 ms]
>> [Humongous Register: 0.2 ms]
>> [Humongous Reclaim: 0.2 ms]
>> [Free CSet: 0.3 ms]
>> [Eden: 5088.0M(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
>> 95.2G(96.0G)->95.2G(96.0G)]
>> [Times: user=68.75 sys=1.66, real=8.03 secs]
>>
>> Please note the high "Evacuation Failure" time. Some of those young gcs
>> might reclaim some space; some (like this one) do not reclaim any heap.
>>
>> (to-space exhausted) means it cannot find enough old regions when
>> evacuating from the young regions.
>>
> The above log also says that Eden occupancy went from ~5G to 0. Given the
> to-space exhaustion and Survivors being 0, that implies it was all
> garbage. If that's the case, what exactly did it fail to evacuate? And if
> it failed to evacuate them, where did they go, given Eden + Survivors are
> 0? Also, Survivors was apparently 0 to start with, so it's unclear to me
> what exactly happened here.
>
>>
>>
>> There are only 4 marking cycles, 3 of which are at the beginning due to
>> metadata space. Mixed gc is not triggered at that time since the heap is
>> not full enough.
>>
>> There is only 1 marking cycle requested due to heap occupancy, but it is
>> too late. The heap is very full already. No mixed gc found in the log.
>> "request concurrent cycle initiation, reason: occupancy higher than
>> threshold, occupancy: 88080384000 bytes, allocation request: 0 bytes,
>> threshold: 77309411325 bytes (75.00 %), source: end of GC]"
>>
> Any ideas why the concurrent cycle is initiated when occupancy is already
> well above the 75% threshold? Why wouldn't it trigger at very close to the
> threshold? It's "late" by about 10G here.
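Working through that ergonomics line (byte values are from the log; the arithmetic is only a consistency check of the "late by about 10G" estimate):

```shell
heap=$((96 * 1024 * 1024 * 1024))   # -Xmx96g
threshold=$((heap * 75 / 100))      # 77309411328, matching the logged threshold
occupancy=88080384000               # logged occupancy when the cycle was requested
echo "over threshold by $(( (occupancy - threshold) / 1024 / 1024 / 1024 )) GB"
# prints: over threshold by 10 GB
```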
>
>>
>>
>> To avoid this, as others suggested, you can try
>> 1. set a reasonable pause time goal and let g1 decide the young gen size
>>
> Is there a rule of thumb for picking reasonable pause time goals based on
> heap size/alloc rate/etc?
>
>>
>> 2. if you have to keep the young gen size fixed, reduce NewSize and
>> increase the max tenuring threshold to let objects die in the young gen.
>> 3. Reduce InitiatingHeapOccupancyPercent
>> The goal is to trigger more mixed gcs and avoid to-space exhaustion.
>>
> Right, we're trying with a lower IHOP value (55%).
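For anyone following along, the three suggestions map to flags roughly as below. The specific values (300 ms, 8, 55) are illustrative assumptions on my part, not tuned recommendations for this workload:

```shell
# Hedged sketch of the suggested direction, not a prescription.
GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=300 -XX:MaxTenuringThreshold=8 -XX:InitiatingHeapOccupancyPercent=55"
# Per suggestion 1, omit -Xmn/-XX:NewSize so the pause goal can size young gen.
echo "$GC_OPTS"
```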
>
>>
>>
>> The message after 'to-space exhausted' might be confusing. I need to
>> discuss this with the dev team. For example, at time stamp
>> 2016-08-24T15:28:05.905+0000: 17709.633: [GC pause (G1 Evacuation Pause)
>> (young)
>> ...
>> (to-space exhausted), 2.6149566 secs]
>> ...
>>
>> [Eden: 28.4G(28.4G)->0.0B(28.4G) Survivors: 1664.0M->1664.0M Heap:
>> 93.5G(96.0G)->73.9G(96.0G)]
>>
>> the eden used after gc might not be accurate. I will do some investigation
>> and get back to you.
>>
> Thanks. Yeah, it's confusing. I'm still not sure I understand the log
> snippet I pasted in my initial email of the young evac immediately
> preceding the Full GC - it showed 0 regions in the CSet, so nothing was
> evacuated, but it also showed Eden occupancy of 0. It's then unclear why
> the Full GC triggers immediately after due to a 32-byte alloc request.
>
> Do you think that may be a bogus log as well?
>
>>
>>
>> Thanks
>> Jenny
>>
> Thanks again Jenny, this is very helpful.
>
>>
>>
>> On 08/24/2016 03:50 PM, Vitaly Davidovich wrote:
>>
>>
>>
>> On Wed, Aug 24, 2016 at 6:18 PM, Jenny Zhang <yu.zhang at oracle.com> wrote:
>>
>>> More comments about the questions
>>>
>>> Thanks
>>> Jenny
>>>
>>> On 8/24/2016 11:43 AM, Vitaly Davidovich wrote:
>>>
>>>> Right before the Full GC, ergonomics reports a failure to expand the
>>>> heap due to an allocation request of 32 bytes. Does this imply that a
>>>> mutator tried to allocate 32 bytes but couldn't? How do I reconcile that
>>>> with the Eden+Survivor occupancy reported right above it?
>>>>
>>> Yes, it means the mutator tried to allocate 32 bytes but could not get
>>> them. The heap won't expand as it has already reached the max heap size.
>>>
>>> Do you see any humongous object allocations?
>>
>> As mentioned in my previous email, I don't see any humongous allocations
>> recorded. The Humongous Register/Reclaim output does show non-zero timing,
>> so I'm not sure if the log is simply missing them or something else is going on.
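One more data point on the humongous question: assuming 32 MB regions for this 96 GB heap, the humongous threshold is half a region, and the 16340088-byte allocation request from the earlier ergonomics lines sits just under it, so it would be treated as a regular (if very large) allocation:

```shell
region_bytes=$((32 * 1024 * 1024))   # assumed 32 MB regions for a 96 GB heap
threshold=$((region_bytes / 2))      # objects larger than half a region are humongous
echo "humongous threshold: ${threshold} bytes"   # 16777216
echo "logged request:      16340088 bytes"       # below the threshold, so not humongous
```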
>>
>>>
>>>
>>>> Young gen is sized to 30GB, total heap is 96GB. Allocation rate of the
>>>> application is roughly 1GB/s. Am I correct in assuming that allocation is
>>>> outpacing concurrent marking, based on the above? What tunable(s) would you
>>>> advise to tweak to get G1 to keep up with the allocation rate? I'm ok
>>>> taking some throughput hit to mitigate 90s+ pauses.
>>>>
>>> The entire log might give a better picture, especially whether the marking
>>> cycle is triggered and how well the mixed gcs clean up the heap.
>>>
>> Are there particular gc lines you're looking for? I can grep for them
>> quickly and provide that for you.
>>
>>
>
> --
> Sent from my phone
>
More information about the hotspot-gc-use
mailing list