Odd G1GC behavior on 8u91

Vitaly Davidovich vitalyd at gmail.com
Wed Aug 24 22:36:13 UTC 2016


Hi Jenny,

Very happy that you and Charlie got wind of this thread -- could use your
expertise :).  I will email you the log directly (it's a bit verbose with
all the safepoint + gc logging) as I believe the mailing list software will
strip it.  To answer/comment on your email ...

I believe fixing the young gen size (and turning off adaptive sizing)
was done intentionally.  The developers reported that letting G1 manage
this ergonomically caused problems, although that may be because the max
pause time goal is too aggressive (300ms for such a large heap).  This is
something we're also looking at revisiting, but trying to get a handle on
the other issues first.
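
If/when we revisit that, my rough thinking (values purely illustrative, not
settled) is to drop the explicit young sizing and relax the pause goal,
along the lines of:

    -XX:+UseG1GC -XX:MaxGCPauseMillis=500
    (removing -XX:NewSize, -XX:MaxNewSize and -XX:-UseAdaptiveSizePolicy so
    G1 can size the young gen against the pause goal itself)

but that's for after we understand the current behavior.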

As for humongous objects, I don't see any trace of them in the log.  We
actually saw some other poor G1 behavior with some older GC settings
whereby the "Finalize Marking" phase was taking hundreds of seconds (same
total heap size, but with a 15GB young), and those gc logs did indicate
very large humongous object allocations.  I can certainly try sharing that log
with you as well, but I think that's likely a different issue (it's
possible it's related to the G1 worker threads marking through large arrays
fully, but I'm not sure).  I don't see them here though, and I'm just
puzzled by the log output thus far.
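
For context on what would even count as humongous here: with a 96GB heap and
no explicit -XX:G1HeapRegionSize, G1 should be using 32MB regions, so any
single allocation of roughly 16MB or more (half a region) bypasses young gen
entirely and lands straight in old gen as a humongous object.  A trivial
(hypothetical) illustration:

    public class HumongousExample {
        public static void main(String[] args) {
            // With 32MB G1 regions, any single object >= 16MB (half a
            // region) is treated as humongous and allocated directly in
            // old gen rather than in Eden.
            long[] big = new long[2 * 1024 * 1024]; // ~16MB of payload
            System.out.println("allocated " + big.length + " longs");
        }
    }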

PrintFLSStatistics=1 is likely a leftover from when this app was using
CMS.  As you may have surmised, I'm trying to assist someone else with
troubleshooting the G1 problems, so I don't have the full history of all
the decisions behind the flags and their values.  I can, however, try to
find that out if you find it pertinent.

Thanks


On Wed, Aug 24, 2016 at 6:11 PM, Jenny Zhang <yu.zhang at oracle.com> wrote:

> Hi, Vitaly,
>
> Charlie and I were just talking about your post.
>
> The parameters you used
>
> -XX:MaxNewSize=32212254720 -XX:NewSize=32212254720 (plus
> -XX:-UseAdaptiveSizePolicy) basically fix the young gen size, which is
> reflected in the gc log entries.
>
> As Eric points out, -XX:MaxTenuringThreshold=2 might promote objects to old
> gen more quickly. But it does not explain the 0 Eden used. Maybe all your
> objects are humongous objects and allocated directly in old gen?
>
> Do you mind posting the entire gc log? If it is too big, can I copy it from
> somewhere?
>
> BTW, -XX:PrintFLSStatistics=1 is for CMS, not G1.
>
> Thanks
> Jenny
>
> On 8/24/2016 2:32 PM, Vitaly Davidovich wrote:
>
> Hi Eric,
>
> Thanks for the reply.  Yeah, we're experimenting with reducing the
> initiating heap occupancy to 55%, hopefully letting the concurrent marking
> complete before the allocation and promotion rate results in a Full GC.  In
> particular, I'm not sure how to interpret the gc log snippets I posted
> showing a young evac with an empty young gen, followed by a Full GC due to
> a heap expansion failure for a 32-byte allocation.  I'm likely
> missing something in my interpretation.
>
> I think you're also on point about the max tenuring threshold being too
> low - need to see how that value was arrived at.
>
> Thanks
>
> On Wed, Aug 24, 2016 at 5:20 PM, Eric Caspole <eric.caspole at oracle.com>
> wrote:
>
>> I have not used G1 in JDK 8 that much but the two trouble spots to me are:
>>
>> -XX:InitiatingHeapOccupancyPercent=75
>> -XX:MaxTenuringThreshold=2
>>
>> So this will tenure objects very quickly, filling up old gen, and start
>> marking relatively late, at 75%.  This looks like it is pretty likely to
>> end up in a STW full GC.
>> Since you do have a huge amount of garbage getting collected in the full
>> gc, maybe try letting more of it die off in young gen with a higher
>> tenuring threshold, and also start marking earlier than 75%.
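>>
>> Just to sketch the direction (exact values would need testing against the
>> workload), something like:
>>
>>   -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxTenuringThreshold=8
>>
>> i.e. kick off concurrent marking much earlier and give objects a few more
>> aging cycles in the survivor spaces before they get promoted.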
>>
>> good luck,
>> Eric
>>
>>
>>
>>
>> On 08/24/2016 02:43 PM, Vitaly Davidovich wrote:
>>
>>> Hi guys,
>>>
>>> Hoping someone could shed some light on G1 behavior (as seen from the gc
>>> log) that I'm having a hard time understanding.  The root problem is G1
>>> enters a Full GC that takes many tens of seconds, and I need some advice
>>> on what could be causing it.
>>>
>>> First, some basic info:
>>> Java HotSpot(TM) 64-Bit Server VM (25.91-b14) for linux-amd64 JRE
>>> (1.8.0_91-b14), built on Apr  1 2016 00:57:21 by "java_re" with gcc
>>> 4.3.0 20080428 (Red Hat 4.3.0-8)
>>> Memory: 4k page, physical 264115728k(108464820k free), swap 0k(0k free)
>>> CommandLine flags: -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4
>>> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<some_path>
>>> -XX:InitialCodeCacheSize=104857600 -XX:InitialHeapSize=103079215104
>>> -XX:InitialTenuringThreshold=2 -XX:InitiatingHeapOccupancyPercent=75
>>> -XX:+ManagementServer -XX:MaxGCPauseMillis=300
>>> -XX:MaxHeapSize=103079215104 -XX:MaxNewSize=32212254720
>>> -XX:MaxTenuringThreshold=2 -XX:NewSize=32212254720
>>> -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy
>>> -XX:+PrintCommandLineFlags -XX:+PrintCompilation
>>> -XX:PrintFLSStatistics=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
>>> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>> -XX:+PrintPromotionFailure -XX:+PrintReferenceGC
>>> -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
>>> -XX:+PrintTenuringDistribution -XX:ReservedCodeCacheSize=104857600
>>> -XX:SurvivorRatio=9 -XX:-UseAdaptiveSizePolicy -XX:+UseG1GC
>>>
>>> Swap is disabled.  THP is disabled.
>>>
>>> First issue I have a question about:
>>>
>>> 2016-08-24T15:29:12.302+0000: 17776.029: [GC pause (G1 Evacuation Pause)
>>> (young)
>>> Desired survivor size 1795162112 bytes, new threshold 2 (max 2)
>>>  17776.029: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 0, predicted base time: 14.07 ms, remaining time: 285.93
>>> ms, target pause time: 300.00 ms]
>>>  17776.029: [G1Ergonomics (CSet Construction) add young regions to CSet,
>>> eden: 0 regions, survivors: 0 regions, predicted young region time: 0.00
>>> ms]
>>>  17776.029: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause
>>> time: 14.07 ms, target pause time: 300.00 ms]
>>> 2016-08-24T15:29:12.305+0000: 17776.033: [SoftReference, 0 refs,
>>> 0.0012417 secs]2016-08-24T15:29:12.307+0000: 17776.034: [WeakReference,
>>> 0 refs, 0.0007101 secs]2016-08-24T15:29:12.307+0000: 17776.035:
>>> [FinalReference, 0 refs, 0.0007027 secs]2016-08-24T15:29:12.308+0000:
>>> 17776.035: [PhantomReference, 0 refs, 0 refs, 0.0013585
>>> secs]2016-08-24T15:29:12.309+0000: 17776.037: [JNI Weak Reference,
>>> 0.0000118 secs], 0.0089758 secs]
>>>    [Parallel Time: 3.1 ms, GC Workers: 23]
>>>       [GC Worker Start (ms): Min: 17776029.2, Avg: 17776029.3, Max:
>>> 17776029.4, Diff: 0.2]
>>>       [Ext Root Scanning (ms): Min: 0.8, Avg: 1.1, Max: 2.8, Diff: 1.9,
>>> Sum: 24.2]
>>>       [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
>>>          [Processed Buffers: Min: 0, Avg: 0.1, Max: 1, Diff: 1, Sum: 2]
>>>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
>>>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.0]
>>>       [Object Copy (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>>> 1.2]
>>>       [Termination (ms): Min: 0.0, Avg: 1.6, Max: 1.8, Diff: 1.8, Sum:
>>> 37.9]
>>>          [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum:
>>> 23]
>>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.2]
>>>       [GC Worker Total (ms): Min: 2.7, Avg: 2.8, Max: 2.9, Diff: 0.2,
>>> Sum: 63.8]
>>>       [GC Worker End (ms): Min: 17776032.0, Avg: 17776032.0, Max:
>>> 17776032.1, Diff: 0.0]
>>>    [Code Root Fixup: 0.2 ms]
>>>    [Code Root Purge: 0.0 ms]
>>>    [Clear CT: 0.4 ms]
>>>    [Other: 5.3 ms]
>>>       [Choose CSet: 0.0 ms]
>>>       [Ref Proc: 4.4 ms]
>>>       [Ref Enq: 0.3 ms]
>>>       [Redirty Cards: 0.3 ms]
>>>       [Humongous Register: 0.1 ms]
>>>       [Humongous Reclaim: 0.1 ms]
>>>       [Free CSet: 0.0 ms]
>>>    *[Eden: 0.0B(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
>>> 95.2G(96.0G)->95.2G(96.0G)]*
>>>  [Times: user=0.08 sys=0.00, real=0.01 secs]
>>> 2016-08-24T15:29:12.311+0000: 17776.038: Total time for which
>>> application threads were stopped: 0.0103002 seconds, Stopping threads
>>> took: 0.0000566 seconds
>>>  17776.039: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
>>> allocation request failed, allocation request: 32 bytes]
>>>  17776.039: [G1Ergonomics (Heap Sizing) expand the heap, requested
>>> expansion amount: 33554432 bytes, attempted expansion amount: 33554432
>>> bytes]
>>>  17776.039: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
>>> heap already fully expanded]
>>> 2016-08-24T15:29:12.312+0000: 17776.039: [Full GC (Allocation Failure)
>>> 2016-08-24T15:29:40.727+0000: 17804.454: [SoftReference, 5504 refs,
>>> 0.0012432 secs]2016-08-24T15:29:40.728+0000: 17804.456: [WeakReference,
>>> 1964 refs, 0.0003012 secs]2016-08-24T15:29:40.728+0000: 17804.456:
>>> [FinalReference, 3270 refs, 0.0033290 secs]2016-08-24T15:29:40.732+0000:
>>> 17804.459: [PhantomReference, 0 refs, 75 refs, 0.0000257
>>> secs]2016-08-24T15:29:40.732+0000: 17804.459: [JNI Weak Reference,
>>> 0.0000172 secs] 95G->38G(96G), 95.5305034 secs]
>>>    [Eden: 0.0B(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap:
>>> 95.2G(96.0G)->38.9G(96.0G)], [Metaspace: 104180K->103365K(106496K)]
>>> * [Times: user=157.02 sys=0.28, real=95.54 secs] *
>>>
>>> So here we have a lengthy full GC pause that collects quite a bit of old
>>> gen (expected).  Right before this is a young evac pause.
>>>
>>> Why is the heap sizing (bolded) reported after the evac pause showing
>>> empty Eden+Survivor?
>>>
>>> Why is ergonomic info reporting 0 regions selected (i.e. what's
>>> evacuated then)?
>>>
>>> Right before the Full GC, ergonomics report a failure to expand the heap
>>> due to an allocation request of 32 bytes.  Is this implying that a
>>> mutator tried to allocate 32 bytes but couldn't? How do I reconcile that
>>> with Eden+Survivor occupancy reported right above that?
>>>
>>> Young gen is sized to 30GB, total heap is 96GB.  The application's
>>> allocation rate is roughly 1GB/s.  Am I correct in assuming that
>>> allocation is outpacing concurrent marking, based on the above? What
>>> tunable(s) would you advise tweaking to get G1 to keep up with the
>>> allocation rate?
>>> I'm ok taking some throughput hit to mitigate 90s+ pauses.
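>>>
>>> Back-of-the-envelope: with IHOP at 75% of the 96GB heap, marking doesn't
>>> start until roughly 72GB is occupied, and at ~1GB/s of allocation that
>>> leaves very little time for the concurrent cycle and the follow-on mixed
>>> collections to free space before the heap fills -- please correct me if
>>> that reasoning is off.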
>>>
>>> Let me know if any additional info is needed (I have the full GC log,
>>> and can attach that if desired).
>>>
>>> Thanks
>>>
>>>


More information about the hotspot-gc-use mailing list