G1OldCSetRegionThresholdPercent under ExperimentalFlag

Sundara Mohan M email.sundarms at gmail.com
Thu Jun 22 22:11:48 UTC 2017


Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC, the app has a low memory
handler ( the thread which finds memory utilization from
Runtime.getFreememory and removes some data from in memory if it
exceeds the threshold).

In CMS this handler was not invoked frequently (for ex: When I have
60K objects it will kick in remove ~5K LRU objects and continue
regular operation) when i moved to G1GC this handler started kicking
in frequently(ex: When i have 60K object it will remove 5K LRU objects
and immediately after some time it will kick in and remove another 5K
and goes till 10K objects are left).

So, i was trying to find out why did mixed GC doesn't cleanup quick
enough before my low memory handler kicks in.

Though i see number of young gen collection and time taken to clean
has came down by ~40%.

Another issue (may be this is expected) is after increasing
G1OldCSetRegionThresholdPercent to 20% from 10% i am started seeing
few mixed GC taking 1s (most of the time is spent on UpdateRS,
MaxPause=500ms). Will get back once i have more understanding on what
is happening..

Thanks,
Sundar

On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl
<thomas.schatzl at oracle.com> wrote:
> Hi Sundara,
>
> On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
>> Hi Thomas,
>>    Thanks for the explanation.
>>
>>
>> I was trying to debug why it is not including some old region even
>> though it had ~100ms (though Ergo logs say it has accommodated all
>> regions to cover given 500ms).
>
> Ergo is self-training, but it takes some time to adapt to the
> situation.
>
> As long running a run that log shows (thanks!), the number of mixed gcs
> is relatively small, and they are pretty far apart (in the range of
> hours between mixed gc phases). Young gc occurrences distribution is
> far from equal (even considering differences in used young gen size),
> so it seems that the application is quite bursty from time to time.
>
> The different mixed gc/old gen space reclamation phases are never
> particularly long either, so my best guess would be that the values
> used for how long particular regions take to evacuate are messed up.
>
> I.e. from some graphs it roughly looks like is that there is roughly a
> mixed gc phase at the start of every bursty phase (as far as I could
> identify them looking at graphs), and one during the phase, typically
> near the end.
>
> So depending on when that mixed gc occurs (at the start of such a burst
> or within), g1 trains itself on different application behavior that it
> later uses these values on. This is always some kind of moving average,
> which does not necessarily reflect reality.
>
> Very good adaptation to this behavior seems beyond what g1 can do at the moment.
>
> One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly).
>
> But that is just something I made up right now by staring at your log
> graphs, I may be wrong :)
>
> It is unfortunately impossible to determine the exact values for these
> predictions in a product VM (e.g. comparing actual/predicted detail
> values the per-region prediction is made of) at this time as there is
> no way to get these relevant values out of the VM.
>
> Back to your problem (if there is one, you did not state any ;)): the
> log shows a few issues with mixed gc actually: the one you explained
> about not taking enough old gen regions because
> the G1OldCSetRegionThresholdPercent is too low as you suspected (still
> not reaching max pause time; case 1), and the cases where the number of
> old gen regions taken is too low so these are filled up with "expensive
> old gen regions". However I am seeing both the actual time taken being
> too low and too high (case 2 and 3)
>
> Not sure what your goals are here, and what the actual issue is, but
>
> - you can probably fix case 1 with increasing the mentioned
> -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you.
>
> - fix either case 2 or case 3 with decreasing or increasing
> -XX:G1MixedGCCountTarget (one direction increases the minimum number of
> regions to take, the other decreases it).
>
> All in all an interesting case to look at :)
>
> Thanks a lot,
>   Thomas
>
>> Adding some log snippets here and attaching entire logs in case if
>> that helps.
>>
>> Running app with 31G
>>
>> CommandLine flags: -XX:GCLogFileSize=20971520
>> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-
>> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500
>> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912
>> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled
>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
>> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
>> -XX:+UseStringDeduplication
>>
>> ...
>> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 1 (max 15)
>> - age   1:  131296848 bytes,  131296848 total
>> - age   2:  237559952 bytes,  368856800 total
>> - age   3:  137259376 bytes,  506116176 total
>>  9345.322: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 130042, predicted base time: 171.58 ms, remaining
>> time: 328.42 ms, target pause time: 500.00 ms]
>>  9345.322: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 121 regions, survivors: 77 regions, predicted young
>> region time: 249.33 ms]
>>  9345.322: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: predicted time is too high, predicted time:
>> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>>  9345.322: [G1Ergonomics (CSet Construction) added expensive regions
>> to CSet, reason: old CSet region num not reached min, old: 204
>> regions, expensive: 11 regions, min: 204 regions, remaining time:
>> 0.00 ms]
>>  9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted
>> pause time: 504.35 ms, target pause time: 500.00 ms]
>>  9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 1425 regions,
>> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %]
>> , 0.3691404 secs]
>>    [Parallel Time: 301.4 ms, GC Workers: 13]
>>       [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max:
>> 9345323.6, Diff: 0.6]
>>       [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff:
>> 0.6, Sum: 15.9]
>>       [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9,
>> Sum: 809.4]
>>          [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56,
>> Sum: 674]
>>       [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum:
>> 157.5]
>>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
>> 0.0, Sum: 0.1]
>>       [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff:
>> 3.1, Sum: 2922.8]
>>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.4]
>>          [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23,
>> Sum: 203]
>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2,
>> Sum: 1.1]
>>       [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8,
>> Diff: 0.5, Sum: 3907.2]
>>       [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max:
>> 9345624.0, Diff: 0.2]
>>    [Code Root Fixup: 0.1 ms]
>>    [Code Root Purge: 0.0 ms]
>>    [String Dedup Fixup: 43.9 ms, GC Workers: 13]
>>       [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3,
>> Sum: 28.6]
>>       [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2,
>> Sum: 535.8]
>>    [Clear CT: 3.4 ms]
>>    [Other: 20.2 ms]
>>       [Choose CSet: 0.3 ms]
>>       [Ref Proc: 13.4 ms]
>>       [Ref Enq: 1.0 ms]
>>       [Redirty Cards: 2.0 ms]
>>       [Humongous Register: 0.2 ms]
>>       [Humongous Reclaim: 0.1 ms]
>>       [Free CSet: 2.1 ms]
>>    [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M
>> Heap: 15.6G(31.0G)->13.1G(31.0G)]
>>  [Times: user=4.53 sys=0.00, real=0.36 secs]
>> ....
>> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
>> - age   1:   31749256 bytes,   31749256 total
>>  9387.489: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 127449, predicted base time: 168.88 ms, remaining
>> time: 331.12 ms, target pause time: 500.00 ms]
>>  9387.489: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 184 regions, survivors: 14 regions, predicted young
>> region time: 62.79 ms]
>>  9387.490: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: old CSet region num reached max, old: 397
>> regions, max: 397 regions]
>>  9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted
>> pause time: 390.18 ms, target pause time: 500.00 ms]
>>  9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 1028 regions,
>> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %]
>> , 0.1700662 secs]
>>    [Parallel Time: 101.4 ms, GC Workers: 13]
>>       [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max:
>> 9387491.1, Diff: 0.6]
>>       [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff:
>> 0.9, Sum: 14.3]
>>       [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8,
>> Sum: 361.9]
>>          [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54,
>> Sum: 668]
>>       [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum:
>> 352.2]
>>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff:
>> 0.2, Sum: 0.7]
>>       [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8,
>> Sum: 569.9]
>>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.1]
>>          [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13,
>> Sum: 124]
>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
>> Sum: 2.3]
>>       [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff:
>> 0.9, Sum: 1301.4]
>>       [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max:
>> 9387591.1, Diff: 0.4]
>>    [Code Root Fixup: 0.3 ms]
>>    [Code Root Purge: 0.0 ms]
>>    [String Dedup Fixup: 43.5 ms, GC Workers: 13]
>>       [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.0]
>>       [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3,
>> Sum: 561.3]
>>    [Clear CT: 3.9 ms]
>>    [Other: 21.1 ms]
>>       [Choose CSet: 0.8 ms]
>>       [Ref Proc: 12.8 ms]
>>       [Ref Enq: 0.9 ms]
>>       [Redirty Cards: 0.9 ms]
>>       [Humongous Register: 0.2 ms]
>>       [Humongous Reclaim: 0.1 ms]
>>       [Free CSet: 4.2 ms]
>>    [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M
>> Heap: 14.5G(31.0G)->10.1G(31.0G)]
>>  [Times: user=1.93 sys=0.00, real=0.17 secs]
>> .....
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
>> - age   1:   44204040 bytes,   44204040 total
>> - age   2:   31422896 bytes,   75626936 total
>>  9429.490: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 64391, predicted base time: 130.82 ms, remaining
>> time: 369.18 ms, target pause time: 500.00 ms]
>>  9429.490: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 178 regions, survivors: 20 regions, predicted young
>> region time: 69.26 ms]
>>  9429.491: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: predicted time is too high, predicted time:
>> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>>  9429.491: [G1Ergonomics (CSet Construction) added expensive regions
>> to CSet, reason: old CSet region num not reached min, old: 204
>> regions, expensive: 72 regions, min: 204 regions, remaining time:
>> 0.00 ms]
>>  9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted
>> pause time: 684.25 ms, target pause time: 500.00 ms]
>>  9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 824 regions,
>> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %]
>> , 0.1729571 secs]
>>    [Parallel Time: 102.6 ms, GC Workers: 13]
>>       [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max:
>> 9429491.9, Diff: 0.6]
>>       [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff:
>> 0.9, Sum: 16.9]
>>       [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2,
>> Sum: 248.9]
>>          [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40,
>> Sum: 424]
>>       [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum:
>> 222.8]
>>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
>> 0.1, Sum: 0.5]
>>       [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2,
>> Sum: 831.3]
>>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.1]
>>          [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4,
>> Sum: 34]
>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
>> Sum: 2.2]
>>       [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1,
>> Diff: 0.7, Sum: 1322.7]
>>       [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max:
>> 9429593.6, Diff: 0.4]
>>    [Code Root Fixup: 0.2 ms]
>>    [Code Root Purge: 0.0 ms]
>>    [String Dedup Fixup: 45.4 ms, GC Workers: 13]
>>       [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5,
>> Sum: 1.5]
>>       [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4,
>> Sum: 573.4]
>>    [Clear CT: 4.3 ms]
>>    [Other: 20.5 ms]
>>       [Choose CSet: 0.5 ms]
>>       [Ref Proc: 14.3 ms]
>>       [Ref Enq: 1.2 ms]
>>       [Redirty Cards: 0.7 ms]
>>       [Humongous Register: 0.2 ms]
>>       [Humongous Reclaim: 0.1 ms]
>>       [Free CSet: 2.4 ms]
>>    [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M
>> Heap: 11.5G(31.0G)->8796.0M(31.0G)]
>>  [Times: user=1.95 sys=0.00, real=0.17 secs]
>>
>>
>> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl <thomas.schatzl at oracl
>> e.com> wrote:
>> > Hi,
>> >
>> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote:
>> > > Hi,
>> > >   Can someone shed more light on
>> > why G1OldCSetRegionThresholdPercent
>> > > flag is under experimental (Need to add  -
>> > > XX:+UnlockExperimentalVMOptions to modify it.)
>> >
>> >   in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
>> > as a
>> > "I really want to do that and I know what I am doing" confirmation
>> > from
>> > the user that he is aware that using this (in this case) option to
>> > influence the set of regions taken in during mixed gc you might get
>> > surprising behavior.
>> >
>> > Also, I think there has been no official documentation for it -
>> > also
>> > because it should be very rarely needed.
>> > In particular, I am curious about the case when it would be useful
>> > to
>> > change it. Could you give some log files showing that there is an
>> > issue
>> > with the upper bound for the number of old gen regions to take
>> > during
>> > GC? (i.e. the amount of old gen regions taken is too small and
>> > there is
>> > ample pause time left and it matters to clean up more regions in a
>> > single mixed gc?)
>> >
>> > Sometimes there are problems with the lower bound that is
>> > controlled by
>> > the -XX:G1MixedGCCountTarget (product level) option.
>> >
>> > Hth,
>> >   Thomas
>> >
>> >



More information about the hotspot-gc-dev mailing list