G1OldCSetRegionThresholdPercent under ExperimentalFlag

Thomas Schatzl thomas.schatzl at oracle.com
Thu Jun 22 21:16:19 UTC 2017


Hi Sundara,

On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
> Hi Thomas,
>    Thanks for the explanation.
> 
> 
> I was trying to debug why it is not including some old region even
> though it had ~100ms (though Ergo logs say it has accommodated all
> regions to cover given 500ms).

Ergo is self-training, but it takes some time to adapt to the
situation.

As long running a run that log shows (thanks!), the number of mixed gcs
is relatively small, and they are pretty far apart (in the range of
hours between mixed gc phases). Young gc occurrences distribution is
far from equal (even considering differences in used young gen size),
so it seems that the application is quite bursty from time to time.

The different mixed gc/old gen space reclamation phases are never
particularly long either, so my best guess would be that the values
used for how long particular regions take to evacuate are messed up.

I.e. from some graphs it roughly looks like is that there is roughly a
mixed gc phase at the start of every bursty phase (as far as I could
identify them looking at graphs), and one during the phase, typically
near the end.

So depending on when that mixed gc occurs (at the start of such a burst
or within), g1 trains itself on different application behavior that it
later uses these values on. This is always some kind of moving average,
which does not necessarily reflect reality.

Very good adaptation to this behavior seems beyond what g1 can do at the moment.

One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly).

But that is just something I made up right now by staring at your log
graphs, I may be wrong :)

It is unfortunately impossible to determine the exact values for these
predictions in a product VM (e.g. comparing actual/predicted detail
values the per-region prediction is made of) at this time as there is
no way to get these relevant values out of the VM.

Back to your problem (if there is one, you did not state any ;)): the
log shows a few issues with mixed gc actually: the one you explained
about not taking enough old gen regions because
the G1OldCSetRegionThresholdPercent is too low as you suspected (still
not reaching max pause time; case 1), and the cases where the number of
old gen regions taken is too low so these are filled up with "expensive
old gen regions". However I am seeing both the actual time taken being
too low and too high (case 2 and 3)

Not sure what your goals are here, and what the actual issue is, but

- you can probably fix case 1 with increasing the mentioned
-XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you.

- fix either case 2 or case 3 with decreasing or increasing
-XX:G1MixedGCCountTarget (one direction increases the minimum number of
regions to take, the other decreases it).

All in all an interesting case to look at :)

Thanks a lot,
  Thomas

> Adding some log snippets here and attaching entire logs in case if
> that helps.
> 
> Running app with 31G
> 
> CommandLine flags: -XX:GCLogFileSize=20971520
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-
> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500
> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912
> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled
> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
> -XX:+UseStringDeduplication
> 
> ...
> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation
> Pause) (mixed)
> Desired survivor size 104857600 bytes, new threshold 1 (max 15)
> - age   1:  131296848 bytes,  131296848 total
> - age   2:  237559952 bytes,  368856800 total
> - age   3:  137259376 bytes,  506116176 total
>  9345.322: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 130042, predicted base time: 171.58 ms, remaining
> time: 328.42 ms, target pause time: 500.00 ms]
>  9345.322: [G1Ergonomics (CSet Construction) add young regions to
> CSet, eden: 121 regions, survivors: 77 regions, predicted young
> region time: 249.33 ms]
>  9345.322: [G1Ergonomics (CSet Construction) finish adding old
> regions to CSet, reason: predicted time is too high, predicted time:
> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>  9345.322: [G1Ergonomics (CSet Construction) added expensive regions
> to CSet, reason: old CSet region num not reached min, old: 204
> regions, expensive: 11 regions, min: 204 regions, remaining time:
> 0.00 ms]
>  9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet,
> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted
> pause time: 504.35 ms, target pause time: 500.00 ms]
>  9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
> candidate old regions available, candidate old regions: 1425 regions,
> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %]
> , 0.3691404 secs]
>    [Parallel Time: 301.4 ms, GC Workers: 13]
>       [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max:
> 9345323.6, Diff: 0.6]
>       [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff:
> 0.6, Sum: 15.9]
>       [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9,
> Sum: 809.4]
>          [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56,
> Sum: 674]
>       [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum:
> 157.5]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
> 0.0, Sum: 0.1]
>       [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff:
> 3.1, Sum: 2922.8]
>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
> Sum: 0.4]
>          [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23,
> Sum: 203]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2,
> Sum: 1.1]
>       [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8,
> Diff: 0.5, Sum: 3907.2]
>       [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max:
> 9345624.0, Diff: 0.2]
>    [Code Root Fixup: 0.1 ms]
>    [Code Root Purge: 0.0 ms]
>    [String Dedup Fixup: 43.9 ms, GC Workers: 13]
>       [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3,
> Sum: 28.6]
>       [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2,
> Sum: 535.8]
>    [Clear CT: 3.4 ms]
>    [Other: 20.2 ms]
>       [Choose CSet: 0.3 ms]
>       [Ref Proc: 13.4 ms]
>       [Ref Enq: 1.0 ms]
>       [Redirty Cards: 2.0 ms]
>       [Humongous Register: 0.2 ms]
>       [Humongous Reclaim: 0.1 ms]
>       [Free CSet: 2.1 ms]
>    [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M
> Heap: 15.6G(31.0G)->13.1G(31.0G)]
>  [Times: user=4.53 sys=0.00, real=0.36 secs]
> ....
> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation
> Pause) (mixed)
> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
> - age   1:   31749256 bytes,   31749256 total
>  9387.489: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 127449, predicted base time: 168.88 ms, remaining
> time: 331.12 ms, target pause time: 500.00 ms]
>  9387.489: [G1Ergonomics (CSet Construction) add young regions to
> CSet, eden: 184 regions, survivors: 14 regions, predicted young
> region time: 62.79 ms]
>  9387.490: [G1Ergonomics (CSet Construction) finish adding old
> regions to CSet, reason: old CSet region num reached max, old: 397
> regions, max: 397 regions]
>  9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet,
> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted
> pause time: 390.18 ms, target pause time: 500.00 ms]
>  9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
> candidate old regions available, candidate old regions: 1028 regions,
> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %]
> , 0.1700662 secs]
>    [Parallel Time: 101.4 ms, GC Workers: 13]
>       [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max:
> 9387491.1, Diff: 0.6]
>       [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff:
> 0.9, Sum: 14.3]
>       [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8,
> Sum: 361.9]
>          [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54,
> Sum: 668]
>       [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum:
> 352.2]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff:
> 0.2, Sum: 0.7]
>       [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8,
> Sum: 569.9]
>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.1]
>          [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13,
> Sum: 124]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
> Sum: 2.3]
>       [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff:
> 0.9, Sum: 1301.4]
>       [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max:
> 9387591.1, Diff: 0.4]
>    [Code Root Fixup: 0.3 ms]
>    [Code Root Purge: 0.0 ms]
>    [String Dedup Fixup: 43.5 ms, GC Workers: 13]
>       [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>       [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3,
> Sum: 561.3]
>    [Clear CT: 3.9 ms]
>    [Other: 21.1 ms]
>       [Choose CSet: 0.8 ms]
>       [Ref Proc: 12.8 ms]
>       [Ref Enq: 0.9 ms]
>       [Redirty Cards: 0.9 ms]
>       [Humongous Register: 0.2 ms]
>       [Humongous Reclaim: 0.1 ms]
>       [Free CSet: 4.2 ms]
>    [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M
> Heap: 14.5G(31.0G)->10.1G(31.0G)]
>  [Times: user=1.93 sys=0.00, real=0.17 secs]
> .....
> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation
> Pause) (mixed)
> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
> - age   1:   44204040 bytes,   44204040 total
> - age   2:   31422896 bytes,   75626936 total
>  9429.490: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 64391, predicted base time: 130.82 ms, remaining
> time: 369.18 ms, target pause time: 500.00 ms]
>  9429.490: [G1Ergonomics (CSet Construction) add young regions to
> CSet, eden: 178 regions, survivors: 20 regions, predicted young
> region time: 69.26 ms]
>  9429.491: [G1Ergonomics (CSet Construction) finish adding old
> regions to CSet, reason: predicted time is too high, predicted time:
> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>  9429.491: [G1Ergonomics (CSet Construction) added expensive regions
> to CSet, reason: old CSet region num not reached min, old: 204
> regions, expensive: 72 regions, min: 204 regions, remaining time:
> 0.00 ms]
>  9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet,
> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted
> pause time: 684.25 ms, target pause time: 500.00 ms]
>  9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
> candidate old regions available, candidate old regions: 824 regions,
> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %]
> , 0.1729571 secs]
>    [Parallel Time: 102.6 ms, GC Workers: 13]
>       [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max:
> 9429491.9, Diff: 0.6]
>       [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff:
> 0.9, Sum: 16.9]
>       [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2,
> Sum: 248.9]
>          [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40,
> Sum: 424]
>       [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum:
> 222.8]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
> 0.1, Sum: 0.5]
>       [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2,
> Sum: 831.3]
>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.1]
>          [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4,
> Sum: 34]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
> Sum: 2.2]
>       [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1,
> Diff: 0.7, Sum: 1322.7]
>       [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max:
> 9429593.6, Diff: 0.4]
>    [Code Root Fixup: 0.2 ms]
>    [Code Root Purge: 0.0 ms]
>    [String Dedup Fixup: 45.4 ms, GC Workers: 13]
>       [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5,
> Sum: 1.5]
>       [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4,
> Sum: 573.4]
>    [Clear CT: 4.3 ms]
>    [Other: 20.5 ms]
>       [Choose CSet: 0.5 ms]
>       [Ref Proc: 14.3 ms]
>       [Ref Enq: 1.2 ms]
>       [Redirty Cards: 0.7 ms]
>       [Humongous Register: 0.2 ms]
>       [Humongous Reclaim: 0.1 ms]
>       [Free CSet: 2.4 ms]
>    [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M
> Heap: 11.5G(31.0G)->8796.0M(31.0G)]
>  [Times: user=1.95 sys=0.00, real=0.17 secs]
> 
> 
> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl <thomas.schatzl at oracl
> e.com> wrote:
> > Hi,
> > 
> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote:
> > > Hi,
> > >   Can someone shed more light on
> > why G1OldCSetRegionThresholdPercent
> > > flag is under experimental (Need to add  -
> > > XX:+UnlockExperimentalVMOptions to modify it.)
> > 
> >   in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
> > as a
> > "I really want to do that and I know what I am doing" confirmation
> > from
> > the user that he is aware that using this (in this case) option to
> > influence the set of regions taken in during mixed gc you might get
> > surprising behavior.
> > 
> > Also, I think there has been no official documentation for it -
> > also
> > because it should be very rarely needed.
> > In particular, I am curious about the case when it would be useful
> > to
> > change it. Could you give some log files showing that there is an
> > issue
> > with the upper bound for the number of old gen regions to take
> > during
> > GC? (i.e. the amount of old gen regions taken is too small and
> > there is
> > ample pause time left and it matters to clean up more regions in a
> > single mixed gc?)
> > 
> > Sometimes there are problems with the lower bound that is
> > controlled by
> > the -XX:G1MixedGCCountTarget (product level) option.
> > 
> > Hth,
> >   Thomas
> > 
> > 



More information about the hotspot-gc-dev mailing list