G1OldCSetRegionThresholdPercent under ExperimentalFlag

Bernd Eckenfels ecki at zusammenkunft.net
Thu Jun 22 22:30:18 UTC 2017


Looking at used memory like this is a bit problematic, since the Java Heap tends to hold on memory - and only when the GC runs and tries to free memory it is known how the real memory is used. In case of CMS the collection is triggered regularly in the background. This is why the used memory metric is not that bad. However with G1 (and even worse with throughput collector) you often see a larger usage than actual referenced memory. (This is a bit an oversimplification as it does not address soft references)

What I typically recommend is to not look at the used memory metric at fixed intervals but wait for a Gc event and look at the 'usage after gc'. This also has problems (gives you the usage late) but it will avoid the false positive you have observed.

Gruss
Bernd
--
http://bernd.eckenfels.net
_____________________________
From: Sundara Mohan M <email.sundarms at gmail.com<mailto:email.sundarms at gmail.com>>
Sent: Freitag, Juni 23, 2017 12:23 AM
Subject: Re: G1OldCSetRegionThresholdPercent under ExperimentalFlag
To: Thomas Schatzl <thomas.schatzl at oracle.com<mailto:thomas.schatzl at oracle.com>>
Cc: <hotspot-gc-dev at openjdk.java.net<mailto:hotspot-gc-dev at openjdk.java.net>>


Thanks for the insights on Ergo.

I was trying to migrate from CMS to G1GC, the app has a low memory
handler ( the thread which finds memory utilization from
Runtime.getFreememory and removes some data from in memory if it
exceeds the threshold).

In CMS this handler was not invoked frequently (for ex: When I have
60K objects it will kick in remove ~5K LRU objects and continue
regular operation) when i moved to G1GC this handler started kicking
in frequently(ex: When i have 60K object it will remove 5K LRU objects
and immediately after some time it will kick in and remove another 5K
and goes till 10K objects are left).

So, i was trying to find out why did mixed GC doesn't cleanup quick
enough before my low memory handler kicks in.

Though i see number of young gen collection and time taken to clean
has came down by ~40%.

Another issue (may be this is expected) is after increasing
G1OldCSetRegionThresholdPercent to 20% from 10% i am started seeing
few mixed GC taking 1s (most of the time is spent on UpdateRS,
MaxPause=500ms). Will get back once i have more understanding on what
is happening..

Thanks,
Sundar

On Thu, Jun 22, 2017 at 2:16 PM, Thomas Schatzl
<thomas.schatzl at oracle.com<mailto:thomas.schatzl at oracle.com>> wrote:
> Hi Sundara,
>
> On Thu, 2017-06-22 at 09:49 -0700, Sundara Mohan M wrote:
>> Hi Thomas,
>> Thanks for the explanation.
>>
>>
>> I was trying to debug why it is not including some old region even
>> though it had ~100ms (though Ergo logs say it has accommodated all
>> regions to cover given 500ms).
>
> Ergo is self-training, but it takes some time to adapt to the
> situation.
>
> As long running a run that log shows (thanks!), the number of mixed gcs
> is relatively small, and they are pretty far apart (in the range of
> hours between mixed gc phases). Young gc occurrences distribution is
> far from equal (even considering differences in used young gen size),
> so it seems that the application is quite bursty from time to time.
>
> The different mixed gc/old gen space reclamation phases are never
> particularly long either, so my best guess would be that the values
> used for how long particular regions take to evacuate are messed up.
>
> I.e. from some graphs it roughly looks like is that there is roughly a
> mixed gc phase at the start of every bursty phase (as far as I could
> identify them looking at graphs), and one during the phase, typically
> near the end.
>
> So depending on when that mixed gc occurs (at the start of such a burst
> or within), g1 trains itself on different application behavior that it
> later uses these values on. This is always some kind of moving average,
> which does not necessarily reflect reality.
>
> Very good adaptation to this behavior seems beyond what g1 can do at the moment.
>
> One could in theory force G1 to give much more weight to recent observations to make adaptations quicker (i.e. change some factors in that average calculation); but there is no user option for that, and it may open a separate can of worms (currently it seems to not too eagerly discount older observations compared to more recent ones if I read the code correctly).
>
> But that is just something I made up right now by staring at your log
> graphs, I may be wrong :)
>
> It is unfortunately impossible to determine the exact values for these
> predictions in a product VM (e.g. comparing actual/predicted detail
> values the per-region prediction is made of) at this time as there is
> no way to get these relevant values out of the VM.
>
> Back to your problem (if there is one, you did not state any ;)): the
> log shows a few issues with mixed gc actually: the one you explained
> about not taking enough old gen regions because
> the G1OldCSetRegionThresholdPercent is too low as you suspected (still
> not reaching max pause time; case 1), and the cases where the number of
> old gen regions taken is too low so these are filled up with "expensive
> old gen regions". However I am seeing both the actual time taken being
> too low and too high (case 2 and 3)
>
> Not sure what your goals are here, and what the actual issue is, but
>
> - you can probably fix case 1 with increasing the mentioned
> -XX:G1OldCSetRegionThresholdPercent option if that behavior annoys you.
>
> - fix either case 2 or case 3 with decreasing or increasing
> -XX:G1MixedGCCountTarget (one direction increases the minimum number of
> regions to take, the other decreases it).
>
> All in all an interesting case to look at :)
>
> Thanks a lot,
> Thomas
>
>> Adding some log snippets here and attaching entire logs in case if
>> that helps.
>>
>> Running app with 31G
>>
>> CommandLine flags: -XX:GCLogFileSize=20971520
>> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=out-of-memory-heap-
>> dump -XX:InitialHeapSize=33285996544 -XX:MaxGCPauseMillis=500
>> -XX:MaxHeapSize=33285996544 -XX:MetaspaceSize=536870912
>> -XX:NumberOfGCLogFiles=20 -XX:+ParallelRefProcEnabled
>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -XX:+PrintTenuringDistribution -XX:+UseCompressedClassPointers
>> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
>> -XX:+UseStringDeduplication
>>
>> ...
>> 2017-06-19T22:54:05.488+0000: 9345.322: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 1 (max 15)
>> - age 1: 131296848 bytes, 131296848 total
>> - age 2: 237559952 bytes, 368856800 total
>> - age 3: 137259376 bytes, 506116176 total
>> 9345.322: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 130042, predicted base time: 171.58 ms, remaining
>> time: 328.42 ms, target pause time: 500.00 ms]
>> 9345.322: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 121 regions, survivors: 77 regions, predicted young
>> region time: 249.33 ms]
>> 9345.322: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: predicted time is too high, predicted time:
>> 0.44 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>> 9345.322: [G1Ergonomics (CSet Construction) added expensive regions
>> to CSet, reason: old CSet region num not reached min, old: 204
>> regions, expensive: 11 regions, min: 204 regions, remaining time:
>> 0.00 ms]
>> 9345.322: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 121 regions, survivors: 77 regions, old: 204 regions, predicted
>> pause time: 504.35 ms, target pause time: 500.00 ms]
>> 9345.691: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 1425 regions,
>> reclaimable: 11364516952 bytes (34.14 %), threshold: 5.00 %]
>> , 0.3691404 secs]
>> [Parallel Time: 301.4 ms, GC Workers: 13]
>> [GC Worker Start (ms): Min: 9345323.0, Avg: 9345323.3, Max:
>> 9345323.6, Diff: 0.6]
>> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 1.6, Diff:
>> 0.6, Sum: 15.9]
>> [Update RS (ms): Min: 62.1, Avg: 62.3, Max: 63.0, Diff: 0.9,
>> Sum: 809.4]
>> [Processed Buffers: Min: 35, Avg: 51.8, Max: 91, Diff: 56,
>> Sum: 674]
>> [Scan RS (ms): Min: 11.3, Avg: 12.1, Max: 14.8, Diff: 3.6, Sum:
>> 157.5]
>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
>> 0.0, Sum: 0.1]
>> [Object Copy (ms): Min: 222.2, Avg: 224.8, Max: 225.3, Diff:
>> 3.1, Sum: 2922.8]
>> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.4]
>> [Termination Attempts: Min: 1, Avg: 15.6, Max: 24, Diff: 23,
>> Sum: 203]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2,
>> Sum: 1.1]
>> [GC Worker Total (ms): Min: 300.3, Avg: 300.6, Max: 300.8,
>> Diff: 0.5, Sum: 3907.2]
>> [GC Worker End (ms): Min: 9345623.8, Avg: 9345623.9, Max:
>> 9345624.0, Diff: 0.2]
>> [Code Root Fixup: 0.1 ms]
>> [Code Root Purge: 0.0 ms]
>> [String Dedup Fixup: 43.9 ms, GC Workers: 13]
>> [Queue Fixup (ms): Min: 0.4, Avg: 2.2, Max: 3.7, Diff: 3.3,
>> Sum: 28.6]
>> [Table Fixup (ms): Min: 39.8, Avg: 41.2, Max: 42.9, Diff: 3.2,
>> Sum: 535.8]
>> [Clear CT: 3.4 ms]
>> [Other: 20.2 ms]
>> [Choose CSet: 0.3 ms]
>> [Ref Proc: 13.4 ms]
>> [Ref Enq: 1.0 ms]
>> [Redirty Cards: 2.0 ms]
>> [Humongous Register: 0.2 ms]
>> [Humongous Reclaim: 0.1 ms]
>> [Free CSet: 2.1 ms]
>> [Eden: 968.0M(968.0M)->0.0B(1472.0M) Survivors: 616.0M->112.0M
>> Heap: 15.6G(31.0G)->13.1G(31.0G)]
>> [Times: user=4.53 sys=0.00, real=0.36 secs]
>> ....
>> 2017-06-19T22:54:47.655+0000: 9387.489: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
>> - age 1: 31749256 bytes, 31749256 total
>> 9387.489: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 127449, predicted base time: 168.88 ms, remaining
>> time: 331.12 ms, target pause time: 500.00 ms]
>> 9387.489: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 184 regions, survivors: 14 regions, predicted young
>> region time: 62.79 ms]
>> 9387.490: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: old CSet region num reached max, old: 397
>> regions, max: 397 regions]
>> 9387.490: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 184 regions, survivors: 14 regions, old: 397 regions, predicted
>> pause time: 390.18 ms, target pause time: 500.00 ms]
>> 9387.659: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 1028 regions,
>> reclaimable: 8047410104 bytes (24.18 %), threshold: 5.00 %]
>> , 0.1700662 secs]
>> [Parallel Time: 101.4 ms, GC Workers: 13]
>> [GC Worker Start (ms): Min: 9387490.4, Avg: 9387490.8, Max:
>> 9387491.1, Diff: 0.6]
>> [Ext Root Scanning (ms): Min: 0.7, Avg: 1.1, Max: 1.6, Diff:
>> 0.9, Sum: 14.3]
>> [Update RS (ms): Min: 27.0, Avg: 27.8, Max: 28.9, Diff: 1.8,
>> Sum: 361.9]
>> [Processed Buffers: Min: 34, Avg: 51.4, Max: 88, Diff: 54,
>> Sum: 668]
>> [Scan RS (ms): Min: 25.8, Avg: 27.1, Max: 27.4, Diff: 1.6, Sum:
>> 352.2]
>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff:
>> 0.2, Sum: 0.7]
>> [Object Copy (ms): Min: 42.8, Avg: 43.8, Max: 44.5, Diff: 1.8,
>> Sum: 569.9]
>> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.1]
>> [Termination Attempts: Min: 1, Avg: 9.5, Max: 14, Diff: 13,
>> Sum: 124]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
>> Sum: 2.3]
>> [GC Worker Total (ms): Min: 99.7, Avg: 100.1, Max: 100.6, Diff:
>> 0.9, Sum: 1301.4]
>> [GC Worker End (ms): Min: 9387590.7, Avg: 9387590.9, Max:
>> 9387591.1, Diff: 0.4]
>> [Code Root Fixup: 0.3 ms]
>> [Code Root Purge: 0.0 ms]
>> [String Dedup Fixup: 43.5 ms, GC Workers: 13]
>> [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.0]
>> [Table Fixup (ms): Min: 43.0, Avg: 43.2, Max: 43.4, Diff: 0.3,
>> Sum: 561.3]
>> [Clear CT: 3.9 ms]
>> [Other: 21.1 ms]
>> [Choose CSet: 0.8 ms]
>> [Ref Proc: 12.8 ms]
>> [Ref Enq: 0.9 ms]
>> [Redirty Cards: 0.9 ms]
>> [Humongous Register: 0.2 ms]
>> [Humongous Reclaim: 0.1 ms]
>> [Free CSet: 4.2 ms]
>> [Eden: 1472.0M(1472.0M)->0.0B(1424.0M) Survivors: 112.0M->160.0M
>> Heap: 14.5G(31.0G)->10.1G(31.0G)]
>> [Times: user=1.93 sys=0.00, real=0.17 secs]
>> .....
>> 2017-06-19T22:55:29.656+0000: 9429.490: [GC pause (G1 Evacuation
>> Pause) (mixed)
>> Desired survivor size 104857600 bytes, new threshold 15 (max 15)
>> - age 1: 44204040 bytes, 44204040 total
>> - age 2: 31422896 bytes, 75626936 total
>> 9429.490: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 64391, predicted base time: 130.82 ms, remaining
>> time: 369.18 ms, target pause time: 500.00 ms]
>> 9429.490: [G1Ergonomics (CSet Construction) add young regions to
>> CSet, eden: 178 regions, survivors: 20 regions, predicted young
>> region time: 69.26 ms]
>> 9429.491: [G1Ergonomics (CSet Construction) finish adding old
>> regions to CSet, reason: predicted time is too high, predicted time:
>> 2.12 ms, remaining time: 0.00 ms, old: 204 regions, min: 204 regions]
>> 9429.491: [G1Ergonomics (CSet Construction) added expensive regions
>> to CSet, reason: old CSet region num not reached min, old: 204
>> regions, expensive: 72 regions, min: 204 regions, remaining time:
>> 0.00 ms]
>> 9429.491: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 178 regions, survivors: 20 regions, old: 204 regions, predicted
>> pause time: 684.25 ms, target pause time: 500.00 ms]
>> 9429.663: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
>> candidate old regions available, candidate old regions: 824 regions,
>> reclaimable: 6351099672 bytes (19.08 %), threshold: 5.00 %]
>> , 0.1729571 secs]
>> [Parallel Time: 102.6 ms, GC Workers: 13]
>> [GC Worker Start (ms): Min: 9429491.3, Avg: 9429491.6, Max:
>> 9429491.9, Diff: 0.6]
>> [Ext Root Scanning (ms): Min: 0.9, Avg: 1.3, Max: 1.8, Diff:
>> 0.9, Sum: 16.9]
>> [Update RS (ms): Min: 18.7, Avg: 19.1, Max: 20.9, Diff: 2.2,
>> Sum: 248.9]
>> [Processed Buffers: Min: 18, Avg: 32.6, Max: 58, Diff: 40,
>> Sum: 424]
>> [Scan RS (ms): Min: 15.5, Avg: 17.1, Max: 18.5, Diff: 2.9, Sum:
>> 222.8]
>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
>> 0.1, Sum: 0.5]
>> [Object Copy (ms): Min: 62.3, Avg: 63.9, Max: 64.4, Diff: 2.2,
>> Sum: 831.3]
>> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>> Sum: 0.1]
>> [Termination Attempts: Min: 1, Avg: 2.6, Max: 5, Diff: 4,
>> Sum: 34]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4,
>> Sum: 2.2]
>> [GC Worker Total (ms): Min: 101.4, Avg: 101.7, Max: 102.1,
>> Diff: 0.7, Sum: 1322.7]
>> [GC Worker End (ms): Min: 9429593.3, Avg: 9429593.4, Max:
>> 9429593.6, Diff: 0.4]
>> [Code Root Fixup: 0.2 ms]
>> [Code Root Purge: 0.0 ms]
>> [String Dedup Fixup: 45.4 ms, GC Workers: 13]
>> [Queue Fixup (ms): Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5,
>> Sum: 1.5]
>> [Table Fixup (ms): Min: 43.9, Avg: 44.1, Max: 44.2, Diff: 0.4,
>> Sum: 573.4]
>> [Clear CT: 4.3 ms]
>> [Other: 20.5 ms]
>> [Choose CSet: 0.5 ms]
>> [Ref Proc: 14.3 ms]
>> [Ref Enq: 1.2 ms]
>> [Redirty Cards: 0.7 ms]
>> [Humongous Register: 0.2 ms]
>> [Humongous Reclaim: 0.1 ms]
>> [Free CSet: 2.4 ms]
>> [Eden: 1424.0M(1424.0M)->0.0B(1392.0M) Survivors: 160.0M->192.0M
>> Heap: 11.5G(31.0G)->8796.0M(31.0G)]
>> [Times: user=1.95 sys=0.00, real=0.17 secs]
>>
>>
>> On Thu, Jun 22, 2017 at 3:18 AM, Thomas Schatzl <thomas.schatzl at oracl
>> e.com<http://e.com>> wrote:
>> > Hi,
>> >
>> > On Tue, 2017-06-20 at 23:45 -0700, Sundara Mohan M wrote:
>> > > Hi,
>> > > Can someone shed more light on
>> > why G1OldCSetRegionThresholdPercent
>> > > flag is under experimental (Need to add -
>> > > XX:+UnlockExperimentalVMOptions to modify it.)
>> >
>> > in my view -XX:+UnlockExperimentalVMOptions mostly serves mostly
>> > as a
>> > "I really want to do that and I know what I am doing" confirmation
>> > from
>> > the user that he is aware that using this (in this case) option to
>> > influence the set of regions taken in during mixed gc you might get
>> > surprising behavior.
>> >
>> > Also, I think there has been no official documentation for it -
>> > also
>> > because it should be very rarely needed.
>> > In particular, I am curious about the case when it would be useful
>> > to
>> > change it. Could you give some log files showing that there is an
>> > issue
>> > with the upper bound for the number of old gen regions to take
>> > during
>> > GC? (i.e. the amount of old gen regions taken is too small and
>> > there is
>> > ample pause time left and it matters to clean up more regions in a
>> > single mixed gc?)
>> >
>> > Sometimes there are problems with the lower bound that is
>> > controlled by
>> > the -XX:G1MixedGCCountTarget (product level) option.
>> >
>> > Hth,
>> > Thomas
>> >
>> >


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20170622/0905c5b6/attachment.htm>


More information about the hotspot-gc-dev mailing list