Need help on G1 GC young gen Update RS and Scan RS pause reduction
Alex Bagehot
ceeaspb at gmail.com
Wed Jan 25 18:24:30 UTC 2017
Hi Amit,
On Fri, Jan 20, 2017 at 10:51 AM, Amit Mishra <amit.mishra at redknee.com> wrote:
> Thank you very much, Thomas. After making the said changes, Update RS time is
> under control, but Scan RS is still high.
>
> I also observed that sys time is very high, but my system has sufficient
> memory and there has not been a single swapping/paging event on the node. I am
> not sure how to troubleshoot this high sys time.
High system/kernel time on Solaris -> DTrace.
It won't give you the answer directly, but it should at least provide
thread/task stacks showing which user-land and kernel functions are
involved while the high sys time is being reported, which may help.
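As a starting point, something along these lines (untested sketch; run as
root during a period of high sys time, and substitute your JVM's pid):

  # which system calls the JVM is making, by count
  dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -p <jvm-pid>

  # sample kernel stacks at ~997 Hz whenever a CPU is executing in the kernel
  dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }'

Ctrl-C after 30 seconds or so and the aggregated counts are printed.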
>
> 244.719: [GC pause (young), 1.3896037 secs]
> [Parallel Time: 1356.6 ms, GC Workers: 53]
> [GC Worker Start (ms): Min: 244718.9, Avg: 244719.3, Max: 244719.8,
> Diff: 0.9]
> [Ext Root Scanning (ms): Min: 2.2, Avg: 3.0, Max: 3.8, Diff: 1.6, Sum:
> 156.7]
> [Update RS (ms): Min: 13.3, Avg: 16.3, Max: 47.0, Diff: 33.6, Sum:
> 866.5]
> [Processed Buffers: Min: 3, Avg: 6.8, Max: 17, Diff: 14, Sum: 361]
> [Scan RS (ms): Min: 972.9, Avg: 1002.8, Max: 1006.0, Diff: 33.1, Sum:
> 53149.3]
> [Object Copy (ms): Min: 332.1, Avg: 333.1, Max: 334.5, Diff: 2.4, Sum:
> 17656.0]
> [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 1.0]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum:
> 11.5]
> [GC Worker Total (ms): Min: 1354.9, Avg: 1355.5, Max: 1356.1, Diff:
> 1.2, Sum: 71841.1]
> [GC Worker End (ms): Min: 246074.7, Avg: 246074.8, Max: 246075.1,
> Diff: 0.4]
> [Code Root Fixup: 0.0 ms]
> [Clear CT: 2.1 ms]
> [Other: 30.8 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 10.3 ms]
> [Ref Enq: 1.6 ms]
> [Free CSet: 1.6 ms]
> [Eden: 2848.0M(1856.0M)->0.0B(224.0M) Survivors: 192.0M->256.0M Heap:
> 10.3G(48.0G)->9324.6M(48.0G)]
> [Times: user=16.93 sys=1.07, real=1.39 secs]
>
>
> Thanks,
> Amit Mishra
>
> -----Original Message-----
> From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com]
> Sent: Friday, January 20, 2017 00:33
> To: Amit Mishra <amit.mishra at redknee.com>; hotspot-gc-use at openjdk.java.net
> Subject: Re: Need help on G1 GC young gen Update RS and Scan RS pause
> reduction
>
> Hi Amit,
>
> On Thu, 2017-01-19 at 13:05 +0000, Amit Mishra wrote:
>> Hello Friends,
>>
>> I am using G1 GC on Java 1.7.45 in my Solaris LAB environment, but I am
>> getting very high Update RS times and, on a few occasions, high Scan RS
>> times as well.
>
> just as a reminder, consider at least the latest update of 1.7; there were
> a few interesting changes to G1 after 1.7.45.
>
> JDK 8 would of course be even better :)
>
>> OS version: Generic_150401-30 i86pc i386 i86pc, number of CPUs: 80. A
>> day back I tried playing with the options below, hoping that extra GC
>> threads would help, but to no avail as I witnessed long pauses of up to
>> 7 seconds. Right now I have disabled these GC thread settings.
>>
>> argv[52]: -XX:ParallelGCThreads=70
>> argv[53]: -XX:ConcGCThreads=60
>> argv[54]: -XX:G1ConcRefinementThreads=70
>
> Unless I misunderstand what a virtual cpu in Solaris is, I do not think
> using more GC threads than there are virtual CPUs (64, from an earlier
> email) gives you any advantage.
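> As a quick sanity check (just a sketch, assuming the standard Solaris
> tools and the same java binary you start the application with):
>
>   psrinfo | wc -l     # virtual CPUs Solaris reports
>   java -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep ParallelGCThreads
>
> The second line shows the GC worker count the VM would pick ergonomically;
> going well above the first number mostly just adds scheduling overhead.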
>
>> I have already set the -XX:G1RSetUpdatingPauseTimePercent=5 option and
>> relaxed the pause time goal to 500 ms, but it does not seem to be
>> helping.
>
> There is one (compiler) optimization, called ReduceInitialCardMarks, that may
> generate lots of work for the Update RS phase. Basically, for initial stores
> into freshly allocated (typically large) objects and arrays it elides the
> individual card marks and instead tells G1 to revisit the entire object
> concurrently. If such an allocation happens just before a GC, the GC needs to
> do all of that work in the pause.
>
> You can try the option that disables this optimization, i.e. set
> -XX:-ReduceInitialCardMarks. Of course this is kind of a shot in the dark,
> and only worth it if you know that your application allocates a lot of
> largish objects. However, this has helped in other cases of spurious long
> Update RS pauses.
>
> Also note that I am not sure whether disabling this option causes crashes in
> 7u - there were some bugfixes in 9 for that.
>
> Another option (that should help at least a bit) to reduce Update RS/Scan RS
> times may be increasing the heap region size to 32M (-XX:G1HeapRegionSize=32M).
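> If it helps, a sketch of how to check and apply both of the above (the
> placeholder <current flags> stands for the options you already pass, and the
> grep assumes these flags show up in the -XX:+PrintFlagsFinal output of your
> build):
>
>   java <current flags> -XX:+PrintFlagsFinal -version | egrep "ReduceInitialCardMarks|G1HeapRegionSize"
>   java <current flags> -XX:-ReduceInitialCardMarks -XX:G1HeapRegionSize=32M ...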
>
>> Your urgent help will be highly appreciated, as I need a system where the
>> avg. pause time stays around 200-300 milliseconds and the max pause stays
>> around 500-600 ms.
>>
>> Current GC parameters are as below:
>> argv[11]: -Xmx48g
>> argv[12]: -Xms48g
>> argv[13]: -XX:-EliminateLocks
>> argv[24]: -Xss1m
>> argv[25]: -Xoss1m
>> argv[26]: -XX:PermSize=512m
>> argv[27]: -XX:MaxPermSize=512m
>> argv[28]: -XX:ReservedCodeCacheSize=128m
>> argv[29]: -XX:+HeapDumpOnOutOfMemoryError
>> argv[30]: -XX:+AggressiveOpts
>> argv[35]: -XX:+UseG1GC
>> argv[36]: -XX:MaxGCPauseMillis=500
>> argv[37]: -XX:+PrintFlagsFinal
>> argv[38]: -XX:G1RSetUpdatingPauseTimePercent=5
>> argv[39]: -XX:+PrintGCTimeStamps
>> argv[40]: -XX:+PrintGCDetails
>> argv[42]: -verbose:gc
>> argv[43]: -XX:+UseLargePages
>> argv[44]: -XX:+MaxFDLimit
>> argv[48]: -XX:+UnlockExperimentalVMOptions
>> argv[49]: -XX:G1NewSizePercent=1
>> argv[50]: -XX:+ParallelRefProcEnabled
>> argv[51]: -XX:+DisableExplicitGC
>> argv[52]: -XX:G1MaxNewSizePercent=1
>
> I am not sure that setting the maximum new size (to 1%) is a good idea (it
> mostly causes long Object Copy times). It might be useful to just remove
> this one.
>
>> argv[53]: -XX:InitiatingHeapOccupancyPercent=40
>> argv[54]: -XX:+UnlockDiagnosticVMOptions
>> argv[55]: -XX:+G1SummarizeRSetStats
>> argv[56]: -XX:G1SummarizeRSetStatsPeriod=1
>> argv[57]: -XX:+PerfDisableSharedMem
>> argv[58]: -XX:+AlwaysPreTouch
>>
>>
>>
>> Pause snapshots:
>>
>> Very high UpdateRS but low scan RS
>>
>> 2602.655: [GC pause (young), 7.8981373 secs]
>> [Parallel Time: 7883.9 ms, GC Workers: 53]
>> [GC Worker Start (ms): Min: 2602654.9, Avg: 2602655.3, Max:
>> 2602655.7, Diff: 0.8]
>> [Ext Root Scanning (ms): Min: 3.5, Avg: 4.5, Max: 34.2, Diff:
>> 30.7, Sum: 240.0]
>> [Update RS (ms): Min: 7796.5, Avg: 7826.2, Max: 7827.8, Diff:
>> 31.4, Sum: 414789.9]
>> [Processed Buffers: Min: 1, Avg: 3.8, Max: 12, Diff: 11,
>> Sum: 199]
>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
>> 0.6]
>> [Object Copy (ms): Min: 48.2, Avg: 48.8, Max: 49.4, Diff: 1.3,
>> Sum: 2588.2]
>> [Termination (ms): Min: 3.1, Avg: 3.5, Max: 3.9, Diff: 0.8,
>> Sum: 183.8]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.3,
>> Sum: 8.4]
>> [GC Worker Total (ms): Min: 7882.7, Avg: 7883.2, Max: 7883.8,
>> Diff: 1.1, Sum: 417810.9]
>> [GC Worker End (ms): Min: 2610538.4, Avg: 2610538.5, Max:
>> 2610538.7, Diff: 0.3]
>> [Code Root Fixup: 0.0 ms]
>> [Clear CT: 0.9 ms]
>> [Other: 13.3 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 10.5 ms]
>> [Ref Enq: 1.0 ms]
>> [Free CSet: 0.5 ms]
>> [Eden: 432.0M(432.0M)->0.0B(416.0M) Survivors: 48.0M->64.0M Heap:
>> 19.3G(48.0G)->18.9G(48.0G)]
>> [Times: user=3.24 sys=0.02, real=7.90 secs]
>
> Something is wrong here, i.e. a wall time larger than user time is strange to
> say the least: with 53 parallel GC workers, user time should normally be many
> times the wall time. Maybe that machine is overloaded with other applications?
>
> I recommend trying to get to the bottom of this first.
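> For a quick look at whether something else is competing for the CPUs around
> those pauses, the standard Solaris tools should be enough (rough sketch,
> intervals not tuned for your setup):
>
>   vmstat 5       # run queue length and the us/sy/id CPU split over time
>   mpstat 5       # per-CPU utilisation, cross-calls and interrupts
>   prstat -mL 5   # per-thread microstates; a high LAT value means threads
>                  # are waiting for a CPU
>
> If other processes (or the kernel) keep the CPUs busy during a pause, it
> should show up there.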
>
>> Concurrent RS processed 48786945 cards
>> Of 1661300 completed buffers:
>> 1661300 (100.0%) by conc RS threads.
>> 0 ( 0.0%) by mutator threads.
>> Conc RS threads times(s)
>> 17.55 15.01 12.67 11.26 10.14 9.56
>> 8.13 7.26 6.61 6.18 5.25 4.64 3.97
>> 3.16 2.92 2.76 2.60 2.54 2.36 2.30 2.16
>> 2.12 2.00 1.92 1.83 1.82 1.75 1.69
>> 1.64 1.64 1.55 1.50 1.45 1.48 1.41
>> 1.36 1.32 1.29 1.26 1.22 1.15 1.18 1.12
>> 1.07 1.05 1.03 1.01 0.97 0.96 0.95
>> 0.89 0.89 0.87 0.33
>> Total heap region rem set sizes = 808374K. Max = 4135K.
>> Static structures = 1596K, free_lists = 0K.
>> 72965736 occupied cards represented.
>> Max size region =
>> 35:(O)[0xfffffd73e3000000,0xfffffd73e4000000,0xfffffd73e4000000],
>> size = 4136K, occupied = 14130K.
>> Did 252 coarsenings.
>
> This may or may not be what causes the long Scan RS times; you could try
> -XX:G1RSetRegionEntries=2048 to avoid coarsenings (and you should remove the
> G1SummarizeRSetStats* options again).
>
> (2048 is just a value higher than the number of regions in the heap, at least
> with 32M regions: a 48G heap divided into 32M regions gives 1536 regions.)
>
> Hth,
> Thomas
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>