Need help on G1 GC young gen Update RS and Scan RS pause reduction

Thu Jan 19 19:03:24 UTC 2017

Hi Amit,

On Thu, 2017-01-19 at 13:05 +0000, Amit Mishra wrote:
> Hello Friends,
>  
> I am using G1 GC on Java 1.7.45 version in my solaris LAB environment
> but am getting very high UpdateRS time and few occasions high Scan RS
> time as well.

  just as reminder, consider at least the latest version of 1.7. There
were a few interesting changes to G1 later than 1.7.45.

JDK8 would even be better of course :)

> OS version : Generic_150401-30 i86pc i386 i86pc
> Number of CPU # 80, A day back i have tried to play with below
> options hoping extra GC threads will help me but all of no avail as I
> had witnessed long pauses up to 7 seconds. Right now I have disabled
> these GC threads.
>  
> argv[52]: -XX:ParallelGCThreads=70
> argv[53]: -XX:ConcGCThreads=60
> argv[54]: -XX:G1ConcRefinementThreads=70

Unless I misunderstand what a virtual cpu in Solaris is, I do not think
using more than there are (64 from an earlier email) gives you any
advantage.

> I have already put -XX:G1RSetUpdatingPauseTimePercent=5 option and
> relaxed pause time goal by setting it to 500 ms but it doesn’t seems
> to be helping me.

There is one (compiler) optimization that may generate lots of work for
the Update RS phase called ReduceInitialCardMarks. Basically it tells
g1 to revisit the entire array concurrently. If this allocation happens
just before a GC, the gc needs to do all the work in the pause.

You can try the option that disables this optimization, i.e. set -XX:-
ReduceInitialCardMarks. Of course it is kind of a shot in the dark, and
only if you know that your application allocates (a lot) of largish
objects. However, this helped in other cases of spurious long Update RS
pauses.

Also note that I am not sure whether disabling this option causes
crashes in 7u - there were some bugfixes in 9 for that.

Another option (that should help at least a bit) to reduce update
rs/scan rs times may be increasing heap region size to 32M (-
XX:G1HeapRegionSize=32M).

> Your urgent help will be highly appreciated as I need system where
>  avg. pause time remain around 200-300 milli-second and Max pause
> remain around 500-600 ms.
>  
> Current GC parameters are as below:
> argv[11]: -Xmx48g
> argv[12]: -Xms48g
> argv[13]: -XX:-EliminateLocks
> argv[24]: -Xss1m
> argv[25]: -Xoss1m
> argv[26]: -XX:PermSize=512m
> argv[27]: -XX:MaxPermSize=512m
> argv[28]: -XX:ReservedCodeCacheSize=128m
> argv[29]: -XX:+HeapDumpOnOutOfMemoryError
> argv[30]: -XX:+AggressiveOpts
> argv[35]: -XX:+UseG1GC
> argv[36]: -XX:MaxGCPauseMillis=500
> argv[37]: -XX:+PrintFlagsFinal
> argv[38]: -XX:G1RSetUpdatingPauseTimePercent=5
> argv[39]: -XX:+PrintGCTimeStamps
> argv[40]: -XX:+PrintGCDetails
> argv[42]: -verbose:gc
> argv[43]: -XX:+UseLargePages
> argv[44]: -XX:+MaxFDLimit
> argv[48]: -XX:+UnlockExperimentalVMOptions
> argv[49]: -XX:G1NewSizePercent=1
> argv[50]: -XX:+ParallelRefProcEnabled
> argv[51]: -XX:+DisableExplicitGC
> argv[52]: -XX:G1MaxNewSizePercent=1

I am not sure if setting the maximum new size is a good idea (causing
long Object Copy *mostly*). It might be useful to just remove this one.

> argv[53]: -XX:InitiatingHeapOccupancyPercent=40
> argv[54]: -XX:+UnlockDiagnosticVMOptions
> argv[55]: -XX:+G1SummarizeRSetStats
> argv[56]: -XX:G1SummarizeRSetStatsPeriod=1
> argv[57]: -XX:+PerfDisableSharedMem
> argv[58]: -XX:+AlwaysPreTouch
>                
>  
>  
> Pause snapshots:
>  
> Very high UpdateRS but low scan RS
>  
> 2602.655: [GC pause (young), 7.8981373 secs]
>    [Parallel Time: 7883.9 ms, GC Workers: 53]
>       [GC Worker Start (ms): Min: 2602654.9, Avg: 2602655.3, Max:
> 2602655.7, Diff: 0.8]
>       [Ext Root Scanning (ms): Min: 3.5, Avg: 4.5, Max: 34.2, Diff:
> 30.7, Sum: 240.0]
>       [Update RS (ms): Min: 7796.5, Avg: 7826.2, Max: 7827.8, Diff:
> 31.4, Sum: 414789.9]
>          [Processed Buffers: Min: 1, Avg: 3.8, Max: 12, Diff: 11,
> Sum: 199]
>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
> 0.6]
>       [Object Copy (ms): Min: 48.2, Avg: 48.8, Max: 49.4, Diff: 1.3,
> Sum: 2588.2]
>       [Termination (ms): Min: 3.1, Avg: 3.5, Max: 3.9, Diff: 0.8,
> Sum: 183.8]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.3,
> Sum: 8.4]
>       [GC Worker Total (ms): Min: 7882.7, Avg: 7883.2, Max: 7883.8,
> Diff: 1.1, Sum: 417810.9]
>       [GC Worker End (ms): Min: 2610538.4, Avg: 2610538.5, Max:
> 2610538.7, Diff: 0.3]
>    [Code Root Fixup: 0.0 ms]
>    [Clear CT: 0.9 ms]
>    [Other: 13.3 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 10.5 ms]
>       [Ref Enq: 1.0 ms]
>       [Free CSet: 0.5 ms]
>    [Eden: 432.0M(432.0M)->0.0B(416.0M) Survivors: 48.0M->64.0M Heap:
> 19.3G(48.0G)->18.9G(48.0G)]
> [Times: user=3.24 sys=0.02, real=7.90 secs]

Something is wrong here, i.e. a a wall time larger than user time is
strange to say at least. Maybe that machine is overloaded with other
applications?

I recommend trying to get to the bottom of this first.

> Concurrent RS processed 48786945 cards
>   Of 1661300 completed buffers:
>       1661300 (100.0%) by conc RS threads.
>             0 (  0.0%) by mutator threads.
>   Conc RS threads times(s)
>          17.55    15.01    12.67    11.26    10.14     9.56    
> 8.13     7.26     6.61     6.18     5.25     4.64     3.97    
> 3.16     2.92     2.76     2.60     2.54     2.36     2.30     2.16
>   2.12     2.00     1.92     1.83     1.82     1.75     1.69    
> 1.64     1.64     1.55     1.50     1.45     1.48     1.41    
> 1.36     1.32     1.29     1.26     1.22     1.15     1.18     1.12
>    1.07     1.05     1.03     1.01     0.97     0.96     0.95    
> 0.89     0.89     0.87     0.33
>   Total heap region rem set sizes = 808374K.  Max = 4135K.
>   Static structures = 1596K, free_lists = 0K.
>     72965736 occupied cards represented.
>     Max size region =
> 35:(O)[0xfffffd73e3000000,0xfffffd73e4000000,0xfffffd73e4000000],
> size = 4136K, occupied = 14130K.
>     Did 252 coarsenings.

This may or may not be what causes long scan rs times; you could try
-XX:G1RSetRegionEntries=2048 to avoid coarsenings (and you should
remove the g1SummarizeRSetStats* options again).

(the 2048 is just a value higher than the number of regions in the
heap, at least with 32m regions).

Hth,
  Thomas