G1 high STW

Thomas Schatzl thomas.schatzl at oracle.com
Wed May 11 09:10:01 UTC 2016


Hi,

On Tue, 2016-05-03 at 16:34 -0500, charlie hunt wrote:
> Hi Artur,
> 
> Based on the information you have shared thus far, there are several
> possible things that could be an issue.
> 
> These are things that are very unusual:
> 
> 2016-04-27T21:32:08.721+0400: 9355.239: [SoftReference, 0 refs,
> 0.5480516 secs]2016-04-27T21:32:09.269+0400: 9355.787:
> [WeakReference, 3415 refs, 0.3998733 secs]2016-04-
> 27T21:32:09.669+0400: 9356.187: [FinalReference, 4373 refs, 1.3356211
> secs]2016-04-27T21:32:11.005+0400: 9357.523: [PhantomReference, 0
> refs, 0 refs, 0.3307221 secs]2016-04-27T21:32:11.336+0400: 9357.854:
> [JNI Weak Reference, 0.0005460 secs], 7.6676890 secs]

With that few references I would kind of recommend disabling parallel
reference processing.

A large amount of overhead is incurred in these cases.

It may just be some different process on the same machine stealing cpu
time too.

>  [Times: user=36.61 sys=4.36, real=7.67 secs] 
> Any time we see sys CPU usage, it catches our eyes. Given there are
> FinalReferences being processed here, it could be that the
> FinaliReference could be invoking some operation that requires a
> system call. It may be possible that the application has finalizers
> that are invoking something that results in a large number of system
> calls being executed. The second GC entry also has FinalReferences
> being processed, and a high sys CPU time.
> 
> So here are few questions and comments.
> 
> 1.)  How many hardware threads (i.e. cores / hyper threads) are on
> the system where this app is running? And, are there other apps
> running on this same system? Depending on the number of hardware
> threads on your system, and other apps that are running, you may have
> too many ConcGCThreads and ParallelGCThreads configured.
> 2.) Assuming you are running Linux … whenever I see high sys time,
> the first thing that comes to mind is whether transparent huge pages
> are enabled. Can you check if transparent huge pages are enabled,
> (though you should have seen some high sys CPU time with CMS GC with
> transparent huge pages enabled)?
> 3.) The other thought that crosses my mind when I see high sys time
> is whether the system is paging to virtual memory. Can you monitor
> your system for paging to virtual memory?
> 4.) What version of JDK/JRE are you running? At least JDK 8u60 or
> later would be best, or ideally the very latest JDK 8 update release.

Agree with all these points.

> 
> hths,
> 
> charlie
> 
> > On May 1, 2016, at 2:50 PM, Artur Petrosyan <artspace1986 at gmail.com
> > > wrote:
> > 
> > Hi,
> > I'm currently working on changing cms to g1  on my application.
> > But how much harder I tried cms is performing good then g1.
> > But cms problem is that on long therm run, it gets higher full gc.
> > Can you please help me with that?
> > 
> > Below are g1 params and logs.
> > 
> > -XX:ConcGCThreads=16
> > -XX:G1HeapRegionSize=16777216
> > -XX:G1HeapWastePercent=20
> > -XX:InitialHeapSize=18253611008
> > -XX:InitiatingHeapOccupancyPercent=70
> > -XX:MaxGCPauseMillis=600
> > -XX:MaxHeapSize=21474836480
> > -XX:MetaspaceSize=314572800
> > -XX:ParallelGCThreads=16
> > -XX:+ParallelRefProcEnabled
> > -XX:StringTableSize=100003
> > -XX:ThreadStackSize=1024
> > -XX:+UseG1GC
> > -XX:+UseStringDeduplication
> > 
> > {Heap before GC invocations=824 (full 0):
> >  garbage-first heap   total 20971520K, used 18639510K
> > [0x00000002c0000000, 0x00000002c1002800, 0x00000007c0000000)
> >   region size 16384K, 554 young (9076736K), 8 survivors (131072K)
> >  Metaspace       used 120451K, capacity 123529K, committed 126848K,
> > reserved 1163264K
> >   class space    used 11748K, capacity 12384K, committed 12928K,
> > reserved 1048576K
> > 2016-04-27T21:32:04.448+0400: 9350.966: [GC pause (G1 Evacuation
> > Pause) (young)
> > Desired survivor size 587202560 bytes, new threshold 15 (max 15)
> > - age   1:   39555904 bytes,   39555904 total
> >  9350.969: [G1Ergonomics (CSet Construction) start choosing CSet,
> > _pending_cards: 160459, predicted base time: 151.45 ms, remaining
> > time: 448.55 ms, target pause time: 600.00 ms]
> >  9350.969: [G1Ergonomics (CSet Construction) add young regions to
> > CSet, eden: 546 regions, survivors: 8 regions, predicted young
> > region time: 302.05 ms]
> >  9350.969: [G1Ergonomics (CSet Construction) finish choosing CSet,
> > eden: 546 regions, survivors: 8 regions, old: 0 regions, predicted
> > pause time: 453.50 ms, target pause time: 600.00 ms]
> > 2016-04-27T21:32:08.721+0400: 9355.239: [SoftReference, 0 refs,
> > 0.5480516 secs]2016-04-27T21:32:09.269+0400: 9355.787:
> > [WeakReference, 3415 refs, 0.3998733 secs]2016-04-
> > 27T21:32:09.669+0400: 9356.187: [FinalReference, 4373 refs,
> > 1.3356211 secs]2016-04-27T21:32:11.005+0400: 9357.523:
> > [PhantomReference, 0 refs, 0 refs, 0.3307221 secs]2016-04-
> > 27T21:32:11.336+0400: 9357.854: [JNI Weak Reference, 0.0005460
> > secs], 7.6676890 secs]
> >    [Parallel Time: 3854.6 ms, GC Workers: 16]
> >       [GC Worker Start (ms): Min: 9350971.2, Avg: 9350991.3, Max:
> > 9351040.0, Diff: 68.8]
> >       [Ext Root Scanning (ms): Min: 0.4, Avg: 8.5, Max: 60.4, Diff:
> > 60.0, Sum: 136.5]
> >       [Update RS (ms): Min: 120.4, Avg: 582.6, Max: 1986.0, Diff:
> > 1865.7, Sum: 9322.2]
> >          [Processed Buffers: Min: 35, Avg: 70.0, Max: 109, Diff:
> > 74, Sum: 1120]
> >       [Scan RS (ms): Min: 111.1, Avg: 1516.1, Max: 1984.2, Diff:
> > 1873.2, Sum: 24257.4]
> >       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
> > 0.0, Sum: 0.2]
> >       [Object Copy (ms): Min: 1259.5, Avg: 1537.3, Max: 1668.0,
> > Diff: 408.6, Sum: 24596.5]
> >       [Termination (ms): Min: 0.1, Avg: 168.9, Max: 199.6, Diff:
> > 199.5, Sum: 2702.5]
> >          [Termination Attempts: Min: 1, Avg: 114.8, Max: 305, Diff:
> > 304, Sum: 1836]
> >       [GC Worker Other (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff:
> > 1.1, Sum: 4.9]
> >       [GC Worker Total (ms): Min: 3764.4, Avg: 3813.8, Max: 3833.6,
> > Diff: 69.2, Sum: 61020.3]
> >       [GC Worker End (ms): Min: 9354804.3, Avg: 9354805.1, Max:
> > 9354810.1, Diff: 5.8]
> >    [Code Root Fixup: 1.6 ms]
> >    [Code Root Purge: 0.0 ms]
> >    [String Dedup Fixup: 40.0 ms, GC Workers: 16]
> >       [Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> > Sum: 0.0]
> >       [Table Fixup (ms): Min: 0.0, Avg: 13.7, Max: 39.5, Diff:
> > 39.5, Sum: 219.9]
> >    [Clear CT: 721.4 ms]

That one should be improved with jdk9.

> >    [Other: 3050.1 ms]
> >       [Choose CSet: 0.0 ms]
> >       [Ref Proc: 3027.0 ms]
> >       [Ref Enq: 1.1 ms]
> >       [Redirty Cards: 7.2 ms]
> >       [Humongous Register: 0.3 ms]
> >       [Humongous Reclaim: 0.3 ms]
> >       [Free CSet: 7.7 ms]
> >    [Eden: 8736.0M(8736.0M)->0.0B(464.0M) Survivors: 128.0M->560.0M
> > Heap: 17.8G(20.0G)->9881.6M(20.0G)]

You can try to bound eden size with -XX:G1MaxNewSizePercent=XX (where
XX is maybe something in the range of 30%).

It may be that the application does some kind of state transition
(suddenly high load) where lots of data might stay live for a short
time. Or something like report generation or any other activity that
temporarily increases the amount of live data.

> > Heap after GC invocations=825 (full 0):
> >  garbage-first heap   total 20971520K, used 10118751K
> > [0x00000002c0000000, 0x00000002c1002800, 0x00000007c0000000)
> >   region size 16384K, 35 young (573440K), 35 survivors (573440K)
> >  Metaspace       used 120451K, capacity 123529K, committed 126848K,
> > reserved 1163264K
> >   class space    used 11748K, capacity 12384K, committed 12928K,
> > reserved 1048576K
> > }
> >  [Times: user=36.61 sys=4.36, real=7.67 secs] 
> > ......
> > {Heap before GC invocations=121 (full 0):
> >  garbage-first heap   total 20856832K, used 15428264K
> > [0x00000002c0000000, 0x00000002c10027c8, 0x00000007c0000000)
> >   region size 16384K, 504 young (8257536K), 51 survivors (835584K)
> >  Metaspace       used 117938K, capacity 121865K, committed 121984K,
> > reserved 1159168K
> >   class space    used 11814K, capacity 12624K, committed 12672K,
> > reserved 1048576K
> > 2016-04-27T19:26:42.536+0400: 1829.054: [GC pause (G1 Evacuation
> > Pause) (young)
> > Desired survivor size 528482304 bytes, new threshold 5 (max 15)
> > - age   1:  172611848 bytes,  172611848 total
> > - age   2:   15368952 bytes,  187980800 total
> > - age   3:    7895944 bytes,  195876744 total
> > - age   4:  165247296 bytes,  361124040 total
> > - age   5:  273741576 bytes,  634865616 total
> > - age   6:    1922856 bytes,  636788472 total
> >  1829.055: [G1Ergonomics (CSet Construction) start choosing CSet,
> > _pending_cards: 241067, predicted base time: 211.38 ms, remaining
> > time: 388.62 ms, target pause time: 600.00 ms]
> >  1829.055: [G1Ergonomics (CSet Construction) add young regions to
> > CSet, eden: 453 regions, survivors: 51 regions, predicted young
> > region time: 396.50 ms]
> >  1829.055: [G1Ergonomics (CSet Construction) finish choosing CSet,
> > eden: 453 regions, survivors: 51 regions, old: 0 regions, predicted
> > pause time: 607.88 ms, target pause time: 600.00 ms]
> > 2016-04-27T19:26:47.643+0400: 1834.161: [SoftReference, 0 refs,
> > 0.0204859 secs]2016-04-27T19:26:47.664+0400: 1834.182:
> > [WeakReference, 812 refs, 0.0057011 secs]2016-04-
> > 27T19:26:47.670+0400: 1834.188: [FinalReference, 4132 refs,
> > 0.0334876 secs]2016-04-27T19:26:47.703+0400: 1834.221:
> > [PhantomReference, 0 refs, 4 refs, 0.0320119 secs]2016-04-
> > 27T19:26:47.736+0400: 1834.254: [JNI Weak Reference, 0.0010508
> > secs], 5.3061080 secs]
> >    [Parallel Time: 5070.7 ms, GC Workers: 16]
> >       [GC Worker Start (ms): Min: 1829056.4, Avg: 1829203.3, Max:
> > 1829323.5, Diff: 267.1]
> >       [Ext Root Scanning (ms): Min: 0.1, Avg: 115.8, Max: 518.5,
> > Diff: 518.4, Sum: 1853.3]
> >       [Update RS (ms): Min: 182.2, Avg: 496.7, Max: 616.5, Diff:
> > 434.2, Sum: 7947.7]
> >          [Processed Buffers: Min: 23, Avg: 76.4, Max: 157, Diff:
> > 134, Sum: 1222]
> >       [Scan RS (ms): Min: 0.1, Avg: 0.5, Max: 1.5, Diff: 1.4, Sum:
> > 8.4]
> >       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff:
> > 0.2, Sum: 0.5]
> >       [Object Copy (ms): Min: 3791.1, Avg: 4104.0, Max: 4554.5,
> > Diff: 763.4, Sum: 65664.7]
> >       [Termination (ms): Min: 0.0, Avg: 189.2, Max: 379.5, Diff:
> > 379.5, Sum: 3027.8]
> >          [Termination Attempts: Min: 1, Avg: 1.1, Max: 2, Diff: 1,
> > Sum: 18]
> >       [GC Worker Other (ms): Min: 0.1, Avg: 6.2, Max: 16.5, Diff:
> > 16.4, Sum: 99.1]
> >       [GC Worker Total (ms): Min: 4786.3, Avg: 4912.6, Max: 5069.7,
> > Diff: 283.4, Sum: 78601.4]
> >       [GC Worker End (ms): Min: 1834109.7, Avg: 1834115.9, Max:
> > 1834126.2, Diff: 16.5]
> >    [Code Root Fixup: 2.2 ms]
> >    [Code Root Purge: 0.0 ms]
> >    [String Dedup Fixup: 70.1 ms, GC Workers: 16]
> >       [Queue Fixup (ms): Min: 0.0, Avg: 0.2, Max: 2.4, Diff: 2.4,
> > Sum: 2.5]
> >       [Table Fixup (ms): Min: 0.0, Avg: 27.7, Max: 62.0, Diff:
> > 62.0, Sum: 442.9]
> >    [Clear CT: 13.7 ms]
> >    [Other: 149.4 ms]
> >       [Choose CSet: 0.0 ms]
> >       [Ref Proc: 125.8 ms]
> >       [Ref Enq: 1.5 ms]
> >       [Redirty Cards: 14.6 ms]
> >       [Humongous Register: 0.5 ms]
> >       [Humongous Reclaim: 0.2 ms]
> >       [Free CSet: 2.9 ms]
> >    [Eden: 7248.0M(7248.0M)->0.0B(464.0M) Survivors: 816.0M->544.0M
> > Heap: 14.7G(19.9G)->7818.2M(19.9G)]
> > Heap after GC invocations=122 (full 0):
> >  garbage-first heap   total 20856832K, used 8005821K
> > [0x00000002c0000000, 0x00000002c10027c8, 0x00000007c0000000)
> >   region size 16384K, 34 young (557056K), 34 survivors (557056K)
> >  Metaspace       used 117938K, capacity 121865K, committed 121984K,
> > reserved 1159168K
> >   class space    used 11814K, capacity 12624K, committed 12672K,
> > reserved 1048576K
> > }
> >  [Times: user=40.69 sys=3.47, real=5.31 secs]
> > 

Thanks,
  Thomas




More information about the hotspot-gc-dev mailing list