G1 GC clean up time is too long

yao yaoshengzhe at gmail.com
Thu Dec 19 14:16:06 PST 2013


 Hi All,

We have a real time application build on top of HBase (a distributed big
data store), our original CMS GC works great most of time with only 0 - 2
Full GCs per node everyday. This is good performance considering our
read/write traffic. Now, it's java 7 and we want to see the power of G1 GC.
We tried G1 GC on two of our production but did not see a big improvement
over CMS and we would like to hear feedbacks from you if there are any G1
GC tuning chances.

Here is our observation: G1 GC cleanup time is around 0.75 secs and there
are also some mix pauses taking more than 5 secs when application is in a
stable state*. *Due to these high STW pauses, we believe it is the reason
why G1 doesn't do better job than CMS (their performance is pretty close).



*Here is the java 7 version we use on our G1 experiment machines:*
java version "1.7.0_40"
         Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
         Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

and *G1 parameters*:

         -XX:+UnlockExperimentalVMOptions
         -XX:+UseG1GC
         -Xmx67714m
         -Xms67714m
         -server
         -XX:MaxGCPauseMillis=150
         -XX:G1HeapRegionSize=32m
         -XX:InitiatingHeapOccupancyPercent=30
         -XX:G1ReservePercent=20

Explanations for some G1 params we set

*   G1HeapRegionSize=32m*
           We have some map/reduce jobs reading from HBase everyday and
HBase will send data in a batch fashion to mapper tasks; therefore, a
larger temporary object (up to 15MB) will be created during that time and
that's why we set region size to 32MB to avoid humongous allocation.

   *InitiatingHeapOccupancyPercent=30 and  *
*G1ReservePercent=20*
           We've observed a frequent Full GC (every one hour) before
explicitly setting these two parameters. After reading from various GC
tuning posts, we decided to reduce
InitiatingHeapOccupancyPercent to 30 and increase G1ReservePercent to 20,
this reduce full GC to about 3~5 times per day per node.

Given above background, we are still keep working on G1 GC tuning and hope
to beat CMS on our application. For now, G1 is no worse than CMS on our
production machines except full GC happens a little more on G1 machines.

 Here is the Pause time (STW time, including: initial mark, young, mix and
clean up) graph, *x-axis*: gc start time, *y-axis*: pause time in
milliseconds.

Pauses larger than 2000 milliseconds are due to mix pause and some spikes
are due to GC cleanup time.
[image: Inline image 1]



CMS vs G1, Red Curve is CMS and Green Curve is G1,
*x-axis*: gc start time, *y-axis*: read latency 99 percentile in
milliseconds

[image: Inline image 1]


A close look, CMS vs G1, Red Curve is CMS and Green Curve is G1,
*x-axis*: gc start time, *y-axis*: read latency 99 percentile in
milliseconds

[image: Inline image 2]
*Typical G1 long clean up log*

2013-12-19T14:37:48.949-0500: 926630.478: [GC pause (young) (initial-mark)
Desired survivor size 268435456 bytes, new threshold 3 (max 15)
- age   1:   95847832 bytes,   95847832 total
- age   2:  101567360 bytes,  197415192 total
- age   3:  102072520 bytes,  299487712 total
 926630.478: [G1Ergonomics (CSet Construction) start choosing CSet,
_pending_cards: 32605, predicted base time: 49.35 ms, remaining time:
100.65 ms, target pause time: 150\
.00 ms]
 926630.478: [G1Ergonomics (CSet Construction) add young regions to CSet,
eden: 107 regions, survivors: 14 regions, predicted young region time:
47.90 ms]
 926630.479: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
107 regions, survivors: 14 regions, old: 0 regions, predicted pause time:
97.25 ms, target pause\
 time: 150.00 ms]
, 0.1023900 secs]
   [Parallel Time: 74.1 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 926630478.9, Avg: 926630479.3, Max:
926630479.6, Diff: 0.7]
      [Ext Root Scanning (ms): Min: 4.4, Avg: 4.9, Max: 6.5, Diff: 2.2,
Sum: 88.3]
      [Update RS (ms): Min: 10.6, Avg: 12.3, Max: 18.7, Diff: 8.1, Sum:
221.4]
         [Processed Buffers: Min: 10, Avg: 18.7, Max: 25, Diff: 15, Sum:
337]
      [Scan RS (ms): Min: 5.8, Avg: 12.1, Max: 12.9, Diff: 7.1, Sum: 218.2]
      [Object Copy (ms): Min: 43.0, Avg: 43.4, Max: 43.7, Diff: 0.8, Sum:
781.3]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.8]
      [GC Worker Total (ms): Min: 72.4, Avg: 72.8, Max: 73.1, Diff: 0.7,
Sum: 1310.4]
      [GC Worker End (ms): Min: 926630552.0, Avg: 926630552.1, Max:
926630552.1, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 11.1 ms]
   [Other: 17.2 ms]
      [Choose CSet: 0.1 ms]
      [Ref Proc: 7.9 ms]
      [Ref Enq: 1.8 ms]
      [Free CSet: 0.6 ms]
   [Eden: 3424.0M(3424.0M)->0.0B(3328.0M) Survivors: 448.0M->416.0M Heap:
51.9G(66.2G)->48.7G(66.2G)]
 [Times: user=1.46 sys=0.00, real=0.10 secs]
2013-12-19T14:37:49.051-0500: 926630.581: [GC
concurrent-root-region-scan-start]
2013-12-19T14:37:49.064-0500: 926630.593: [GC
concurrent-root-region-scan-end, 0.0126390 secs]
2013-12-19T14:37:49.064-0500: 926630.593: [GC concurrent-mark-start]
2013-12-19T14:37:55.287-0500: 926636.816: [GC concurrent-mark-end,
6.2228410 secs]
2013-12-19T14:37:55.291-0500: 926636.821: [GC remark
2013-12-19T14:37:55.295-0500: 926636.824: [GC ref-proc, 0.0086540 secs],
0.0767770 secs]
 [Times: user=1.09 sys=0.00, real=0.08 secs]
2013-12-19T14:37:55.375-0500: 926636.905: [*GC cleanup 50G->50G(66G),
0.7325340 secs*]
 [Times: user=12.98 sys=0.00, real=0.73 secs]


*Typical G1 Full GC log*2013-12-19T14:23:13.558-0500: 925755.088: [GC pause
(mixed)
Desired survivor size 234881024 bytes, new threshold 4 (max 15)
- age   1:   85160928 bytes,   85160928 total
- age   2:   71237536 bytes,  156398464 total
- age   3:   70904176 bytes,  227302640 total
- age   4:   78106288 bytes,  305408928 total
 925755.088: [G1Ergonomics (CSet Construction) start choosing CSet,
_pending_cards: 43149, predicted base time: 49.67 ms, remaining time:
100.33 ms, target pause time: 150.00 ms]
 925755.088: [G1Ergonomics (CSet Construction) add young regions to CSet,
eden: 92 regions, survivors: 13 regions, predicted young region time: 35.76
ms]
 925755.094: [G1Ergonomics (CSet Construction) finish adding old regions to
CSet, reason: predicted time is too high, predicted time: 4.02 ms,
remaining time: 0.00 ms, old: 89 regions, min: 89 regions]
 925755.094: [G1Ergonomics (CSet Construction) added expensive regions to
CSet, reason: old CSet region num not reached min, old: 89 regions,
expensive: 64 regions, min: 89 regions, remaining time: 0.00 ms]
 925755.094: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
92 regions, survivors: 13 regions, old: 89 regions, predicted pause time:
418.65 ms, target pause time: 150.00 ms]
925755.459: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
region allocation request failed, allocation request: 4007128 bytes]
 925755.459: [G1Ergonomics (Heap Sizing) expand the heap, requested
expansion amount: 4007128 bytes, attempted expansion amount: 33554432 bytes]
 925755.459: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
heap expansion operation failed]
 925756.116: [G1Ergonomics (Concurrent Cycles) do not request concurrent
cycle initiation, reason: still doing mixed collections, occupancy:
70598524928 bytes, allocation request: 0 bytes, threshold: 21310419750
bytes (30.00 %), source: end of GC]
 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason:
candidate old regions available, candidate old regions: 617 regions,
reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %]
* (to-space exhausted), 1.0285640 secs]*
   [Parallel Time: 821.1 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 925755094.0, Avg: 925755094.4, Max:
925755094.7, Diff: 0.7]
      [Ext Root Scanning (ms): Min: 4.6, Avg: 5.1, Max: 6.5, Diff: 1.8,
Sum: 92.7]
      [Update RS (ms): Min: 12.4, Avg: 15.6, Max: 27.8, Diff: 15.3, Sum:
280.1]
         [Processed Buffers: Min: 2, Avg: 22.6, Max: 32, Diff: 30, Sum: 406]
      [Scan RS (ms): Min: 168.6, Avg: 180.7, Max: 184.4, Diff: 15.8, Sum:
3252.4]
      [Object Copy (ms): Min: 616.5, Avg: 618.1, Max: 618.9, Diff: 2.5,
Sum: 11125.9]
      [Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.5, Sum: 5.3]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.5]
      [GC Worker Total (ms): Min: 819.5, Avg: 819.8, Max: 820.2, Diff: 0.7,
Sum: 14756.9]
      [GC Worker End (ms): Min: 925755914.2, Avg: 925755914.2, Max:
925755914.2, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 13.5 ms]
   [Other: 194.0 ms]
      [Choose CSet: 5.6 ms]
      [Ref Proc: 16.9 ms]
      [Ref Enq: 0.9 ms]
      [Free CSet: 4.3 ms]
  * [Eden: 2944.0M(2944.0M)->0.0B(2944.0M) Survivors: 416.0M->416.0M Heap:
64.3G(66.2G)->65.9G(66.2G)]*
* [Times: user=8.96 sys=0.21, real=1.03 secs]*
 925756.121: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
allocation request failed, allocation request: 17960 bytes]
 925756.121: [G1Ergonomics (Heap Sizing) expand the heap, requested
expansion amount: 33554432 bytes, attempted expansion amount: 33554432
bytes]
 925756.121: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
heap expansion operation failed]
*2013-12-19T14:23:14.592-0500: 925756.121: [Full GC 65G->43G(66G),
45.3188580 secs]*
   [Eden: 0.0B(2944.0M)->0.0B(9248.0M) Survivors: 416.0M->0.0B Heap:
65.9G(66.2G)->43.1G(66.2G)]
 [Times: user=88.57 sys=0.81, real=45.31 secs]

Thanks
Shengzhe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 26874 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0003.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 16958 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0004.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 13653 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0005.png 


More information about the hotspot-gc-use mailing list