G1 GC consuming all CPU time
claes.redestad
claes.redestad at oracle.com
Tue Jun 3 23:28:49 UTC 2014
At least the CPU load is down, which suggests it's no longer a concurrency issue. One thing that comes to mind is that G1 emits costly write and read barriers that heavily penalize interpreted code, while JMH generally avoids that benchmarking trap. Try extracting the loop body in your test into a method to help the JIT along and see if that evens out the playing field?
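For example, a minimal sketch of that refactoring (the helper name is hypothetical; the point is only that the hot loop body ends up in its own compilable method):

    // Hypothetical refactoring: pull the body of the modifyGraph() loop into
    // its own method so the JIT can compile it rather than leaving it to run
    // in the interpreter.
    private static Node popAndPush(Node head) {
        Node node = head;
        head = node.next;
        node.next = null;
        for (Node other : node.others) {
            other.next = head;
            head = other;
        }
        return head;
    }

    private static Node modifyGraph(Node head) {
        for (int i = 0; i < 1000000; i++) {
            head = popAndPush(head);
        }
        return head;
    }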
/Claes.
-------- Original message --------
From: Peter Harvey <harvey at actenum.com>
Date: 04-06-2014 01:12 (GMT+01:00)
To: Claes Redestad <claes.redestad at oracle.com>
Cc: hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: G1 GC consuming all CPU time
Here's my list of benchmarks from the previous email:
-XX:+UseG1GC -XX:ParallelGCThreads=1 took 82.393 seconds and CPU load was 0.1247
-XX:+UseG1GC -XX:ParallelGCThreads=4 took 19.829 seconds and CPU load was 0.5960
-XX:+UseG1GC -XX:ParallelGCThreads=8 took 14.815 seconds and CPU load was 0.9184
-XX:+UseConcMarkSweepGC took 0.322 seconds and CPU load was 0.1119 regardless of the setting of -XX:ParallelGCThreads
And here's the result of using those new parameters with my original code:
-XX:+UseG1GC -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementThreads=1 took 53.077 seconds and CPU load was 0.1237
I may be completely misunderstanding the implications of your JMH tests, but those parameters don't seem to improve overall application performance when using my original code.
It looks to me like the use of the G1 garbage collector is somehow also slowing down the application itself (but not necessarily your JMH test?). Not only are the GC threads tripping over each other, but they also seem to be tripping up the main application thread.
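(Aside: the standard HotSpot stopped/concurrent-time flags might show whether the main thread is actually being paused or merely slowed down while running, e.g.:

    java -XX:+UseG1GC -verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime Node

This is only a suggestion for further diagnosis, not something tried here.)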
Regards,
Peter.
On Tue, Jun 3, 2014 at 4:57 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
Hi,
I'm guessing it's due to the concurrent GC threads tripping over themselves: the microbenchmark creates one big linked structure that will occupy most of the old gen, and then you're doing intense pointer updates which will trigger scans and updates of remembered sets etc. I actually don't know half the details and am mostly just guessing. :-)
I converted your micro to a JMH micro to ease experimenting[1] (hope you don't mind), then verified that the regression reproduces:
Parallel:
java -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
~1625 ops/ms
G1:
java -XX:+UseG1GC -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
~12 ops/ms
Testing my hunch, let's try forcing the concurrent refinement to use only one thread:
java -XX:+UseG1GC -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementThreads=1 -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
~1550 ops/ms
I guess we have a winner! I won't hazard an answer to your questions about how this should be resolved - perhaps the adaptive policy could detect this corner case and scale down the number of refinement threads when they start interfering with each other, or something.
/Claes
[1]
package org.sample;

import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import java.util.Random;

@State(Scope.Thread)
public class G1GraphBench {

    private static class Node {
        private Node next;
        private Node[] others = new Node[10];
    }

    Node head = buildGraph();

    private static Node buildGraph() {
        // Create a collection of Node objects
        Node[] array = new Node[10000000];
        for (int i = 0; i < array.length; i++) {
            array[i] = new Node();
        }

        // Each Node refers to 10 other random Nodes
        Random random = new Random(12);
        for (int i = 0; i < array.length; i++) {
            for (int j = 0; j < array[i].others.length; j++) {
                int k = random.nextInt(array.length);
                array[i].others[j] = array[k];
            }
        }

        // The first Node serves as the head of a queue
        return array[0];
    }

    @GenerateMicroBenchmark
    public Node nodeBench() {
        Node node = head;
        head = node.next;
        node.next = null;

        // Add the other Nodes to the head of the queue
        for (Node other : node.others) {
            other.next = head;
            head = other;
        }
        return head;
    }
}
On 2014-06-03 23:43, Peter Harvey wrote:
Thanks for the response. Here are the additional logs.
0.094: [GC pause (young), 0.0347877 secs]
   [Parallel Time: 34.1 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 94.2, Avg: 104.4, Max: 126.4, Diff: 32.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 3.3, Max: 25.0, Diff: 25.0, Sum: 26.6]
      [Update RS (ms): Min: 0.0, Avg: 2.1, Max: 5.3, Diff: 5.3, Sum: 16.7]
         [Processed Buffers: Min: 0, Avg: 2.3, Max: 9, Diff: 9, Sum: 18]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 1.8, Avg: 18.3, Max: 29.9, Diff: 28.2, Sum: 146.4]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.9, Avg: 23.8, Max: 34.1, Diff: 32.2, Sum: 190.4]
      [GC Worker End (ms): Min: 128.2, Avg: 128.3, Max: 128.3, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.0 ms]
   [Other: 0.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.3 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 24.0M(24.0M)->0.0B(11.0M) Survivors: 0.0B->3072.0K Heap: 62.1M(254.0M)->62.2M(254.0M)]
 [Times: user=0.09 sys=0.03, real=0.04 secs]
0.131: [GC pause (young), 0.0295093 secs]
   [Parallel Time: 28.1 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 130.9, Avg: 135.5, Max: 158.7, Diff: 27.8]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.2]
      [Update RS (ms): Min: 0.0, Avg: 11.4, Max: 27.5, Diff: 27.5, Sum: 90.8]
         [Processed Buffers: Min: 0, Avg: 23.8, Max: 42, Diff: 42, Sum: 190]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 0.0, Avg: 11.7, Max: 17.1, Diff: 17.1, Sum: 93.8]
      [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 0.2, Avg: 23.5, Max: 28.1, Diff: 27.8, Sum: 187.7]
      [GC Worker End (ms): Min: 159.0, Avg: 159.0, Max: 159.0, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 1.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 11.0M(11.0M)->0.0B(23.0M) Survivors: 3072.0K->2048.0K Heap: 73.2M(254.0M)->82.7M(508.0M)]
 [Times: user=0.19 sys=0.00, real=0.03 secs]
0.166: [GC pause (young), 0.0385523 secs]
   [Parallel Time: 35.9 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 166.4, Avg: 169.8, Max: 192.4, Diff: 25.9]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.9]
      [Update RS (ms): Min: 0.0, Avg: 10.9, Max: 31.9, Diff: 31.9, Sum: 87.2]
         [Processed Buffers: Min: 0, Avg: 14.6, Max: 26, Diff: 26, Sum: 117]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 3.5, Avg: 21.4, Max: 27.0, Diff: 23.4, Sum: 171.1]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 10.0, Avg: 32.6, Max: 35.9, Diff: 25.9, Sum: 260.7]
      [GC Worker End (ms): Min: 202.3, Avg: 202.4, Max: 202.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.0 ms]
   [Other: 2.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 23.0M(23.0M)->0.0B(46.0M) Survivors: 2048.0K->4096.0K Heap: 105.7M(508.0M)->110.1M(1016.0M)]
 [Times: user=0.19 sys=0.00, real=0.04 secs]
0.222: [GC pause (young), 0.0558720 secs]
   [Parallel Time: 53.0 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 222.0, Avg: 222.2, Max: 222.5, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.5]
      [Update RS (ms): Min: 7.7, Avg: 8.7, Max: 10.9, Diff: 3.2, Sum: 69.4]
         [Processed Buffers: Min: 7, Avg: 8.5, Max: 12, Diff: 5, Sum: 68]
      [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum: 2.3]
      [Object Copy (ms): Min: 41.7, Avg: 43.6, Max: 44.3, Diff: 2.7, Sum: 348.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 52.4, Avg: 52.7, Max: 52.9, Diff: 0.5, Sum: 421.8]
      [GC Worker End (ms): Min: 274.9, Avg: 274.9, Max: 274.9, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.0 ms]
   [Other: 2.8 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 46.0M(46.0M)->0.0B(74.0M) Survivors: 4096.0K->7168.0K Heap: 156.1M(1016.0M)->158.6M(1625.0M)]
 [Times: user=0.48 sys=0.01, real=0.06 secs]
0.328: [GC pause (young), 0.0853794 secs]
   [Parallel Time: 82.8 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 327.9, Avg: 330.8, Max: 351.1, Diff: 23.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.0]
      [Update RS (ms): Min: 0.0, Avg: 5.5, Max: 8.3, Diff: 8.3, Sum: 43.9]
         [Processed Buffers: Min: 0, Avg: 2.3, Max: 3, Diff: 3, Sum: 18]
      [Scan RS (ms): Min: 0.0, Avg: 2.2, Max: 3.3, Diff: 3.3, Sum: 17.4]
      [Object Copy (ms): Min: 59.5, Avg: 71.8, Max: 73.7, Diff: 14.2, Sum: 574.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 59.5, Avg: 79.8, Max: 82.7, Diff: 23.2, Sum: 638.4]
      [GC Worker End (ms): Min: 410.6, Avg: 410.7, Max: 410.7, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 74.0M(74.0M)->0.0B(94.0M) Survivors: 7168.0K->11.0M Heap: 232.6M(1625.0M)->237.6M(2112.0M)]
 [Times: user=0.59 sys=0.00, real=0.09 secs]
0.447: [GC pause (young), 0.1239103 secs]
   [Parallel Time: 121.5 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 447.5, Avg: 447.7, Max: 448.5, Diff: 0.9]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.9]
      [Update RS (ms): Min: 26.5, Avg: 28.2, Max: 28.7, Diff: 2.2, Sum: 225.7]
         [Processed Buffers: Min: 38, Avg: 39.8, Max: 44, Diff: 6, Sum: 318]
      [Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 1.9, Diff: 1.6, Sum: 5.3]
      [Object Copy (ms): Min: 92.1, Avg: 92.2, Max: 92.3, Diff: 0.2, Sum: 737.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 120.6, Avg: 121.4, Max: 121.5, Diff: 0.9, Sum: 970.8]
      [GC Worker End (ms): Min: 569.0, Avg: 569.0, Max: 569.0, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 94.0M(94.0M)->0.0B(111.0M) Survivors: 11.0M->14.0M Heap: 331.6M(2112.0M)->334.6M(2502.0M)]
 [Times: user=0.80 sys=0.05, real=0.12 secs]
0.599: [GC pause (young), 0.1479438 secs]
   [Parallel Time: 145.7 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 599.4, Avg: 599.5, Max: 599.8, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.9]
      [Update RS (ms): Min: 41.8, Avg: 43.0, Max: 44.0, Diff: 2.1, Sum: 343.6]
         [Processed Buffers: Min: 67, Avg: 70.9, Max: 73, Diff: 6, Sum: 567]
      [Scan RS (ms): Min: 0.0, Avg: 0.8, Max: 1.9, Diff: 1.9, Sum: 6.2]
      [Object Copy (ms): Min: 101.3, Avg: 101.6, Max: 101.7, Diff: 0.3, Sum: 812.6]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 145.2, Avg: 145.6, Max: 145.6, Diff: 0.4, Sum: 1164.6]
      [GC Worker End (ms): Min: 745.1, Avg: 745.1, Max: 745.1, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 111.0M(111.0M)->0.0B(124.0M) Survivors: 14.0M->16.0M Heap: 445.6M(2502.0M)->448.6M(2814.0M)]
 [Times: user=1.20 sys=0.05, real=0.15 secs]
0.787: [GC pause (young), 0.1625321 secs]
   [Parallel Time: 160.0 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 786.6, Avg: 786.7, Max: 786.9, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 1.8]
      [Update RS (ms): Min: 46.4, Avg: 47.0, Max: 49.0, Diff: 2.5, Sum: 376.0]
         [Processed Buffers: Min: 75, Avg: 78.0, Max: 79, Diff: 4, Sum: 624]
      [Scan RS (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum: 7.4]
      [Object Copy (ms): Min: 110.6, Avg: 111.7, Max: 112.0, Diff: 1.4, Sum: 893.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [GC Worker Total (ms): Min: 159.6, Avg: 159.9, Max: 160.0, Diff: 0.4, Sum: 1279.0]
      [GC Worker End (ms): Min: 946.5, Avg: 946.5, Max: 946.6, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 124.0M(124.0M)->0.0B(135.0M) Survivors: 16.0M->18.0M Heap: 572.6M(2814.0M)->576.6M(3064.0M)]
 [Times: user=1.37 sys=0.00, real=0.16 secs]
0.981: [GC pause (young), 0.2063055 secs]
   [Parallel Time: 204.1 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 980.8, Avg: 980.9, Max: 981.0, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.3, Max: 0.3, Diff: 0.2, Sum: 2.1]
      [Update RS (ms): Min: 55.9, Avg: 57.8, Max: 58.8, Diff: 2.9, Sum: 462.8]
         [Processed Buffers: Min: 100, Avg: 101.5, Max: 103, Diff: 3, Sum: 812]
      [Scan RS (ms): Min: 0.0, Avg: 1.0, Max: 3.1, Diff: 3.1, Sum: 8.3]
      [Object Copy (ms): Min: 144.7, Avg: 144.8, Max: 144.9, Diff: 0.1, Sum: 1158.3]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 203.8, Avg: 204.0, Max: 204.0, Diff: 0.2, Sum: 1631.9]
      [GC Worker End (ms): Min: 1184.9, Avg: 1184.9, Max: 1184.9, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 135.0M(135.0M)->0.0B(143.0M) Survivors: 18.0M->20.0M Heap: 711.6M(3064.0M)->714.6M(3264.0M)]
 [Times: user=1.40 sys=0.11, real=0.21 secs]
CPU Load Is -1.0
Start
Stop
Sleep
CPU Load Is 0.9166222455142531
Start
Stop
Sleep
CPU Load Is 0.907013989900451
Start
Stop
Sleep
CPU Load Is 0.9085635227776081
Start
Stop
Sleep
CPU Load Is 0.909945506396622
Note that all the logged GC occurs during the construction of my graph of Nodes, which is /before/ my algorithm (modifyGraph) starts. There is no log of GC activity once the algorithm starts, but there is significant (100%) CPU usage.
My questions are:

 * Why is the G1 garbage collector consuming so much CPU time? What is it doing?
 * Why is the G1 garbage collector not logging anything? The only reason I even know it's the garbage collector consuming my CPU time is that (a) I only see this behaviour when the G1 collector is enabled and (b) the load on the CPU correlates with the value of -XX:ParallelGCThreads.
 * Are there particular object-graph structures that the G1 garbage collector will struggle with? Should complex graphs be considered bad coding practice?
 * How can I write my code to avoid this behaviour in the G1 garbage collector? For example, if all my Nodes are in an array, will this fix it? (See the sketch after this list.)
 * Should this be considered a bug in the G1 garbage collector? This is far beyond 'a small increase in CPU usage'.
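To make that "all my Nodes in an array" question concrete, here is a purely illustrative, hypothetical sketch (not benchmarked): the next/others references become int indices into one flat array, so the hot loop updates ints rather than object pointers.

    // Hypothetical index-based variant of the Node graph, for illustration only.
    public class IndexedGraph {
        private final int[] next;      // next[i] = index of the node after i, or -1 for "none"
        private final int[][] others;  // others[i] = indices of the 10 random neighbours of i

        public IndexedGraph(int size, java.util.Random random) {
            next = new int[size];
            others = new int[size][10];
            java.util.Arrays.fill(next, -1);
            for (int i = 0; i < size; i++) {
                for (int j = 0; j < 10; j++) {
                    others[i][j] = random.nextInt(size);
                }
            }
        }

        // Same queue manipulation as modifyGraph(), but on indices instead of references.
        public int modify(int head) {
            for (int i = 0; i < 1000000; i++) {
                int node = head;
                head = next[node];
                next[node] = -1;
                for (int other : others[node]) {
                    next[other] = head;
                    head = other;
                }
            }
            return head;
        }
    }

Whether such a representation would actually sidestep the remembered-set work is exactly the open question.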
Just to demonstrate the issue further, I timed my calls to modifyGraph() and trialled different GC parameters:

 * -XX:+UseG1GC -XX:ParallelGCThreads=1 took 82.393 seconds and CPU load was 0.1247
 * -XX:+UseG1GC -XX:ParallelGCThreads=4 took 19.829 seconds and CPU load was 0.5960
 * -XX:+UseG1GC -XX:ParallelGCThreads=8 took 14.815 seconds and CPU load was 0.9184
 * -XX:+UseConcMarkSweepGC took 0.322 seconds and CPU load was 0.1119 regardless of the setting of -XX:ParallelGCThreads

So using the CMS GC made my application about 46x faster (14.815 seconds versus 0.322 seconds) and placed roughly 1/8th of the load (0.9184 versus 0.1119) on the CPU.
If my code represents some kind of hypothetical worst case for the G1 garbage collector, I think it should be documented and/or fixed somehow.
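(A minimal sketch of how such a timing can be taken, reusing the os bean from the full listing further below; this is illustrative, not the exact code used:)

    // Illustrative only: time one call to modifyGraph() and read the process CPU load.
    long start = System.nanoTime();
    head = modifyGraph(head);
    double seconds = (System.nanoTime() - start) / 1e9;
    System.out.println("modifyGraph took " + seconds + " seconds, CPU load " + os.getProcessCpuLoad());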
Regards,
Peter.
On Tue, Jun 3, 2014 at 3:16 PM, Tao Mao <yiyeguhu at gmail.com> wrote:
And, use -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to get more log. Thanks. -Tao
On Tue, Jun 3, 2014 at 2:13 PM, Tao Mao <yiyeguhu at gmail.com> wrote:
Hi Peter,
What was your actual question?
Try -XX:ParallelGCThreads=<value> if you want less CPU usage from GC.
Thanks.
Tao
On Tue, Jun 3, 2014 at 11:49 AM, Peter Harvey <harvey at actenum.com> wrote:
Small correction. The last example of output was with "-XX:+UseConcMarkSweepGC -verbose:gc".
On Tue, Jun 3, 2014 at 12:41 PM, Peter Harvey <harvey at actenum.com> wrote:
I have an algorithm (at bottom of email) which builds a graph of 'Node' objects with random connections between them. It then repeatedly processes a queue of those Nodes, adding new Nodes to the queue as it goes. This is a single-threaded algorithm that will never terminate. Our actual production code is much more complex, but I've trimmed it down as much as possible.

On Windows 7 with JRE 7u60, enabling the G1 garbage collector will cause the JRE to consume all 8 cores of my CPU. No other garbage collector does this. You can see the differences in CPU load in the example output below. It's also worth noting that "-verbose:gc" with the G1 garbage collector prints nothing after my algorithm starts. Presumably the G1 garbage collector is doing something (concurrent mark?), but it's not printing anything about it.

When run with VM args "-XX:+UseG1GC -verbose:gc" I get output like this (note the huge CPU load value, which should not be this high for a single-threaded algorithm on an 8-core CPU):
[GC pause (young) 62M->62M(254M), 0.0394214 secs]
[GC pause (young) 73M->83M(508M), 0.0302781 secs]
[GC pause (young) 106M->111M(1016M), 0.0442273 secs]
[GC pause (young) 157M->161M(1625M), 0.0660902 secs]
[GC pause (young) 235M->240M(2112M), 0.0907231 secs]
[GC pause (young) 334M->337M(2502M), 0.1356917 secs]
[GC pause (young) 448M->450M(2814M), 0.1219090 secs]
[GC pause (young) 574M->577M(3064M), 0.1778062 secs]
[GC pause (young) 712M->715M(3264M), 0.1878443 secs]
CPU Load Is -1.0
Start
Stop
Sleep
CPU Load Is 0.9196154547182949
Start
Stop
Sleep
CPU Load Is 0.9150735995043818
...
When run with VM args "-XX:+UseParallelGC -verbose:gc"
I get output like this:
[GC 65536K->64198K(249344K), 0.0628289 secs]
[GC 129734K->127974K(314880K), 0.1583369 secs]
[Full GC 127974K->127630K(451072K), 0.9675224 secs]
[GC 258702K->259102K(451072K), 0.3543645 secs]
[Full GC 259102K->258701K(732672K), 1.8085702 secs]
[GC 389773K->390181K(790528K), 0.3332060 secs]
[GC 579109K->579717K(803328K), 0.5126388 secs]
[Full GC 579717K->578698K(1300480K), 4.0647303 secs]
[GC 780426K->780842K(1567232K), 0.4364933 secs]
CPU Load Is -1.0
Start
Stop
Sleep
CPU Load Is 0.03137771539054431
Start
Stop
Sleep
CPU Load Is 0.032351299224373145
...
When run with VM args "-verbose:gc" I get output like
this:
[GC 69312K->67824K(251136K), 0.1533803 secs]
[GC 137136K->135015K(251136K), 0.0970460 secs]
[GC 137245K(251136K), 0.0095245 secs]
[GC 204327K->204326K(274368K), 0.1056259 secs]
[GC 273638K->273636K(343680K), 0.1081515 secs]
[GC 342948K->342946K(412992K), 0.1181966 secs]
[GC 412258K->412257K(482304K), 0.1126966 secs]
[GC 481569K->481568K(551808K), 0.1156015 secs]
[GC 550880K->550878K(620928K), 0.1184089 secs]
[GC 620190K->620189K(690048K), 0.1209312 secs]
[GC 689501K->689499K(759552K), 0.1199338 secs]
[GC 758811K->758809K(828864K), 0.1162532 secs]
CPU Load Is -1.0
Start
Stop
Sleep
CPU Load Is 0.10791719146608299
Start
[GC 821213K(828864K), 0.1966807 secs]
Stop
Sleep
CPU Load Is 0.1540065314146181
Start
Stop
Sleep
[GC 821213K(1328240K), 0.1962688 secs]
CPU Load Is 0.08427292195744103
...
Why is the G1 garbage collector consuming so much CPU time? Is it stuck in the mark phase as I am modifying the graph structure?

I'm not a subscriber to the list, so please CC me in any response.
Thanks,
Peter.
--
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;
import java.util.Random;

@SuppressWarnings("restriction")
public class Node {

    private static OperatingSystemMXBean os =
            (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

    private Node next;
    private Node[] others = new Node[10];

    public static void main(String[] args) throws InterruptedException {
        // Build a graph of Nodes
        Node head = buildGraph();

        while (true) {
            // Print CPU load for this process
            System.out.println("CPU Load Is " + os.getProcessCpuLoad());
            System.out.println();

            // Modify the graph
            System.out.println("Start");
            head = modifyGraph(head);
            System.out.println("Stop");

            // Sleep, as otherwise we tend to DoS the host computer...
            System.out.println("Sleep");
            Thread.sleep(1000);
        }
    }

    private static Node buildGraph() {
        // Create a collection of Node objects
        Node[] array = new Node[10000000];
        for (int i = 0; i < array.length; i++) {
            array[i] = new Node();
        }

        // Each Node refers to 10 other random Nodes
        Random random = new Random(12);
        for (int i = 0; i < array.length; i++) {
            for (int j = 0; j < array[i].others.length; j++) {
                int k = random.nextInt(array.length);
                array[i].others[j] = array[k];
            }
        }

        // The first Node serves as the head of a queue
        return array[0];
    }

    private static Node modifyGraph(Node head) {
        // Perform a million iterations
        for (int i = 0; i < 1000000; i++) {
            // Pop a Node off the head of the queue
            Node node = head;
            head = node.next;
            node.next = null;

            // Add the other Nodes to the head of the queue
            for (Node other : node.others) {
                other.next = head;
                head = other;
            }
        }
        return head;
    }
}
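To reproduce, compiling and running with the different collectors should be as simple as the following (assuming an Oracle JDK, since the code uses com.sun.management.OperatingSystemMXBean):

    javac Node.java
    java -XX:+UseG1GC -verbose:gc Node
    java -XX:+UseParallelGC -verbose:gc Node
    java -XX:+UseConcMarkSweepGC -verbose:gc Node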
--
*Actenum Corporation*
Peter Harvey | Cell: 780.729.8192 | harvey at actenum.com | www.actenum.com