G1 GC consuming all CPU time

claes.redestad claes.redestad at oracle.com
Tue Jun 3 23:28:49 UTC 2014


At least CPU load is down, suggesting it's no longer a concurrency issue. One thing that comes to mind is that G1 emits costly write and read barriers that heavily penalize interpreted code, while JMH generally avoids that benchmarking trap. Try extracting the loop body of your test into a method to help the JIT along and see if that levels the playing field?
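
For illustration only (a rough sketch of that extraction, reusing the loop body from the original test; processOne is just a made-up helper name, not something from the original mail):

private static Node modifyGraph(Node head) {
    for (int i = 0; i < 1000000; i++) {
        head = processOne(head);   // small, hot method is an easy JIT target
    }
    return head;
}

// Same work as the original loop body: pop one Node, push its ten neighbours.
private static Node processOne(Node head) {
    Node node = head;
    head = node.next;
    node.next = null;
    for (Node other : node.others) {
        other.next = head;
        head = other;
    }
    return head;
}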

/Claes.


-------- Original message --------
From: Peter Harvey <harvey at actenum.com>
Date: 04-06-2014 01:12 (GMT+01:00)
To: Claes Redestad <claes.redestad at oracle.com>
Cc: hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject: Re: G1 GC consuming all CPU time

Here's my list of benchmarks from the previous email:
-XX:+UseG1GC -XX:ParallelGCThreads=1 took 82.393 seconds and CPU load was 0.1247
-XX:+UseG1GC -XX:ParallelGCThreads=4 took 19.829 seconds and CPU load was 0.5960
-XX:+UseG1GC -XX:ParallelGCThreads=8 took 14.815 seconds and CPU load was 0.9184
-XX:+UseConcMarkSweepGC took 0.322 seconds and CPU load was 0.1119 regardless of the setting of -XX:ParallelGCThreads
And here's using those new parameters with my original code:
-XX:+UseG1GC -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementThreads=1 took 53.077 seconds and CPU load was 0.1237
I may be completely misunderstanding the implications of your JMH tests, but those parameters don't seem to improve overall application performance when using my original code.

It looks to me like the use of the G1 garbage collector is somehow also slowing down the application itself (but not necessarily your JMH test?). Not only are the GC threads tripping over each other, but they also seem to be tripping up the main application thread.

Regards,
Peter.




On Tue, Jun 3, 2014 at 4:57 PM, Claes Redestad <claes.redestad at oracle.com> wrote:
Hi,

 I'm guessing it's due to the concurrent GC threads tripping over themselves: the microbenchmark creates one big linked structure that will occupy most of the old gen, and then you're doing intense pointer updates, which will trigger scans and updates of remembered sets etc. I actually don't know half the details and am mostly just guessing. :-)

 I converted your micro to a JMH micro to ease experimenting[1] (hope you don't mind), then verified that the regression reproduces:

 Parallel:
 java -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
 ~1625 ops/ms

 G1:
 java -XX:+UseG1GC -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
 ~12 ops/ms

 Testing my hunch, let's try forcing the concurrent refinement to use only one thread:

 java -XX:+UseG1GC -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementThreads=1 -jar target/microbenchmarks.jar -wi 3 -i 10 -f 1 .*G1GraphBench.*
~1550 ops/ms

 I guess we have a winner! I won't hazard an answer to your questions about how this should be resolved - perhaps the adaptive policy could detect this corner case and scale down the number of refinement threads when they start interfering with each other, or something.

 /Claes

[1]

package org.sample;

import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import java.util.Random;

@State(Scope.Thread)
public class G1GraphBench {

    private static class Node {

        private Node next;
        private Node[] others = new Node[10];
    }

    Node head = buildGraph();


    private static Node buildGraph() {

        // Create a collection of Node objects
        Node[] array = new Node[10000000];
        for (int i = 0; i < array.length; i++) {
            array[i] = new Node();
        }

        // Each Node refers to 10 other random Nodes
        Random random = new Random(12);
        for (int i = 0; i < array.length; i++) {
            for (int j = 0; j < array[i].others.length; j++) {
                int k = random.nextInt(array.length);
                array[i].others[j] = array[k];
            }
        }

        // The first Node serves as the head of a queue
        return array[0];
    }

    @GenerateMicroBenchmark
    public Node nodeBench() {

        Node node = head;
        head = node.next;
        node.next = null;

        // Add the other Nodes to the head of the queue
        for (Node other : node.others) {
            other.next = head;
            head = other;
        }
        return head;
    }

}

On 2014-06-03 23:43, Peter Harvey wrote:
Thanks for the response. Here are the additional logs.

    0.094: [GC pause (young), 0.0347877 secs]
       [Parallel Time: 34.1 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 94.2, Avg: 104.4, Max: 126.4,
    Diff: 32.2]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 3.3, Max: 25.0,
    Diff: 25.0, Sum: 26.6]
          [Update RS (ms): Min: 0.0, Avg: 2.1, Max: 5.3, Diff: 5.3,
    Sum: 16.7]
             [Processed Buffers: Min: 0, Avg: 2.3, Max: 9, Diff: 9,
    Sum: 18]
          [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
    0.0]
          [Object Copy (ms): Min: 1.8, Avg: 18.3, Max: 29.9, Diff:
    28.2, Sum: 146.4]
          [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1,
    Sum: 0.6]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.1]
          [GC Worker Total (ms): Min: 1.9, Avg: 23.8, Max: 34.1, Diff:
    32.2, Sum: 190.4]
          [GC Worker End (ms): Min: 128.2, Avg: 128.3, Max: 128.3,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.0 ms]
       [Other: 0.6 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.3 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.0 ms]
       [Eden: 24.0M(24.0M)->0.0B(11.0M) Survivors: 0.0B->3072.0K Heap:
    62.1M(254.0M)->62.2M(254.0M)]
     [Times: user=0.09 sys=0.03, real=0.04 secs]
    0.131: [GC pause (young), 0.0295093 secs]
       [Parallel Time: 28.1 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 130.9, Avg: 135.5, Max: 158.7,
    Diff: 27.8]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff:
    0.4, Sum: 1.2]
          [Update RS (ms): Min: 0.0, Avg: 11.4, Max: 27.5, Diff: 27.5,
    Sum: 90.8]
             [Processed Buffers: Min: 0, Avg: 23.8, Max: 42, Diff: 42,
    Sum: 190]
          [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
    0.0]
          [Object Copy (ms): Min: 0.0, Avg: 11.7, Max: 17.1, Diff:
    17.1, Sum: 93.8]
          [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3,
    Sum: 1.7]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.1]
          [GC Worker Total (ms): Min: 0.2, Avg: 23.5, Max: 28.1, Diff:
    27.8, Sum: 187.7]
          [GC Worker End (ms): Min: 159.0, Avg: 159.0, Max: 159.0,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 1.3 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.0 ms]
       [Eden: 11.0M(11.0M)->0.0B(23.0M) Survivors: 3072.0K->2048.0K
    Heap: 73.2M(254.0M)->82.7M(508.0M)]
     [Times: user=0.19 sys=0.00, real=0.03 secs]
    0.166: [GC pause (young), 0.0385523 secs]
       [Parallel Time: 35.9 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 166.4, Avg: 169.8, Max: 192.4,
    Diff: 25.9]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff:
    0.4, Sum: 1.9]
          [Update RS (ms): Min: 0.0, Avg: 10.9, Max: 31.9, Diff: 31.9,
    Sum: 87.2]
             [Processed Buffers: Min: 0, Avg: 14.6, Max: 26, Diff: 26,
    Sum: 117]
          [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
    0.1]
          [Object Copy (ms): Min: 3.5, Avg: 21.4, Max: 27.0, Diff:
    23.4, Sum: 171.1]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
    Sum: 0.4]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.1]
          [GC Worker Total (ms): Min: 10.0, Avg: 32.6, Max: 35.9,
    Diff: 25.9, Sum: 260.7]
          [GC Worker End (ms): Min: 202.3, Avg: 202.4, Max: 202.4,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.0 ms]
       [Other: 2.6 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.0 ms]
       [Eden: 23.0M(23.0M)->0.0B(46.0M) Survivors: 2048.0K->4096.0K
    Heap: 105.7M(508.0M)->110.1M(1016.0M)]
     [Times: user=0.19 sys=0.00, real=0.04 secs]
    0.222: [GC pause (young), 0.0558720 secs]
       [Parallel Time: 53.0 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 222.0, Avg: 222.2, Max: 222.5,
    Diff: 0.5]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff:
    0.4, Sum: 1.5]
          [Update RS (ms): Min: 7.7, Avg: 8.7, Max: 10.9, Diff: 3.2,
    Sum: 69.4]
             [Processed Buffers: Min: 7, Avg: 8.5, Max: 12, Diff: 5,
    Sum: 68]
          [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum:
    2.3]
          [Object Copy (ms): Min: 41.7, Avg: 43.6, Max: 44.3, Diff:
    2.7, Sum: 348.5]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
    Sum: 0.0]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.2]
          [GC Worker Total (ms): Min: 52.4, Avg: 52.7, Max: 52.9,
    Diff: 0.5, Sum: 421.8]
          [GC Worker End (ms): Min: 274.9, Avg: 274.9, Max: 274.9,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.0 ms]
       [Other: 2.8 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.0 ms]
       [Eden: 46.0M(46.0M)->0.0B(74.0M) Survivors: 4096.0K->7168.0K
    Heap: 156.1M(1016.0M)->158.6M(1625.0M)]
     [Times: user=0.48 sys=0.01, real=0.06 secs]
    0.328: [GC pause (young), 0.0853794 secs]
       [Parallel Time: 82.8 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 327.9, Avg: 330.8, Max: 351.1,
    Diff: 23.2]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff:
    0.3, Sum: 2.0]
          [Update RS (ms): Min: 0.0, Avg: 5.5, Max: 8.3, Diff: 8.3,
    Sum: 43.9]
             [Processed Buffers: Min: 0, Avg: 2.3, Max: 3, Diff: 3,
    Sum: 18]
          [Scan RS (ms): Min: 0.0, Avg: 2.2, Max: 3.3, Diff: 3.3, Sum:
    17.4]
          [Object Copy (ms): Min: 59.5, Avg: 71.8, Max: 73.7, Diff:
    14.2, Sum: 574.7]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
    Sum: 0.2]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.2]
          [GC Worker Total (ms): Min: 59.5, Avg: 79.8, Max: 82.7,
    Diff: 23.2, Sum: 638.4]
          [GC Worker End (ms): Min: 410.6, Avg: 410.7, Max: 410.7,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 2.6 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.1 ms]
       [Eden: 74.0M(74.0M)->0.0B(94.0M) Survivors: 7168.0K->11.0M
    Heap: 232.6M(1625.0M)->237.6M(2112.0M)]
     [Times: user=0.59 sys=0.00, real=0.09 secs]
    0.447: [GC pause (young), 0.1239103 secs]
       [Parallel Time: 121.5 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 447.5, Avg: 447.7, Max: 448.5,
    Diff: 0.9]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff:
    0.3, Sum: 1.9]
          [Update RS (ms): Min: 26.5, Avg: 28.2, Max: 28.7, Diff: 2.2,
    Sum: 225.7]
             [Processed Buffers: Min: 38, Avg: 39.8, Max: 44, Diff: 6,
    Sum: 318]
          [Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 1.9, Diff: 1.6, Sum:
    5.3]
          [Object Copy (ms): Min: 92.1, Avg: 92.2, Max: 92.3, Diff:
    0.2, Sum: 737.5]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
    Sum: 0.3]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.2]
          [GC Worker Total (ms): Min: 120.6, Avg: 121.4, Max: 121.5,
    Diff: 0.9, Sum: 970.8]
          [GC Worker End (ms): Min: 569.0, Avg: 569.0, Max: 569.0,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 2.3 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.1 ms]
       [Eden: 94.0M(94.0M)->0.0B(111.0M) Survivors: 11.0M->14.0M Heap:
    331.6M(2112.0M)->334.6M(2502.0M)]
     [Times: user=0.80 sys=0.05, real=0.12 secs]
    0.599: [GC pause (young), 0.1479438 secs]
       [Parallel Time: 145.7 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 599.4, Avg: 599.5, Max: 599.8,
    Diff: 0.4]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff:
    0.3, Sum: 1.9]
          [Update RS (ms): Min: 41.8, Avg: 43.0, Max: 44.0, Diff: 2.1,
    Sum: 343.6]
             [Processed Buffers: Min: 67, Avg: 70.9, Max: 73, Diff: 6,
    Sum: 567]
          [Scan RS (ms): Min: 0.0, Avg: 0.8, Max: 1.9, Diff: 1.9, Sum:
    6.2]
          [Object Copy (ms): Min: 101.3, Avg: 101.6, Max: 101.7, Diff:
    0.3, Sum: 812.6]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
    Sum: 0.1]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.2]
          [GC Worker Total (ms): Min: 145.2, Avg: 145.6, Max: 145.6,
    Diff: 0.4, Sum: 1164.6]
          [GC Worker End (ms): Min: 745.1, Avg: 745.1, Max: 745.1,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 2.2 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.1 ms]
       [Eden: 111.0M(111.0M)->0.0B(124.0M) Survivors: 14.0M->16.0M
    Heap: 445.6M(2502.0M)->448.6M(2814.0M)]
     [Times: user=1.20 sys=0.05, real=0.15 secs]
    0.787: [GC pause (young), 0.1625321 secs]
       [Parallel Time: 160.0 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 786.6, Avg: 786.7, Max: 786.9,
    Diff: 0.4]
          [Ext Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff:
    0.3, Sum: 1.8]
          [Update RS (ms): Min: 46.4, Avg: 47.0, Max: 49.0, Diff: 2.5,
    Sum: 376.0]
             [Processed Buffers: Min: 75, Avg: 78.0, Max: 79, Diff: 4,
    Sum: 624]
          [Scan RS (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum:
    7.4]
          [Object Copy (ms): Min: 110.6, Avg: 111.7, Max: 112.0, Diff:
    1.4, Sum: 893.5]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
    Sum: 0.1]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
    0.1, Sum: 0.3]
          [GC Worker Total (ms): Min: 159.6, Avg: 159.9, Max: 160.0,
    Diff: 0.4, Sum: 1279.0]
          [GC Worker End (ms): Min: 946.5, Avg: 946.5, Max: 946.6,
    Diff: 0.1]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 2.4 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.2 ms]
       [Eden: 124.0M(124.0M)->0.0B(135.0M) Survivors: 16.0M->18.0M
    Heap: 572.6M(2814.0M)->576.6M(3064.0M)]
     [Times: user=1.37 sys=0.00, real=0.16 secs]
    0.981: [GC pause (young), 0.2063055 secs]
       [Parallel Time: 204.1 ms, GC Workers: 8]
          [GC Worker Start (ms): Min: 980.8, Avg: 980.9, Max: 981.0,
    Diff: 0.2]
          [Ext Root Scanning (ms): Min: 0.1, Avg: 0.3, Max: 0.3, Diff:
    0.2, Sum: 2.1]
          [Update RS (ms): Min: 55.9, Avg: 57.8, Max: 58.8, Diff: 2.9,
    Sum: 462.8]
             [Processed Buffers: Min: 100, Avg: 101.5, Max: 103, Diff:
    3, Sum: 812]
          [Scan RS (ms): Min: 0.0, Avg: 1.0, Max: 3.1, Diff: 3.1, Sum:
    8.3]
          [Object Copy (ms): Min: 144.7, Avg: 144.8, Max: 144.9, Diff:
    0.1, Sum: 1158.3]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
    Sum: 0.3]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
    0.0, Sum: 0.2]
          [GC Worker Total (ms): Min: 203.8, Avg: 204.0, Max: 204.0,
    Diff: 0.2, Sum: 1631.9]
          [GC Worker End (ms): Min: 1184.9, Avg: 1184.9, Max: 1184.9,
    Diff: 0.0]
       [Code Root Fixup: 0.0 ms]
       [Clear CT: 0.1 ms]
       [Other: 2.1 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 0.1 ms]
          [Ref Enq: 0.0 ms]
          [Free CSet: 0.1 ms]
       [Eden: 135.0M(135.0M)->0.0B(143.0M) Survivors: 18.0M->20.0M
    Heap: 711.6M(3064.0M)->714.6M(3264.0M)]
     [Times: user=1.40 sys=0.11, real=0.21 secs]
    CPU Load Is -1.0

    Start
    Stop
    Sleep
    CPU Load Is 0.9166222455142531

    Start
    Stop
    Sleep
    CPU Load Is 0.907013989900451

    Start
    Stop
    Sleep
    CPU Load Is 0.9085635227776081

    Start
    Stop
    Sleep
    CPU Load Is 0.909945506396622



Note that all the logged GC occurs during the construction of my graph of Nodes, which is /before/ my algorithm (modifyGraph) starts. There is no log of GC activity once the algorithm starts, but there is significant (100%) CPU usage.

My questions are:

  * Why is the G1 garbage collector consuming so much CPU time? What
    is it doing?
  * Why is the G1 garbage collector not logging anything? The only
    reason I even know it's the garbage collector consuming my CPU
    time is that (a) I only see this behaviour when the G1 collector
    is enabled and (b) the load on the CPU correlates with the value
    of -XX:ParallelGCThreads.
  * Are there particular object-graph structures that the G1 garbage
    collector will struggle with? Should complex graphs be considered
    bad coding practice?
  * How can I write my code to avoid this behaviour in the G1 garbage
    collector? For example, if all my Nodes are in an array, will this
    fix it? (One possible index-based variant is sketched right after
    this list.)
  * Should this be considered a bug in the G1 garbage collector? This
    is far beyond 'a small increase in CPU usage'.
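
As an illustration of the array question above (not something from the
original mail): one way to avoid reference writes in the hot loop is to
keep the links as int indices into parallel arrays, so the queue updates
only store primitives and never go through the collector's reference
write barrier, which might sidestep the remembered-set traffic discussed
in this thread. A minimal, untested sketch mirroring buildGraph() and
modifyGraph():

import java.util.Random;

public class IndexGraph {

    private static final int NONE = -1;

    // next[i] is the index of the node that follows node i in the queue
    private static int[] next;
    // others[i] holds the indices of the 10 random neighbours of node i
    private static int[][] others;

    private static int buildGraph(int size) {
        next = new int[size];
        others = new int[size][10];
        Random random = new Random(12);
        for (int i = 0; i < size; i++) {
            next[i] = NONE;
            for (int j = 0; j < others[i].length; j++) {
                others[i][j] = random.nextInt(size);
            }
        }
        // Node 0 serves as the head of the queue
        return 0;
    }

    private static int modifyGraph(int head) {
        for (int i = 0; i < 1000000; i++) {
            // Pop a node off the head of the queue
            int node = head;
            head = next[node];
            next[node] = NONE;

            // Push its neighbours; these are plain int stores, no GC barriers
            for (int other : others[node]) {
                next[other] = head;
                head = other;
            }
        }
        return head;
    }

    public static void main(String[] args) {
        int head = buildGraph(10000000);
        head = modifyGraph(head);
        System.out.println("Head is now node " + head);
    }
}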

Just to demonstrate the issue further, I timed my calls to modifyGraph() and trialled different GC parameters:

  * -XX:+UseG1GC -XX:ParallelGCThreads=1 took 82.393 seconds and CPU
    load was 0.1247
  * -XX:+UseG1GC -XX:ParallelGCThreads=4 took 19.829 seconds and CPU
    load was 0.5960
  * -XX:+UseG1GC -XX:ParallelGCThreads=8 took 14.815 seconds and CPU
    load was 0.9184
  * -XX:+UseConcMarkSweepGC took 0.322 seconds and CPU load was 0.1119
    regardless of the setting of -XX:ParallelGCThreads

So using the CMS GC made my application roughly 46x faster (0.322 seconds versus 14.815 seconds) and placed about 1/8th of the load on the CPU (0.1119 versus 0.9184).

If my code represents some kind of hypothetical worst case for the G1 garbage collector, I think it should be documented and/or fixed somehow.

Regards,
Peter.



On Tue, Jun 3, 2014 at 3:16 PM, Tao Mao <yiyeguhu at gmail.com> wrote:

    And, use -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to get a more
    detailed log. Thanks. -Tao
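
    For example (an illustrative command line, not from the original mail,
    using the Node test class included further down in this thread):

        java -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps Node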


    On Tue, Jun 3, 2014 at 2:13 PM, Tao Mao <yiyeguhu at gmail.com> wrote:

        Hi Peter,

        What was your actual question?
        Try -XX:ParallelGCThreads=<value> if you want less CPU usage
        from GC.

        Thanks.
        Tao


        On Tue, Jun 3, 2014 at 11:49 AM, Peter Harvey <harvey at actenum.com> wrote:

            Small correction. The last example of output was with
            "-XX:+UseConcMarkSweepGC -verbose:gc".


            On Tue, Jun 3, 2014 at 12:41 PM, Peter Harvey <harvey at actenum.com> wrote:

                I have an algorithm (at bottom of email) which builds
                a graph of 'Node' objects with random connections
                between them. It then repeatedly processes a queue of
                those Nodes, adding new Nodes to the queue as it goes.
                This is a single-threaded algorithm that will never
                terminate. Our actual production code is much more
                complex, but I've trimmed it down as much as possible.

                On Windows 7 with JRE 7u60, enabling the G1 garbage
                collector will cause the JRE to consume all 8 cores of
                my CPU. No other garbage collector does this. You can
                see the differences in CPU load in the example output
                below. It's also worth noting that "-verbose:gc" with
                the G1 garbage collector prints nothing after my
                algorithm starts. Presumably the G1 garbage collector
                is doing something (concurrent mark?), but it's not
                printing anything about it.

                When run with VM args "-XX:+UseG1GC -verbose:gc" I get
                output like this (note the huge CPU load value which
                should not be this high for a single-threaded
                algorithm on an 8 core CPU):

                    [GC pause (young) 62M->62M(254M), 0.0394214 secs]
                    [GC pause (young) 73M->83M(508M), 0.0302781 secs]
                    [GC pause (young) 106M->111M(1016M), 0.0442273 secs]
                    [GC pause (young) 157M->161M(1625M), 0.0660902 secs]
                    [GC pause (young) 235M->240M(2112M), 0.0907231 secs]
                    [GC pause (young) 334M->337M(2502M), 0.1356917 secs]
                    [GC pause (young) 448M->450M(2814M), 0.1219090 secs]
                    [GC pause (young) 574M->577M(3064M), 0.1778062 secs]
                    [GC pause (young) 712M->715M(3264M), 0.1878443 secs]
                    CPU Load Is -1.0

                    Start
                    Stop
                    Sleep
                    CPU Load Is 0.9196154547182949

                    Start
                    Stop
                    Sleep
                    CPU Load Is 0.9150735995043818

                    ...



                When run with VM args "-XX:+UseParallelGC -verbose:gc"
                I get output like this:

                    [GC 65536K->64198K(249344K), 0.0628289 secs]
                    [GC 129734K->127974K(314880K), 0.1583369 secs]
                    [Full GC 127974K->127630K(451072K), 0.9675224 secs]
                    [GC 258702K->259102K(451072K), 0.3543645 secs]
                    [Full GC 259102K->258701K(732672K), 1.8085702 secs]
                    [GC 389773K->390181K(790528K), 0.3332060 secs]
                    [GC 579109K->579717K(803328K), 0.5126388 secs]
                    [Full GC 579717K->578698K(1300480K), 4.0647303 secs]
                    [GC 780426K->780842K(1567232K), 0.4364933 secs]
                    CPU Load Is -1.0

                    Start
                    Stop
                    Sleep
                    CPU Load Is 0.03137771539054431

                    Start
                    Stop
                    Sleep
                    CPU Load Is 0.032351299224373145

                    ...



                When run with VM args "-verbose:gc" I get output like
                this:

                    [GC 69312K->67824K(251136K), 0.1533803 secs]
                    [GC 137136K->135015K(251136K), 0.0970460 secs]
                    [GC 137245K(251136K), 0.0095245 secs]
                    [GC 204327K->204326K(274368K), 0.1056259 secs]
                    [GC 273638K->273636K(343680K), 0.1081515 secs]
                    [GC 342948K->342946K(412992K), 0.1181966 secs]
                    [GC 412258K->412257K(482304K), 0.1126966 secs]
                    [GC 481569K->481568K(551808K), 0.1156015 secs]
                    [GC 550880K->550878K(620928K), 0.1184089 secs]
                    [GC 620190K->620189K(690048K), 0.1209312 secs]
                    [GC 689501K->689499K(759552K), 0.1199338 secs]
                    [GC 758811K->758809K(828864K), 0.1162532 secs]
                    CPU Load Is -1.0

                    Start
                    Stop
                    Sleep
                    CPU Load Is 0.10791719146608299

                    Start
                    [GC 821213K(828864K), 0.1966807 secs]
                    Stop
                    Sleep
                    CPU Load Is 0.1540065314146181

                    Start
                    Stop
                    Sleep
                    [GC 821213K(1328240K), 0.1962688 secs]
                    CPU Load Is 0.08427292195744103

                    ...



                Why is the G1 garbage collector consuming so much CPU
                time? Is it stuck in the mark phase as I am modifying
                the graph structure?

                I'm not a subscriber to the list, so please CC me in
                any response.

                Thanks,
                Peter.

                --

                import java.lang.management.ManagementFactory;
                import com.sun.management.OperatingSystemMXBean;
                import java.util.Random;

                @SuppressWarnings("restriction")
                public class Node {

                    private static OperatingSystemMXBean os = (OperatingSystemMXBean)
                            ManagementFactory.getOperatingSystemMXBean();

                    private Node next;

                    private Node[] others = new Node[10];

                    public static void main(String[] args) throws InterruptedException {

                        // Build a graph of Nodes
                        Node head = buildGraph();

                        while (true) {
                            // Print CPU load for this process
                            System.out.println("CPU Load Is " + os.getProcessCpuLoad());
                            System.out.println();

                            // Modify the graph
                            System.out.println("Start");
                            head = modifyGraph(head);
                            System.out.println("Stop");

                            // Sleep, as otherwise we tend to DoS the host computer...
                            System.out.println("Sleep");
                            Thread.sleep(1000);
                        }
                    }

                    private static Node buildGraph() {

                        // Create a collection of Node objects
                        Node[] array = new Node[10000000];
                        for (int i = 0; i < array.length; i++) {
                            array[i] = new Node();
                        }

                        // Each Node refers to 10 other random Nodes
                        Random random = new Random(12);
                        for (int i = 0; i < array.length; i++) {
                            for (int j = 0; j < array[i].others.length; j++) {
                                int k = random.nextInt(array.length);
                                array[i].others[j] = array[k];
                            }
                        }

                        // The first Node serves as the head of a queue
                        return array[0];
                    }

                    private static Node modifyGraph(Node head) {

                        // Perform a million iterations
                        for (int i = 0; i < 1000000; i++) {

                            // Pop a Node off the head of the queue
                            Node node = head;
                            head = node.next;
                            node.next = null;

                            // Add the other Nodes to the head of the queue
                            for (Node other : node.others) {
                                other.next = head;
                                head = other;
                            }
                        }
                        return head;
                    }
                }

                --
                *Actenum Corporation*
                Peter Harvey  |  Cell: 780.729.8192  |  harvey at actenum.com  |  www.actenum.com




            --
            *Actenum Corporation*
            Peter Harvey  |  Cell: 780.729.8192  |  harvey at actenum.com  |  www.actenum.com






-- 
*Actenum Corporation*
Peter Harvey  |  Cell: 780.729.8192  |  harvey at actenum.com  |  www.actenum.com




-- 
Actenum Corporation
Peter Harvey  |  Cell: 780.729.8192  |  harvey at actenum.com  |  www.actenum.com