G1 GC - pauses much larger than target
Kirti Teja Rao
kirtiteja at gmail.com
Mon Feb 24 17:29:33 PST 2014
Hi,
I tried with swappiness set to 0 and turned off all logging in the other
application running on the same machine, to cut down on the IO on the
machine. The results are much better: all the large outliers of over 100 ms
and up to 500-600 ms are gone now. I see pauses around 50-60 ms for a pause
target of 30 ms, which is OK to work with. Attached is a scatter chart (from
Excel) comparing one of the earlier runs with this run. It is also
interesting that this run has fewer GCs, but slightly more pause time per GC.
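In case anyone wants to reproduce the comparison: the pause samples can be
pulled straight out of the GC logs with a small scanner along these lines.
This is only a rough sketch, keyed on the "Total time for which application
threads were stopped" lines (visible in the log excerpt further down in this
thread); the class name and the 30 ms threshold are placeholders for our
setup.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch: pull the safepoint pause samples out of a GC log and
// count how many exceed the 30 ms pause target.
public class PauseScan {
    // Matches lines like:
    // "Total time for which application threads were stopped: 1.5421650 seconds"
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    public static void main(String[] args) throws IOException {
        double targetSec = 0.030;   // 30 ms pause target, as in the log below
        int pauses = 0, overTarget = 0;
        double totalSec = 0.0, maxSec = 0.0;
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = STOPPED.matcher(line);
                if (!m.find()) {
                    continue;
                }
                double sec = Double.parseDouble(m.group(1));
                pauses++;
                totalSec += sec;
                maxSec = Math.max(maxSec, sec);
                if (sec > targetSec) {
                    overTarget++;
                }
            }
        }
        System.out.printf("pauses=%d avg=%.1fms max=%.1fms over-target=%d%n",
                pauses, pauses == 0 ? 0.0 : totalSec * 1000 / pauses,
                maxSec * 1000, overTarget);
    }
}

Running it over the old and the new log gives, for each run, the pause count,
the average pause and the number of over-target outliers.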
I have yet to find out whether this improvement comes from setting
swappiness to 0 or from cutting down on the IO. I will do individual runs
tomorrow or the day after and update the thread with the findings.
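On the swapping question, a quick Linux-only probe like the sketch below can
confirm whether the JVM process itself ever touches swap during a run (it
reads /proc, and the VmSwap field is only reported by newer kernels; the
class name is again just a placeholder):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Rough sketch (Linux only): print the current vm.swappiness setting and this
// JVM's own swap usage, so pause outliers can be correlated with swapping.
public class SwapCheck {
    public static void main(String[] args) throws IOException {
        String swappiness = new String(
                Files.readAllBytes(Paths.get("/proc/sys/vm/swappiness")),
                StandardCharsets.UTF_8).trim();
        String vmSwap = "VmSwap: (not reported by this kernel)";
        List<String> status = Files.readAllLines(
                Paths.get("/proc/self/status"), StandardCharsets.UTF_8);
        for (String line : status) {
            if (line.startsWith("VmSwap:")) {   // e.g. "VmSwap:        0 kB"
                vmSwap = line.trim();
                break;
            }
        }
        System.out.println("vm.swappiness = " + swappiness);
        System.out.println(vmSwap);
    }
}

If VmSwap stays at 0 kB through the bad runs, the IO from the other
application's logging becomes the more likely culprit.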
On Sat, Feb 22, 2014 at 8:09 AM, Andreas Müller
<Andreas.Mueller at mgm-tp.com> wrote:
> Hi Kirti,
>
>
>
> > [Times: user=0.06 sys=0.00, real=1.54 secs] real time being far greater
> than user time,
>
> That's extreme.
>
> But running several JVMs on the same hardware can produce such a
> phenomenon on a smaller scale. I usually observe that when two JVMs compete
> for the CPUs, pause times increase by a factor between sqrt(2) and 2, which
> I understand from a statistical point of view.
>
>
>
> I have also got the impression that G1 is much more sensitive to CPU
> contention (even well below 100% load) than other collectors.
>
> I have attached a GC log graphic where one JVM runs G1 on a 25 GB heap for
> 30 minutes; after 15 minutes a second JVM with exactly the same settings
> starts to compete for the CPUs (and other system resources). It is clearly
> visible from the plot that the second JVM has a huge impact on the
> operation of the first one:
>
> - Long pauses jump from a max of 250 millis to 700 millis (almost
> a factor of 3)
>
> - G1 sharply decreases the new generation size (probably to regain
> control of pause times because the target is at default value, i.e. 200
> millis)
>
> The statistics on the right hand side are taken only from the time period
> where both JVMs compete.
>
> My impression is that G1 does not like sharing CPUs with other processes;
> the contention probably spoils its ability to predict pause durations
> properly.
>
>
>
> Your example with a 2 GB heap looks much more extreme, however.
>
> It looks like your GC threads are almost being starved by something.
>
> I would be very happy to learn about the reason if any can be found.
>
>
>
> Best regards
>
> Andreas
>
>
>
> *From:* Kirti Teja Rao [mailto:kirtiteja at gmail.com]
> *Sent:* Friday, February 21, 2014 20:28
> *To:* Andreas Müller
>
> *Cc:* hotspot-gc-use at openjdk.java.net
> *Subject:* Re: G1 GC - pauses much larger than target
>
>
>
> Hi,
>
>
>
> @Jenny - CPU looks fine: never over 40% and generally between 25-35%. Some
> of these pauses are as large as 1 second, and they are always observed
> after the parallel phase; I assume this is the phase where G1 would need
> the most CPU.
>
>
>
> @Andreas - Most of these pauses are in young collections and do not show up
> in the parallel/serial phases in the GC log. The pauses I observe are an
> unreasonable 1.5+ seconds for a heap of 2 GB.
>
>
>
> @All - With real time being far greater than user time in [Times: user=0.06
> sys=0.00, real=1.54 secs], I believe G1 is blocked on some resource. The
> application I run is not swapping and there is headroom in memory. CPU is
> less than 35%. There are other applications running on the machine which
> log quite a bit and can occasionally cause the iowait average queue size to
> spike up to 20-30. Does G1 logging happen during the pause time? Can a slow
> disk or high disk IO affect these timings?
>
>
>
> Is there anything else that we can try to uncover the cause for these
> pauses?
>
>
>
>
>
> 2014-02-21T06:18:13.592+0000: 12675.969: Application time: 10.7438770
> seconds
>
> 2014-02-21T06:18:13.593+0000: 12675.970: [GC pause (young)
>
> Desired survivor size 81788928 bytes, new threshold 15 (max 15)
>
> - age 1: 564704 bytes, 564704 total
>
> - age 2: 18504 bytes, 583208 total
>
> - age 3: 18552 bytes, 601760 total
>
> - age 4: 18776 bytes, 620536 total
>
> - age 5: 197048 bytes, 817584 total
>
> - age 6: 18712 bytes, 836296 total
>
> - age 7: 18456 bytes, 854752 total
>
> - age 8: 18920 bytes, 873672 total
>
> - age 9: 18456 bytes, 892128 total
>
> - age 10: 18456 bytes, 910584 total
>
> - age 11: 18456 bytes, 929040 total
>
> - age 12: 18456 bytes, 947496 total
>
> - age 13: 18488 bytes, 965984 total
>
> - age 14: 18456 bytes, 984440 total
>
> - age 15: 18456 bytes, 1002896 total
>
> 12675.970: [G1Ergonomics (CSet Construction) start choosing CSet,
> _pending_cards: 4408, predicted base time: 6.77 ms, remaining time: 23.23
> ms, target pause time: 30.00 ms]
>
> 12675.970: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 306 regions, survivors: 1 regions, predicted young region time: 1.89
> ms]
>
> 12675.970: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 306 regions, survivors: 1 regions, old: 0 regions, predicted pause time:
> 8.67 ms, target pause time: 30.00 ms]
>
> , 0.0079290 secs]
>
> [Parallel Time: 6.0 ms, GC Workers: 9]
>
> [GC Worker Start (ms): Min: 12675970.1, Avg: 12675970.3, Max:
> 12675970.8, Diff: 0.7]
>
> [Ext Root Scanning (ms): Min: 3.0, Avg: 4.0, Max: 5.0, Diff: 1.9,
> Sum: 36.3]
>
> [Update RS (ms): Min: 0.0, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 4.1]
>
> [Processed Buffers: Min: 0, Avg: 5.4, Max: 13, Diff: 13, Sum: 49]
>
> [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.9]
>
> [Object Copy (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum: 6.5]
>
> [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.0]
>
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
> 0.2]
>
> [GC Worker Total (ms): Min: 5.1, Avg: 5.6, Max: 5.8, Diff: 0.8, Sum:
> 50.1]
>
> [GC Worker End (ms): Min: 12675975.8, Avg: 12675975.9, Max:
> 12675975.9, Diff: 0.1]
>
> [Code Root Fixup: 0.0 ms]
>
> [Clear CT: 0.5 ms]
>
> [Other: 1.4 ms]
>
> [Choose CSet: 0.0 ms]
>
> [Ref Proc: 0.5 ms]
>
> [Ref Enq: 0.0 ms]
>
> [Free CSet: 0.7 ms]
>
> [Eden: 1224.0M(1224.0M)->0.0B(1224.0M) Survivors: 4096.0K->4096.0K
> Heap: 1342.2M(2048.0M)->118.1M(2048.0M)]
>
> [Times: user=0.06 sys=0.00, real=1.54 secs]
>
> 2014-02-21T06:18:15.135+0000: 12677.511: Total time for which application
> threads were stopped: 1.5421650 seconds
>
>
>
>
>
> On Fri, Feb 21, 2014 at 12:38 AM, Andreas Müller <
> Andreas.Mueller at mgm-tp.com> wrote:
>
> Hi Kirti,
>
>
>
> > I am trying out the G1 collector for our application. Our application
> runs with a 2GB heap and we expect relatively low latency.
>
> > The pause time target is set to 25ms. There are much bigger (and
> unexplained) pauses on the order of a few 100s of ms.
>
> > This is not a rare occurrence and I can see this 15-20 times in 6-7 hour
> runs.
>
>
>
> This conforms to what I have observed in extended tests:
>
> G1's control of GC pause duration is limited to a rather narrow range.
>
> Even in that range, only new gen pauses follow the pause time target well,
> while "mixed" pauses tend to overshoot with considerable probability.
>
> Find attached a graphic which shows what I mean:
>
> - New gen pauses (red) do follow the target very well from 150-800
> millis
>
> - With a target below 150 the actual new gen pauses remain flat at
> 150-180 millis
>
> - "mixed" pauses (blue) do not follow the target well, and some of
> them will always take 500-700 millis, whatever the target is
>
> - There are other pauses (remark etc., green) which are short but
> completely independent of the target value
>
>
>
> The range with reasonable control depends on the heap size, the
> application and the hardware.
>
> I measured the graphic attached on a 6-core Xeon/2GHz server running Java
> 7u45 on CentOS/Linux with 64 GB RAM and a heap size of -Xms50g -Xmx50g.
>
> (For which the pause durations achieved are not bad at all!)
>
> The application was a synthetic benchmark described here:
> http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/
>
> With the same benchmark but only 10 GB of overall heap size on an Oracle T3
> server running Java 7u45 on Solaris/SPARC, I got a very similar kind of
> plot, but the range with reasonable pause time control was now 60-180
> millis.
>
> Again the pause durations reached were by themselves not bad at all. But
> the idea of setting a pause time target and expecting it to be followed in
> a meaningful way is to some extent misleading.
>
>
>
> These results on G1's pause time control will be published soon on the
> blog of the link above.
>
>
>
> Best regards
>
> Andreas
>
>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NoLoggingComparision.png
Type: image/png
Size: 13893 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140224/c8e50aa5/NoLoggingComparision.png