Re: G1 GC - pauses much larger than target

Kirti Teja Rao kirtiteja at gmail.com
Thu Feb 27 11:51:42 PST 2014


I have a follow-up question: which GC phase is affected by the paging, and
why does it not show up in PrintGCDetails? Also, I see this is time spent
after the parallel phase is complete. Any ideas on this?
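
For reference, adding up the phases that PrintGCDetails does report for the
pause in question (numbers taken from the log further down in this thread):

   Parallel Time 6.0 ms + Code Root Fixup 0.0 ms + Clear CT 0.5 ms
   + Other 1.4 ms  =  ~7.9 ms

which matches the reported pause of 0.0079290 secs, yet real=1.54 secs and a
stopped time of 1.5421650 seconds mean roughly 1.53 seconds are spent
somewhere that the per-phase breakdown does not account for.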


On Tue, Feb 25, 2014 at 5:24 PM, Kirti Teja Rao <kirtiteja at gmail.com> wrote:

> I am running the tests on CentOS 5.5 with kernel 2.6.18-194.el5. The
> deviations are due to vm.swappiness and have nothing to do with the IO of
> the other process. Thanks to all members for the help in resolving the
> issue. I still have to tune it for the application, but for now the major
> problem is resolved.
>
>
> On Tue, Feb 25, 2014 at 3:26 AM, Chi Ho Kwok <chkwok at digibites.nl> wrote:
>
>> We're running CMS, and with the default swappiness Linux swaps out huge
>> chunks of idle, not actively used reserve memory; from the OS point of
>> view it's malloc'ed memory that hasn't been touched for hours, but oh dear
>> when CMS does a full sweep and cleans it up... Swapstorm ahoy.
>>
>> vm.swappiness is the first thing we change on an OS install.
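>>
>> On most Linux boxes that amounts to something like the following (exact
>> file locations may vary a bit between distributions):
>>
>>     cat /proc/sys/vm/swappiness          # check the current value (default is usually 60)
>>     sysctl -w vm.swappiness=0            # change it for the running kernel
>>     echo 'vm.swappiness = 0' >> /etc/sysctl.conf    # make it stick across reboots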
>>
>>
>> Kind regards,
>>
>> Chi Ho Kwok
>> On 25 Feb 2014 10:55, "Andreas Müller" <Andreas.Mueller at mgm-tp.com>
>> wrote:
>>
>>>  Hi Kirti,
>>>
>>>
>>>
>>> thanks.
>>>
>>> > I tried with swappiness set to 0
>>>
>>> Interesting. This means, then, that the Linux kernel was swapping pages
>>> out with no real need? I am surprised to read this, but it explains the
>>> long real time and the low user time in your logs. I have sometimes seen
>>> GC logs from Linux which strangely looked like swapping without any
>>> shortage of physical RAM (which I can't remember ever seeing on Solaris).
>>> This difference is worth keeping in mind!
>>>
>>>
>>>
>>> > I am yet to find out if this improvement is because of setting
>>> swappiness to 0 or cutting down on the IO.
>>>
>>> I assume that swappiness is the prime factor here. Without swapping, why
>>> should a JVM care about the IO of another process? I would be rather
>>> shocked (and eye Linux with a lot more suspicion) if it were different.
>>>
>>>
>>>
>>> Can you tell whether other collectors (e.g. CMS) also suffered from such
>>> long GC pauses when you ran them with swappiness set to the default of 60?
>>>
>>>
>>>
>>> Best regards
>>>
>>> Andreas
>>>
>>>
>>>
>>> *From:* Kirti Teja Rao [mailto:kirtiteja at gmail.com]
>>> *Sent:* Tuesday, 25 February 2014 02:30
>>> *To:* Andreas Müller
>>> *Cc:* hotspot-gc-use at openjdk.java.net
>>> *Subject:* Re: G1 GC - pauses much larger than target
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I tried with swappiness set to 0 and turned off all the logging of the
>>> other application that is running on the same machine to cut down on the
>>> IO on the machine. The results are much better and all the large outliers
>>> of over 100ms and up to 500-600 msec are gone now. I see pauses around
>>> 50ms-60ms for a pause target of 30ms, which is OK to work with. Attached
>>> is the scatter chart comparison from Excel between one of the earlier runs
>>> and this run. It is also interesting to see that there are fewer GCs in
>>> this run, but with a slightly longer pause per GC.
>>>
>>>
>>>
>>> I am yet to find out if this improvement is because of setting
>>> swappiness to 0 or cutting down on the IO. I will do individual runs
>>> tomorrow or the day after and will update the thread with the findings.
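>>>
>>> For those runs I plan to keep an eye on swap activity next to the GC log,
>>> with something like
>>>
>>>     vmstat 5     # si/so columns show pages swapped in/out per interval
>>>     sar -W 5     # pswpin/s and pswpout/s, if sysstat is installed
>>>
>>> so it should be clear whether a long pause coincides with swapping or
>>> only with the other application's IO.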
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 22, 2014 at 8:09 AM, Andreas Müller <
>>> Andreas.Mueller at mgm-tp.com> wrote:
>>>
>>> Hi Kirti,
>>>
>>>
>>>
>>> > [Times: user=0.06 sys=0.00, real=1.54 secs] real time being far
>>> greater than user time,
>>>
>>> That's extreme.
>>>
>>> But running several JVMs on the same hardware can produce such a
>>> phenomenon on a smaller scale. When two JVMs compete for the CPUs, I
>>> usually observe an increase of pause times by a factor between sqrt(2)
>>> and 2, which makes sense to me from a statistical point of view.
>>>
>>>
>>>
>>> I have also got the impression that G1 is much more sensitive to CPU
>>> contention (even well below 100% load) than other collectors.
>>>
>>> I have attached a GC log graphic where one JVM runs G1 on a 25 GB heap
>>> for 30 minutes; after 15 minutes a second JVM with exactly the same
>>> settings starts to compete for the CPUs (and other system resources).
>>> It is clearly visible from the plot that the second JVM has a huge
>>> impact on the operation of the first one:
>>>
>>> -        Long pauses jump from a max of 250 millis to 700 millis
>>> (almost a factor of 3)
>>>
>>> -        G1 sharply decreases the new generation size (probably to
>>> regain control of pause times because the target is at default value, i.e.
>>> 200 millis)
>>>
>>> The statistics on the right hand side are taken only from the time
>>> period where both JVMs compete.
>>>
>>> My impression is that G1 does not like sharing CPUs with other
>>> processes. The contention probably spoils its ability to predict pause
>>> durations properly.
>>>
>>>
>>>
>>> Your example with a 2 GB heap looks much more extreme, however.
>>>
>>> It looks like your GC threads are almost being starved by something.
>>>
>>> I would be very happy to learn about the reason if any can be found.
>>>
>>>
>>>
>>> Best regards
>>>
>>> Andreas
>>>
>>>
>>>
>>> *From:* Kirti Teja Rao [mailto:kirtiteja at gmail.com]
>>> *Sent:* Friday, 21 February 2014 20:28
>>> *To:* Andreas Müller
>>>
>>>
>>> *Cc:* hotspot-gc-use at openjdk.java.net
>>> *Subject:* Re: G1 GC - pauses much larger than target
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> @Jenny - CPU looks fine: never over 40% and generally between 25-35%.
>>> Some of these pauses are as large as 1 second, and they are always
>>> observed after the parallel phase; I assume this is the phase where G1
>>> needs the most CPU.
>>>
>>>
>>>
>>> @Andreas - Most of these pauses are in young collections and do not
>>> show up in the parallel/serial phases of the GC log. The pauses I
>>> observe are unreasonably long, 1.5+ sec for a heap of 2 GB.
>>>
>>>
>>>
>>> @All - [Times: user=0.06 sys=0.00, real=1.54 secs], real time being far
>>> greater than user time; I believe G1 is blocked on some resource. The
>>> application I run is not swapping and there is still headroom in
>>> memory. CPU is less than 35%. There are other applications running on
>>> the machine which log quite a bit and can occasionally cause the iowait
>>> avg queue size to spike up to 20-30. Does G1 logging happen during the
>>> pause time? Can a slow disk or high disk IO affect these timings?
>>>
>>>
>>>
>>> Is there anything else that we can try to uncover the cause for these
>>> pauses?
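>>>
>>> In case it helps, these are the kinds of standard HotSpot options I am
>>> thinking of adding for the next runs (the log path is just a placeholder):
>>>
>>>     -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
>>>     -Xloggc:/some/less/busy/disk/gc.log
>>>
>>> The idea is to see whether the time is lost inside the safepoint/GC
>>> itself or around it, and to rule out the GC log writes being stalled by
>>> the busy disk.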
>>>
>>>
>>>
>>>
>>>
>>> 2014-02-21T06:18:13.592+0000: 12675.969: Application time: 10.7438770
>>> seconds
>>>
>>> 2014-02-21T06:18:13.593+0000: 12675.970: [GC pause (young)
>>>
>>> Desired survivor size 81788928 bytes, new threshold 15 (max 15)
>>>
>>> - age   1:     564704 bytes,     564704 total
>>>
>>> - age   2:      18504 bytes,     583208 total
>>>
>>> - age   3:      18552 bytes,     601760 total
>>>
>>> - age   4:      18776 bytes,     620536 total
>>>
>>> - age   5:     197048 bytes,     817584 total
>>>
>>> - age   6:      18712 bytes,     836296 total
>>>
>>> - age   7:      18456 bytes,     854752 total
>>>
>>> - age   8:      18920 bytes,     873672 total
>>>
>>> - age   9:      18456 bytes,     892128 total
>>>
>>> - age  10:      18456 bytes,     910584 total
>>>
>>> - age  11:      18456 bytes,     929040 total
>>>
>>> - age  12:      18456 bytes,     947496 total
>>>
>>> - age  13:      18488 bytes,     965984 total
>>>
>>> - age  14:      18456 bytes,     984440 total
>>>
>>> - age  15:      18456 bytes,    1002896 total
>>>
>>>  12675.970: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 4408, predicted base time: 6.77 ms, remaining time: 23.23
>>> ms, target pause time: 30.00 ms]
>>>
>>>  12675.970: [G1Ergonomics (CSet Construction) add young regions to CSet,
>>> eden: 306 regions, survivors: 1 regions, predicted young region time: 1.89
>>> ms]
>>>
>>>  12675.970: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 306 regions, survivors: 1 regions, old: 0 regions, predicted pause
>>> time: 8.67 ms, target pause time: 30.00 ms]
>>>
>>> , 0.0079290 secs]
>>>
>>>    [Parallel Time: 6.0 ms, GC Workers: 9]
>>>
>>>       [GC Worker Start (ms): Min: 12675970.1, Avg: 12675970.3, Max:
>>> 12675970.8, Diff: 0.7]
>>>
>>>       [Ext Root Scanning (ms): Min: 3.0, Avg: 4.0, Max: 5.0, Diff: 1.9,
>>> Sum: 36.3]
>>>
>>>       [Update RS (ms): Min: 0.0, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 4.1]
>>>
>>>          [Processed Buffers: Min: 0, Avg: 5.4, Max: 13, Diff: 13, Sum:
>>> 49]
>>>
>>>       [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.9]
>>>
>>>       [Object Copy (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum:
>>> 6.5]
>>>
>>>       [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum:
>>> 2.0]
>>>
>>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.2]
>>>
>>>       [GC Worker Total (ms): Min: 5.1, Avg: 5.6, Max: 5.8, Diff: 0.8,
>>> Sum: 50.1]
>>>
>>>       [GC Worker End (ms): Min: 12675975.8, Avg: 12675975.9, Max:
>>> 12675975.9, Diff: 0.1]
>>>
>>>    [Code Root Fixup: 0.0 ms]
>>>
>>>    [Clear CT: 0.5 ms]
>>>
>>>    [Other: 1.4 ms]
>>>
>>>       [Choose CSet: 0.0 ms]
>>>
>>>       [Ref Proc: 0.5 ms]
>>>
>>>       [Ref Enq: 0.0 ms]
>>>
>>>       [Free CSet: 0.7 ms]
>>>
>>>    [Eden: 1224.0M(1224.0M)->0.0B(1224.0M) Survivors: 4096.0K->4096.0K
>>> Heap: 1342.2M(2048.0M)->118.1M(2048.0M)]
>>>
>>>  [Times: user=0.06 sys=0.00, real=1.54 secs]
>>>
>>> 2014-02-21T06:18:15.135+0000: 12677.511: Total time for which
>>> application threads were stopped: 1.5421650 seconds
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 21, 2014 at 12:38 AM, Andreas Müller <
>>> Andreas.Mueller at mgm-tp.com> wrote:
>>>
>>> Hi Kirti,
>>>
>>>
>>>
>>> > I am trying out the G1 collector for our application. Our application
>>> runs with a 2GB heap and we expect relatively low latency.
>>>
>>> > The pause time target is set to 25ms. There are much bigger (and
>>> unexplained) pauses, on the order of a few 100s of ms.
>>>
>>> > This is not a rare occurrence and I can see it 15-20 times in 6-7
>>> hour runs.
>>>
>>>
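>>> I assume that means a command line with roughly
>>>
>>>     -Xms2g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=25
>>>
>>> plus your logging flags; the exact remaining options should not matter
>>> much for what follows.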
>>>
>>> This conforms to what I have observed in extended tests:
>>>
>>> G1's control of GC pause duration is limited to a rather narrow range.
>>>
>>> Even in that range, only new gen pauses do follow the pause time target
>>> well while "mixed" pauses tend to overshoot with considerable probability.
>>>
>>> Find attached a graphic which shows what I mean:
>>>
>>> -        New gen pauses (red) do follow the target very well from
>>> 150-800 millis
>>>
>>> -        With a target below 150 the actual new gen pauses remain flat
>>> at 150-180 millis
>>>
>>> -        "mixed" pauses (blue) do not follow the target well and some
>>> of them will always take 500-700 millis, whatever the target is
>>>
>>> -        There are other pauses (remark etc., green) which are short
>>> but completely independent of the target value
>>>
>>>
>>>
>>> The range with reasonable control depends on the heap size, the
>>> application and the hardware.
>>>
>>> The attached graphic was measured on a 6-core Xeon/2GHz server running
>>> Java 7u45 on CentOS/Linux with 64 GB RAM and a heap size of -Xms50g
>>> -Xmx50g.
>>>
>>> (For which the pause durations achieved are not bad at all!)
>>>
>>> The application was a synthetic benchmark described here:
>>> http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/
>>>
>>> With the same benchmark but only 10 GB of overall heap size on an Oracle
>>> T3 server running Java 7u45 on Solaris/SPARC I got a very similar kind of
>>> plot, but the range with reasonable pause time control was now 60-180
>>> millis.
>>>
>>> Again the pause durations reached were by themselves not bad at all. But
>>> the idea of setting a pause time target and expecting it to be followed in
>>> a meaningful way is to some extent misleading.
>>>
>>>
>>>
>>> These results on G1's pause time control will be published soon on the
>>> blog linked above.
>>>
>>>
>>>
>>> Best regards
>>>
>>> Andreas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>>
>