Periodic long minor GC pauses
Paul Hohensee
paul.hohensee at oracle.com
Thu May 5 19:24:58 UTC 2011
Forgot Shane...
Paul
On 5/5/11 3:24 PM, Paul Hohensee wrote:
> A long shot, but...
>
> If this is a recent version of Linux, perhaps you're running the
> Completely
> Fair Scheduler and this is an artifact. If so, try reverting to the
> old scheduler.
>
> Paul
>
> On 5/5/11 3:11 PM, Shane Cox wrote:
>> Jon,
>>
>> Thanks for the suggestion. Interestingly enough, adding
>> -XX:+AlwaysPreTouch increased the frequency of the problem by 2x.
>> Although this didn't solve the problem, I think it offers a clue.
>>
>> Touching all of the memory pages on startup will leave fewer pages on
>> the free list, no? Could fewer pages on the free list trigger an OS
>> behavior that interferes with an activity in the GC prologue? All
>> that comes to mind is the linux page scanner (recycling pages or
>> pruning the page cache) - not sure how this could interfere with the
>> GC prologue.
>>
>> According to the free command, swap usage remained 0 throughout the
>> test ... so I'm not suspecting swap at this time.
>>
>> FYI, I ran the application/test on a Solaris 10 host and was unable
>> to recreate the problem. So the problem appears to be linux-specific
>> (RHEL).
>>
>>
>> Any additional thoughts would be appreciated.
>>
>> Shane
>>
>>
>> On Wed, Apr 27, 2011 at 12:34 AM, Jon Masamitsu
>> <jon.masamitsu at oracle.com <mailto:jon.masamitsu at oracle.com>> wrote:
>>
>> Shane,
>>
>> Have you tried running with -XX:+AlwaysPreTouch ? We've occasionally
>> seen intermittent
>> long pauses as the heap grows into newly committed pages. This flag
>> causes pages
>> to be touched as they are committed. I don't know how this fits into
>> Ramki's
>> observation but it might be worth a shot.
>>
>> Jon
>>
>>
>> On 4/26/2011 12:40 PM, Y. S. Ramakrishna wrote:
>> > Well-spotted; it's a version of the same problem as near as
>> > i can tell. Please make sure to include a sizable GC log with
>> > your bug report (starting from VM start-up, so we can see if
>> > there is any clue in when the problem first starts during
>> > the life of the VM).
>> >
>> > thanks.
>> > -- ramki
>> >
>> > On 04/26/11 11:29, Shane Cox wrote:
>> >> Below is an example from a Remark. Of the total 1.3 seconds
>> of elapsed
>> >> time, 1.2 seconds is found between the first two timestamps.
>> However,
>> >> I'm not savvy enough to know whether this is the same problem
>> or simply
>> >> the result of a long scavenge that occurs as part of the
>> Remark. Is
>> >> there any way to tell?
>> >>
>> >> 2011-04-25T14:38:40.215-0400: 9466.139: [GC[YG occupancy: 712500 K
>> >> (943744 K)]9467.353: [Rescan (parallel) , 0.0106370
>> secs]9467.374: [weak
>> >> refs processing, 0.0159250 secs]9467.390: [class unloading,
>> 0.0180420
>> >> secs]9467.408: [scrub symbol& string tables, 0.0458500 secs] [1
>> >> CMS-remark: 12520949K(24117248K)] 13233450K(25060992K),
>> 0.1052950 secs]
>> >> [Times: user=0.13 sys=0.01, real=1.32 secs]
>> >>
>> >>
>> >> On Tue, Apr 26, 2011 at 2:17 PM, Y. S. Ramakrishna
>> >> <y.s.ramakrishna at oracle.com
>> <mailto:y.s.ramakrishna at oracle.com><mailto:y.s.ramakrishna at oracle.com
>> <mailto:y.s.ramakrishna at oracle.com>>> wrote:
>> >>
>> >> I had a quick look and all i could find was the GC prologue
>> >> code (although i didn't look all that carefully).
>> >> Bascially, GC is invoked, it prints this timestamp,
>> >> does a bit of global book-keeping and some initialization,
>> >> and then goes over each generation in the heap and
>> >> says "i am going to do a collection, do whatever you need
>> >> to do before i do the collection", and the generations
>> each do a bit of
>> >> book-keeping and any relevant initialization.
>> >>
>> >> The only thing i can see in the gc prologues other than a bit
>> >> of lightweight book-keeping is some reporting code that could
>> >> potentially be heavyweight. But you do not have any of those
>> >> enabled in your option set, so there should not be anything
>> >> obviously heavyweight going on.
>> >>
>> >> I'd suggest filing a bug under the category of
>> >> jvm/hotspot/garbage_collector
>> >> so someone in support can work with you to get this
>> diagnosed...
>> >>
>> >> Three questions when you file the bug:
>> >> (1) have you seen this start happening recently? (version?)
>> >> (2) can you check if the longer pauses are "random" or do
>> >> they always happen "during" CMS concurrent cycles or
>> >> always outside of such cycles?
>> >> (3) test set-up.
>> >>
>> >> -- ramki
>> >>
>> >>
>> >> On 04/26/11 10:45, Y. S. Ramakrishna wrote:
>> >>
>> >> The pause is definitely in the beginning, before GC
>> collection code
>> >> itself runs; witness the timestamps:-
>> >>
>> >> 2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>> [ParNew:
>> >> 943744K->79296K(943744K), 0.0559560 secs]
>> >> 4989811K->4187520K(25060992K), 0.0563970 secs]
>> [Times: user=0.31
>> >> sys=0.09, real=2.45 secs]
>> >>
>> >> The first timestamp is 2120.686 and the next one is
>> 2123.075, so
>> >> we have
>> >> about 2.389 s between those two. If you add to that
>> the GC time
>> >> of 0.056 s,
>> >> you get 2.445 which is close enough to the 2.45 s
>> reported.
>> >>
>> >> So we need to figure out what happens in the JVM
>> between those two
>> >> time-stamps and we can at least bound the culprit.
>> >>
>> >> -- ramki
>> >>
>> >> On 04/26/11 10:36, Shane Cox wrote:
>> >>
>> >> Periodically, our Java app on Linux experiences a
>> long Minor
>> >> GC pause that cannot be accounted for by the GC
>> time in the
>> >> log file. Instead, the pause is captured as
>> "real" (wall
>> >> clock) time and is observable in our application
>> logs. An
>> >> example is below. The GC completed in 56ms, but the
>> >> application was paused for 2.45 seconds.
>> >>
>> >> 2011-04-26T12:50:41.722-0400: 2117.157: [GC 2117.157:
>> >> [ParNew: 943439K->104832K(943744K), 0.0481790 secs]
>> >> 4909998K->4086751K(25060992K), 0.0485110 secs]
>> [Times:
>> >> user=0.34 sys=0.03, real=0.04 secs]
>> >> 2011-04-26T12:50:43.882-0400: 2119.317: [GC 2119.317:
>> >> [ParNew: 942852K->104832K(943744K), 0.0738000 secs]
>> >> 4924772K->4150899K(25060992K), 0.0740980 secs]
>> [Times:
>> >> user=0.45 sys=0.12, real=0.07 secs]
>> >> 2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>> >> [ParNew: 943744K->79296K(943744K), 0.0559560 secs]
>> >> 4989811K->4187520K(25060992K), 0.0563970 secs]
>> [Times:
>> >> user=0.31 sys=0.09, *real=2.45 secs]*
>> >> 2011-04-26T12:50:48.493-0400: 2123.928: [GC 2123.928:
>> >> [ParNew: 918208K->81040K(943744K), 0.0396620 secs]
>> >> 5026432K->4189265K(25060992K), 0.0400030 secs]
>> [Times:
>> >> user=0.32 sys=0.00, real=0.04 secs]
>> >> 2011-04-26T12:50:51.010-0400: 2126.445: [GC 2126.445:
>> >> [ParNew: 919952K->104832K(943744K), 0.0845070 secs]
>> >> 5028177K->4268050K(25060992K), 0.0848300 secs]
>> [Times:
>> >> user=0.52 sys=0.11, real=0.09 secs]
>> >>
>> >>
>> >> Initially I suspected swapping, but according to
>> the free
>> >> command, 0 bytes of swap are in use.
>> >> >free -m
>> >> total used free shared
>> >> buffers cached
>> >> Mem: 32168 28118 4050 0
>> >> 824 12652
>> >> -/+ buffers/cache: 14641 17527
>> >> Swap: 8191 0 8191
>> >>
>> >>
>> >> Next, I read about a problem relating to
>> mprotect() on Linux
>> >> that can be worked around with -XX:+UseMember. I
>> tried
>> >> that, but I still see the same unexplainable pauses.
>> >>
>> >>
>> >> Any suggestions/ideas? We've upgraded to the
>> latest JDK,
>> >> but no luck.
>> >>
>> >> Thanks,
>> >> Shane
>> >>
>> >>
>> >> java version "1.6.0_25"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build
>> 20.0-b11, mixed mode)
>> >>
>> >>
>> >> Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05
>> EST 2009
>> >> x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >>
>> >> -verbose:gc -Xms24g -Xmx24g -Xmn1g -Xss256k
>> >> -XX:PermSize=256m -XX:MaxPermSize=256m
>> >> -XX:+PrintTenuringDistribution
>> -XX:+UseConcMarkSweepGC
>> >> -XX:+CMSParallelRemarkEnabled
>> >> -XX:CMSInitiatingOccupancyFraction=70
>> >> -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails
>> >> -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> >> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UseCompressedStrings
>> >> -XX:+UseMembar
>> >>
>> >>
>> >>
>> ------------------------------------------------------------------------
>> >>
>> >> _______________________________________________
>> >> hotspot-gc-use mailing list
>> >> hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>
>> >> <mailto:hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>>
>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >>
>> >> _______________________________________________
>> >> hotspot-gc-use mailing list
>> >> hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>
>> >> <mailto:hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>>
>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >>
>> >>
>> > _______________________________________________
>> > hotspot-gc-use mailing list
>> > hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>
>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> <mailto:hotspot-gc-use at openjdk.java.net>
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>>
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110505/1e67b14c/attachment.htm>
More information about the hotspot-gc-dev
mailing list