Periodic long minor GC pauses

Thu May 5 19:24:25 UTC 2011

A long shot, but...

If this is a recent version of Linux, perhaps you're running the Completely
Fair Scheduler and this is an artifact.  If so, try reverting to the old 
scheduler.

Paul

On 5/5/11 3:11 PM, Shane Cox wrote:
> Jon,
>
> Thanks for the suggestion.  Interestingly enough, adding 
> -XX:+AlwaysPreTouch increased the frequency of the problem by 2x.  
> Although this didn't solve the problem, I think it offers a clue.
>
> Touching all of the memory pages on startup will leave fewer pages on 
> the free list, no?  Could fewer pages on the free list trigger an OS 
> behavior that interferes with an activity in the GC prologue?  All 
> that comes to mind is the linux page scanner (recycling pages or 
> pruning the page cache) - not sure how this could interfere with the 
> GC prologue.
>
> According to the free command, swap usage remained 0 throughout the 
> test ... so I'm not suspecting swap at this time.
>
> FYI, I ran the application/test on a Solaris 10 host and was unable to 
> recreate the problem.  So the problem appears to be linux-specific (RHEL).
>
>
> Any additional thoughts would be appreciated.
>
> Shane
>
>
> On Wed, Apr 27, 2011 at 12:34 AM, Jon Masamitsu 
> <jon.masamitsu at oracle.com <mailto:jon.masamitsu at oracle.com>> wrote:
>
>     Shane,
>
>     Have you tried running with -XX:+AlwaysPreTouch ?  We've occasionally
>     seen  intermittent
>     long pauses as the heap grows into newly committed pages.  This flag
>     causes pages
>     to be touched as they are committed.  I don't know how this fits into
>     Ramki's
>     observation but  it might be worth a shot.
>
>     Jon
>
>
>     On 4/26/2011 12:40 PM, Y. S. Ramakrishna wrote:
>     > Well-spotted; it's a version of the same problem as near as
>     > i can tell. Please make sure to include a sizable GC log with
>     > your bug report (starting from VM start-up, so we can see if
>     > there is any clue in when the problem first starts during
>     > the life of the VM).
>     >
>     > thanks.
>     > -- ramki
>     >
>     > On 04/26/11 11:29, Shane Cox wrote:
>     >> Below is an example from a Remark.  Of the total 1.3 seconds of
>     elapsed
>     >> time, 1.2 seconds is found between the first two timestamps.
>      However,
>     >> I'm not savvy enough to know whether this is the same problem
>     or simply
>     >> the result of a long scavenge that occurs as part of the
>     Remark.  Is
>     >> there any way to tell?
>     >>
>     >> 2011-04-25T14:38:40.215-0400: 9466.139: [GC[YG occupancy: 712500 K
>     >> (943744 K)]9467.353: [Rescan (parallel) , 0.0106370
>     secs]9467.374: [weak
>     >> refs processing, 0.0159250 secs]9467.390: [class unloading,
>     0.0180420
>     >> secs]9467.408: [scrub symbol&  string tables, 0.0458500 secs] [1
>     >> CMS-remark: 12520949K(24117248K)] 13233450K(25060992K),
>     0.1052950 secs]
>     >> [Times: user=0.13 sys=0.01, real=1.32 secs]
>     >>
>     >>
>     >> On Tue, Apr 26, 2011 at 2:17 PM, Y. S. Ramakrishna
>     >> <y.s.ramakrishna at oracle.com
>     <mailto:y.s.ramakrishna at oracle.com><mailto:y.s.ramakrishna at oracle.com
>     <mailto:y.s.ramakrishna at oracle.com>>>  wrote:
>     >>
>     >>      I had a quick look and all i could find was the GC prologue
>     >>      code (although i didn't look all that carefully).
>     >>      Bascially, GC is invoked, it prints this timestamp,
>     >>      does a bit of global book-keeping and some initialization,
>     >>      and then goes over each generation in the heap and
>     >>      says "i am going to do a collection, do whatever you need
>     >>      to do before i do the collection", and the generations
>     each do a bit of
>     >>      book-keeping and any relevant initialization.
>     >>
>     >>      The only thing i can see in the gc prologues other than a bit
>     >>      of lightweight book-keeping is some reporting code that could
>     >>      potentially be heavyweight. But you do not have any of those
>     >>      enabled in your option set, so there should not be anything
>     >>      obviously heavyweight going on.
>     >>
>     >>      I'd suggest filing a bug under the category of
>     >>      jvm/hotspot/garbage_collector
>     >>      so someone in support can work with you to get this
>     diagnosed...
>     >>
>     >>      Three questions when you file the bug:
>     >>      (1) have you seen this start happening recently? (version?)
>     >>      (2) can you check if the longer pauses are "random" or do
>     >>         they always happen "during" CMS concurrent cycles or
>     >>         always outside of such cycles?
>     >>      (3) test set-up.
>     >>
>     >>      -- ramki
>     >>
>     >>
>     >>      On 04/26/11 10:45, Y. S. Ramakrishna wrote:
>     >>
>     >>          The pause is definitely in the beginning, before GC
>     collection code
>     >>          itself runs; witness the timestamps:-
>     >>
>     >>          2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>     [ParNew:
>     >>          943744K->79296K(943744K), 0.0559560 secs]
>     >>          4989811K->4187520K(25060992K), 0.0563970 secs] [Times:
>     user=0.31
>     >>          sys=0.09, real=2.45 secs]
>     >>
>     >>          The first timestamp is 2120.686 and the next one is
>     2123.075, so
>     >>          we have
>     >>          about 2.389 s between those two. If you add to that
>     the GC time
>     >>          of 0.056 s,
>     >>          you get 2.445 which is close enough to the 2.45 s
>     reported.
>     >>
>     >>          So we need to figure out what happens in the JVM
>     between those two
>     >>          time-stamps and we can at least bound the culprit.
>     >>
>     >>          -- ramki
>     >>
>     >>          On 04/26/11 10:36, Shane Cox wrote:
>     >>
>     >>              Periodically, our Java app on Linux experiences a
>     long Minor
>     >>              GC pause that cannot be accounted for by the GC
>     time in the
>     >>              log file.  Instead, the pause is captured as
>     "real" (wall
>     >>              clock) time and is observable in our application
>     logs.  An
>     >>              example is below.  The GC completed in 56ms, but the
>     >>              application was paused for 2.45 seconds.
>     >>
>     >>              2011-04-26T12:50:41.722-0400: 2117.157: [GC 2117.157:
>     >>              [ParNew: 943439K->104832K(943744K), 0.0481790 secs]
>     >>              4909998K->4086751K(25060992K), 0.0485110 secs] [Times:
>     >>              user=0.34 sys=0.03, real=0.04 secs]
>     >>              2011-04-26T12:50:43.882-0400: 2119.317: [GC 2119.317:
>     >>              [ParNew: 942852K->104832K(943744K), 0.0738000 secs]
>     >>              4924772K->4150899K(25060992K), 0.0740980 secs] [Times:
>     >>              user=0.45 sys=0.12, real=0.07 secs]
>     >>              2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>     >>              [ParNew: 943744K->79296K(943744K), 0.0559560 secs]
>     >>              4989811K->4187520K(25060992K), 0.0563970 secs] [Times:
>     >>              user=0.31 sys=0.09, *real=2.45 secs]*
>     >>              2011-04-26T12:50:48.493-0400: 2123.928: [GC 2123.928:
>     >>              [ParNew: 918208K->81040K(943744K), 0.0396620 secs]
>     >>              5026432K->4189265K(25060992K), 0.0400030 secs] [Times:
>     >>              user=0.32 sys=0.00, real=0.04 secs]
>     >>              2011-04-26T12:50:51.010-0400: 2126.445: [GC 2126.445:
>     >>              [ParNew: 919952K->104832K(943744K), 0.0845070 secs]
>     >>              5028177K->4268050K(25060992K), 0.0848300 secs] [Times:
>     >>              user=0.52 sys=0.11, real=0.09 secs]
>     >>
>     >>
>     >>              Initially I suspected swapping, but according to
>     the free
>     >>              command, 0 bytes of swap are in use.
>     >> >free -m
>     >>                          total       used       free     shared
>     >>               buffers     cached
>     >>              Mem:         32168      28118       4050          0
>     >>               824      12652
>     >>              -/+ buffers/cache:      14641      17527
>     >>              Swap:         8191          0       8191
>     >>
>     >>
>     >>              Next, I read about a problem relating to
>     mprotect() on Linux
>     >>              that can be worked around with -XX:+UseMember.  I
>     tried
>     >>              that, but I still see the same unexplainable pauses.
>     >>
>     >>
>     >>              Any suggestions/ideas?  We've upgraded to the
>     latest JDK,
>     >>              but no luck.
>     >>
>     >>              Thanks,
>     >>              Shane
>     >>
>     >>
>     >>              java version "1.6.0_25"
>     >>              Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
>     >>              Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11,
>     mixed mode)
>     >>
>     >>
>     >>              Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05
>     EST 2009
>     >>              x86_64 x86_64 x86_64 GNU/Linux
>     >>
>     >>
>     >>              -verbose:gc  -Xms24g -Xmx24g -Xmn1g -Xss256k
>     >>              -XX:PermSize=256m -XX:MaxPermSize=256m
>     >>              -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC
>     >>              -XX:+CMSParallelRemarkEnabled
>     >>              -XX:CMSInitiatingOccupancyFraction=70
>     >>              -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails
>     >>              -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>     >>              -XX:+HeapDumpOnOutOfMemoryError
>     -XX:+UseCompressedStrings
>     >>              -XX:+UseMembar
>     >>
>     >>
>     >>            
>      ------------------------------------------------------------------------
>     >>
>     >>              _______________________________________________
>     >>              hotspot-gc-use mailing list
>     >> hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     >> <mailto:hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>>
>     >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>     >>
>     >>          _______________________________________________
>     >>          hotspot-gc-use mailing list
>     >> hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     >> <mailto:hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>>
>     >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>     >>
>     >>
>     > _______________________________________________
>     > hotspot-gc-use mailing list
>     > hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>     _______________________________________________
>     hotspot-gc-use mailing list
>     hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110505/e1b945aa/attachment.htm>