Periodic long minor GC pauses

Shane Cox shane.cox at gmail.com
Thu May 5 19:11:45 UTC 2011


Jon,

Thanks for the suggestion.  Interestingly enough, adding -XX:+AlwaysPreTouch
increased the frequency of the problem by 2x.  Although this didn't solve
the problem, I think it offers a clue.

Touching all of the memory pages on startup will leave fewer pages on the
free list, no?  Could fewer pages on the free list trigger an OS behavior
that interferes with an activity in the GC prologue?  All that comes to mind
is the linux page scanner (recycling pages or pruning the page cache) - not
sure how this could interfere with the GC prologue.

According to the free command, swap usage remained 0 throughout the test ...
so I'm not suspecting swap at this time.

FYI, I ran the application/test on a Solaris 10 host and was unable to
recreate the problem.  So the problem appears to be linux-specific (RHEL).


Any additional thoughts would be appreciated.

Shane


On Wed, Apr 27, 2011 at 12:34 AM, Jon Masamitsu <jon.masamitsu at oracle.com>wrote:

> Shane,
>
> Have you tried running with -XX:+AlwaysPreTouch ?  We've occasionally
> seen  intermittent
> long pauses as the heap grows into newly committed pages.  This flag
> causes pages
> to be touched as they are committed.  I don't know how this fits into
> Ramki's
> observation but  it might be worth a shot.
>
> Jon
>
>
> On 4/26/2011 12:40 PM, Y. S. Ramakrishna wrote:
> > Well-spotted; it's a version of the same problem as near as
> > i can tell. Please make sure to include a sizable GC log with
> > your bug report (starting from VM start-up, so we can see if
> > there is any clue in when the problem first starts during
> > the life of the VM).
> >
> > thanks.
> > -- ramki
> >
> > On 04/26/11 11:29, Shane Cox wrote:
> >> Below is an example from a Remark.  Of the total 1.3 seconds of elapsed
> >> time, 1.2 seconds is found between the first two timestamps.  However,
> >> I'm not savvy enough to know whether this is the same problem or simply
> >> the result of a long scavenge that occurs as part of the Remark.  Is
> >> there any way to tell?
> >>
> >> 2011-04-25T14:38:40.215-0400: 9466.139: [GC[YG occupancy: 712500 K
> >> (943744 K)]9467.353: [Rescan (parallel) , 0.0106370 secs]9467.374: [weak
> >> refs processing, 0.0159250 secs]9467.390: [class unloading, 0.0180420
> >> secs]9467.408: [scrub symbol&  string tables, 0.0458500 secs] [1
> >> CMS-remark: 12520949K(24117248K)] 13233450K(25060992K), 0.1052950 secs]
> >> [Times: user=0.13 sys=0.01, real=1.32 secs]
> >>
> >>
> >> On Tue, Apr 26, 2011 at 2:17 PM, Y. S. Ramakrishna
> >> <y.s.ramakrishna at oracle.com<mailto:y.s.ramakrishna at oracle.com>>  wrote:
> >>
> >>      I had a quick look and all i could find was the GC prologue
> >>      code (although i didn't look all that carefully).
> >>      Bascially, GC is invoked, it prints this timestamp,
> >>      does a bit of global book-keeping and some initialization,
> >>      and then goes over each generation in the heap and
> >>      says "i am going to do a collection, do whatever you need
> >>      to do before i do the collection", and the generations each do a
> bit of
> >>      book-keeping and any relevant initialization.
> >>
> >>      The only thing i can see in the gc prologues other than a bit
> >>      of lightweight book-keeping is some reporting code that could
> >>      potentially be heavyweight. But you do not have any of those
> >>      enabled in your option set, so there should not be anything
> >>      obviously heavyweight going on.
> >>
> >>      I'd suggest filing a bug under the category of
> >>      jvm/hotspot/garbage_collector
> >>      so someone in support can work with you to get this diagnosed...
> >>
> >>      Three questions when you file the bug:
> >>      (1) have you seen this start happening recently? (version?)
> >>      (2) can you check if the longer pauses are "random" or do
> >>         they always happen "during" CMS concurrent cycles or
> >>         always outside of such cycles?
> >>      (3) test set-up.
> >>
> >>      -- ramki
> >>
> >>
> >>      On 04/26/11 10:45, Y. S. Ramakrishna wrote:
> >>
> >>          The pause is definitely in the beginning, before GC collection
> code
> >>          itself runs; witness the timestamps:-
> >>
> >>          2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075: [ParNew:
> >>          943744K->79296K(943744K), 0.0559560 secs]
> >>          4989811K->4187520K(25060992K), 0.0563970 secs] [Times:
> user=0.31
> >>          sys=0.09, real=2.45 secs]
> >>
> >>          The first timestamp is 2120.686 and the next one is 2123.075,
> so
> >>          we have
> >>          about 2.389 s between those two. If you add to that the GC time
> >>          of 0.056 s,
> >>          you get 2.445 which is close enough to the 2.45 s reported.
> >>
> >>          So we need to figure out what happens in the JVM between those
> two
> >>          time-stamps and we can at least bound the culprit.
> >>
> >>          -- ramki
> >>
> >>          On 04/26/11 10:36, Shane Cox wrote:
> >>
> >>              Periodically, our Java app on Linux experiences a long
> Minor
> >>              GC pause that cannot be accounted for by the GC time in the
> >>              log file.  Instead, the pause is captured as "real" (wall
> >>              clock) time and is observable in our application logs.  An
> >>              example is below.  The GC completed in 56ms, but the
> >>              application was paused for 2.45 seconds.
> >>
> >>              2011-04-26T12:50:41.722-0400: 2117.157: [GC 2117.157:
> >>              [ParNew: 943439K->104832K(943744K), 0.0481790 secs]
> >>              4909998K->4086751K(25060992K), 0.0485110 secs] [Times:
> >>              user=0.34 sys=0.03, real=0.04 secs]
> >>              2011-04-26T12:50:43.882-0400: 2119.317: [GC 2119.317:
> >>              [ParNew: 942852K->104832K(943744K), 0.0738000 secs]
> >>              4924772K->4150899K(25060992K), 0.0740980 secs] [Times:
> >>              user=0.45 sys=0.12, real=0.07 secs]
> >>              2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
> >>              [ParNew: 943744K->79296K(943744K), 0.0559560 secs]
> >>              4989811K->4187520K(25060992K), 0.0563970 secs] [Times:
> >>              user=0.31 sys=0.09, *real=2.45 secs]*
> >>              2011-04-26T12:50:48.493-0400: 2123.928: [GC 2123.928:
> >>              [ParNew: 918208K->81040K(943744K), 0.0396620 secs]
> >>              5026432K->4189265K(25060992K), 0.0400030 secs] [Times:
> >>              user=0.32 sys=0.00, real=0.04 secs]
> >>              2011-04-26T12:50:51.010-0400: 2126.445: [GC 2126.445:
> >>              [ParNew: 919952K->104832K(943744K), 0.0845070 secs]
> >>              5028177K->4268050K(25060992K), 0.0848300 secs] [Times:
> >>              user=0.52 sys=0.11, real=0.09 secs]
> >>
> >>
> >>              Initially I suspected swapping, but according to the free
> >>              command, 0 bytes of swap are in use.
> >>               >free -m
> >>                          total       used       free     shared
> >>               buffers     cached
> >>              Mem:         32168      28118       4050          0
> >>               824      12652
> >>              -/+ buffers/cache:      14641      17527
> >>              Swap:         8191          0       8191
> >>
> >>
> >>              Next, I read about a problem relating to mprotect() on
> Linux
> >>              that can be worked around with -XX:+UseMember.  I tried
> >>              that, but I still see the same unexplainable pauses.
> >>
> >>
> >>              Any suggestions/ideas?  We've upgraded to the latest JDK,
> >>              but no luck.
> >>
> >>              Thanks,
> >>              Shane
> >>
> >>
> >>              java version "1.6.0_25"
> >>              Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
> >>              Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed
> mode)
> >>
> >>
> >>              Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05 EST 2009
> >>              x86_64 x86_64 x86_64 GNU/Linux
> >>
> >>
> >>              -verbose:gc  -Xms24g -Xmx24g -Xmn1g -Xss256k
> >>              -XX:PermSize=256m -XX:MaxPermSize=256m
> >>              -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC
> >>              -XX:+CMSParallelRemarkEnabled
> >>              -XX:CMSInitiatingOccupancyFraction=70
> >>              -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails
> >>              -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> >>              -XX:+HeapDumpOnOutOfMemoryError -XX:+UseCompressedStrings
> >>              -XX:+UseMembar
> >>
> >>
> >>
>  ------------------------------------------------------------------------
> >>
> >>              _______________________________________________
> >>              hotspot-gc-use mailing list
> >>              hotspot-gc-use at openjdk.java.net
> >>              <mailto:hotspot-gc-use at openjdk.java.net>
> >>
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> >>
> >>          _______________________________________________
> >>          hotspot-gc-use mailing list
> >>          hotspot-gc-use at openjdk.java.net
> >>          <mailto:hotspot-gc-use at openjdk.java.net>
> >>          http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> >>
> >>
> > _______________________________________________
> > hotspot-gc-use mailing list
> > hotspot-gc-use at openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110505/20306967/attachment.htm>
-------------- next part --------------
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


More information about the hotspot-gc-dev mailing list