Periodic long minor GC pauses

Thu May 5 19:24:58 UTC 2011

Forgot Shane...

Paul

On 5/5/11 3:24 PM, Paul Hohensee wrote:
> A long shot, but...
>
> If this is a recent version of Linux, perhaps you're running the 
> Completely
> Fair Scheduler and this is an artifact.  If so, try reverting to the 
> old scheduler.
>
> Paul
>
> On 5/5/11 3:11 PM, Shane Cox wrote:
>> Jon,
>>
>> Thanks for the suggestion.  Interestingly enough, adding 
>> -XX:+AlwaysPreTouch increased the frequency of the problem by 2x.  
>> Although this didn't solve the problem, I think it offers a clue.
>>
>> Touching all of the memory pages on startup will leave fewer pages on 
>> the free list, no?  Could fewer pages on the free list trigger an OS 
>> behavior that interferes with an activity in the GC prologue?  All 
>> that comes to mind is the linux page scanner (recycling pages or 
>> pruning the page cache) - not sure how this could interfere with the 
>> GC prologue.
>>
>> According to the free command, swap usage remained 0 throughout the 
>> test ... so I'm not suspecting swap at this time.
>>
>> FYI, I ran the application/test on a Solaris 10 host and was unable 
>> to recreate the problem.  So the problem appears to be linux-specific 
>> (RHEL).
>>
>>
>> Any additional thoughts would be appreciated.
>>
>> Shane
>>
>>
>> On Wed, Apr 27, 2011 at 12:34 AM, Jon Masamitsu 
>> <jon.masamitsu at oracle.com <mailto:jon.masamitsu at oracle.com>> wrote:
>>
>>     Shane,
>>
>>     Have you tried running with -XX:+AlwaysPreTouch ?  We've occasionally
>>     seen  intermittent
>>     long pauses as the heap grows into newly committed pages.  This flag
>>     causes pages
>>     to be touched as they are committed.  I don't know how this fits into
>>     Ramki's
>>     observation but  it might be worth a shot.
>>
>>     Jon
>>
>>
>>     On 4/26/2011 12:40 PM, Y. S. Ramakrishna wrote:
>>     > Well-spotted; it's a version of the same problem as near as
>>     > i can tell. Please make sure to include a sizable GC log with
>>     > your bug report (starting from VM start-up, so we can see if
>>     > there is any clue in when the problem first starts during
>>     > the life of the VM).
>>     >
>>     > thanks.
>>     > -- ramki
>>     >
>>     > On 04/26/11 11:29, Shane Cox wrote:
>>     >> Below is an example from a Remark.  Of the total 1.3 seconds
>>     of elapsed
>>     >> time, 1.2 seconds is found between the first two timestamps.
>>      However,
>>     >> I'm not savvy enough to know whether this is the same problem
>>     or simply
>>     >> the result of a long scavenge that occurs as part of the
>>     Remark.  Is
>>     >> there any way to tell?
>>     >>
>>     >> 2011-04-25T14:38:40.215-0400: 9466.139: [GC[YG occupancy: 712500 K
>>     >> (943744 K)]9467.353: [Rescan (parallel) , 0.0106370
>>     secs]9467.374: [weak
>>     >> refs processing, 0.0159250 secs]9467.390: [class unloading,
>>     0.0180420
>>     >> secs]9467.408: [scrub symbol&  string tables, 0.0458500 secs] [1
>>     >> CMS-remark: 12520949K(24117248K)] 13233450K(25060992K),
>>     0.1052950 secs]
>>     >> [Times: user=0.13 sys=0.01, real=1.32 secs]
>>     >>
>>     >>
>>     >> On Tue, Apr 26, 2011 at 2:17 PM, Y. S. Ramakrishna
>>     >> <y.s.ramakrishna at oracle.com
>>     <mailto:y.s.ramakrishna at oracle.com><mailto:y.s.ramakrishna at oracle.com
>>     <mailto:y.s.ramakrishna at oracle.com>>>  wrote:
>>     >>
>>     >>      I had a quick look and all i could find was the GC prologue
>>     >>      code (although i didn't look all that carefully).
>>     >>      Bascially, GC is invoked, it prints this timestamp,
>>     >>      does a bit of global book-keeping and some initialization,
>>     >>      and then goes over each generation in the heap and
>>     >>      says "i am going to do a collection, do whatever you need
>>     >>      to do before i do the collection", and the generations
>>     each do a bit of
>>     >>      book-keeping and any relevant initialization.
>>     >>
>>     >>      The only thing i can see in the gc prologues other than a bit
>>     >>      of lightweight book-keeping is some reporting code that could
>>     >>      potentially be heavyweight. But you do not have any of those
>>     >>      enabled in your option set, so there should not be anything
>>     >>      obviously heavyweight going on.
>>     >>
>>     >>      I'd suggest filing a bug under the category of
>>     >>      jvm/hotspot/garbage_collector
>>     >>      so someone in support can work with you to get this
>>     diagnosed...
>>     >>
>>     >>      Three questions when you file the bug:
>>     >>      (1) have you seen this start happening recently? (version?)
>>     >>      (2) can you check if the longer pauses are "random" or do
>>     >>         they always happen "during" CMS concurrent cycles or
>>     >>         always outside of such cycles?
>>     >>      (3) test set-up.
>>     >>
>>     >>      -- ramki
>>     >>
>>     >>
>>     >>      On 04/26/11 10:45, Y. S. Ramakrishna wrote:
>>     >>
>>     >>          The pause is definitely in the beginning, before GC
>>     collection code
>>     >>          itself runs; witness the timestamps:-
>>     >>
>>     >>          2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>>     [ParNew:
>>     >>          943744K->79296K(943744K), 0.0559560 secs]
>>     >>          4989811K->4187520K(25060992K), 0.0563970 secs]
>>     [Times: user=0.31
>>     >>          sys=0.09, real=2.45 secs]
>>     >>
>>     >>          The first timestamp is 2120.686 and the next one is
>>     2123.075, so
>>     >>          we have
>>     >>          about 2.389 s between those two. If you add to that
>>     the GC time
>>     >>          of 0.056 s,
>>     >>          you get 2.445 which is close enough to the 2.45 s
>>     reported.
>>     >>
>>     >>          So we need to figure out what happens in the JVM
>>     between those two
>>     >>          time-stamps and we can at least bound the culprit.
>>     >>
>>     >>          -- ramki
>>     >>
>>     >>          On 04/26/11 10:36, Shane Cox wrote:
>>     >>
>>     >>              Periodically, our Java app on Linux experiences a
>>     long Minor
>>     >>              GC pause that cannot be accounted for by the GC
>>     time in the
>>     >>              log file.  Instead, the pause is captured as
>>     "real" (wall
>>     >>              clock) time and is observable in our application
>>     logs.  An
>>     >>              example is below.  The GC completed in 56ms, but the
>>     >>              application was paused for 2.45 seconds.
>>     >>
>>     >>              2011-04-26T12:50:41.722-0400: 2117.157: [GC 2117.157:
>>     >>              [ParNew: 943439K->104832K(943744K), 0.0481790 secs]
>>     >>              4909998K->4086751K(25060992K), 0.0485110 secs]
>>     [Times:
>>     >>              user=0.34 sys=0.03, real=0.04 secs]
>>     >>              2011-04-26T12:50:43.882-0400: 2119.317: [GC 2119.317:
>>     >>              [ParNew: 942852K->104832K(943744K), 0.0738000 secs]
>>     >>              4924772K->4150899K(25060992K), 0.0740980 secs]
>>     [Times:
>>     >>              user=0.45 sys=0.12, real=0.07 secs]
>>     >>              2011-04-26T12:50:45.251-0400: 2120.686: [GC 2123.075:
>>     >>              [ParNew: 943744K->79296K(943744K), 0.0559560 secs]
>>     >>              4989811K->4187520K(25060992K), 0.0563970 secs]
>>     [Times:
>>     >>              user=0.31 sys=0.09, *real=2.45 secs]*
>>     >>              2011-04-26T12:50:48.493-0400: 2123.928: [GC 2123.928:
>>     >>              [ParNew: 918208K->81040K(943744K), 0.0396620 secs]
>>     >>              5026432K->4189265K(25060992K), 0.0400030 secs]
>>     [Times:
>>     >>              user=0.32 sys=0.00, real=0.04 secs]
>>     >>              2011-04-26T12:50:51.010-0400: 2126.445: [GC 2126.445:
>>     >>              [ParNew: 919952K->104832K(943744K), 0.0845070 secs]
>>     >>              5028177K->4268050K(25060992K), 0.0848300 secs]
>>     [Times:
>>     >>              user=0.52 sys=0.11, real=0.09 secs]
>>     >>
>>     >>
>>     >>              Initially I suspected swapping, but according to
>>     the free
>>     >>              command, 0 bytes of swap are in use.
>>     >> >free -m
>>     >>                          total       used       free     shared
>>     >>               buffers     cached
>>     >>              Mem:         32168      28118       4050          0
>>     >>               824      12652
>>     >>              -/+ buffers/cache:      14641      17527
>>     >>              Swap:         8191          0       8191
>>     >>
>>     >>
>>     >>              Next, I read about a problem relating to
>>     mprotect() on Linux
>>     >>              that can be worked around with -XX:+UseMember.  I
>>     tried
>>     >>              that, but I still see the same unexplainable pauses.
>>     >>
>>     >>
>>     >>              Any suggestions/ideas?  We've upgraded to the
>>     latest JDK,
>>     >>              but no luck.
>>     >>
>>     >>              Thanks,
>>     >>              Shane
>>     >>
>>     >>
>>     >>              java version "1.6.0_25"
>>     >>              Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
>>     >>              Java HotSpot(TM) 64-Bit Server VM (build
>>     20.0-b11, mixed mode)
>>     >>
>>     >>
>>     >>              Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 08:45:05
>>     EST 2009
>>     >>              x86_64 x86_64 x86_64 GNU/Linux
>>     >>
>>     >>
>>     >>              -verbose:gc  -Xms24g -Xmx24g -Xmn1g -Xss256k
>>     >>              -XX:PermSize=256m -XX:MaxPermSize=256m
>>     >>              -XX:+PrintTenuringDistribution
>>     -XX:+UseConcMarkSweepGC
>>     >>              -XX:+CMSParallelRemarkEnabled
>>     >>              -XX:CMSInitiatingOccupancyFraction=70
>>     >>              -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails
>>     >>              -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>     >>              -XX:+HeapDumpOnOutOfMemoryError
>>     -XX:+UseCompressedStrings
>>     >>              -XX:+UseMembar
>>     >>
>>     >>
>>     >>            
>>      ------------------------------------------------------------------------
>>     >>
>>     >>              _______________________________________________
>>     >>              hotspot-gc-use mailing list
>>     >> hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>
>>     >> <mailto:hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>>
>>     >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>     >>
>>     >>          _______________________________________________
>>     >>          hotspot-gc-use mailing list
>>     >> hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>
>>     >> <mailto:hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>>
>>     >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>     >>
>>     >>
>>     > _______________________________________________
>>     > hotspot-gc-use mailing list
>>     > hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>
>>     > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>     _______________________________________________
>>     hotspot-gc-use mailing list
>>     hotspot-gc-use at openjdk.java.net
>>     <mailto:hotspot-gc-use at openjdk.java.net>
>>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>
>>
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20110505/1e67b14c/attachment.htm>