<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    Forgot Shane...<br>
    <br>
    Paul<br>
    <br>
    On 5/5/11 3:24 PM, Paul Hohensee wrote:
    <blockquote cite="mid:4DC2F969.10902@oracle.com" type="cite">
      <meta content="text/html; charset=ISO-8859-1"
        http-equiv="Content-Type">
      A long shot, but...<br>
      <br>
      If this is a recent version of Linux, perhaps you're running the
      Completely<br>
      Fair Scheduler and this is an artifact.  If so, try reverting to
      the old scheduler.<br>
      <br>
      Paul<br>
      <br>
      On 5/5/11 3:11 PM, Shane Cox wrote:
      <blockquote
        cite="mid:BANLkTikTOhmFNQeViDr0Lr=3wMAhfjCYvA@mail.gmail.com"
        type="cite">Jon,<br>
        <br>
        Thanks for the suggestion.  Interestingly enough, adding
        -XX:+AlwaysPreTouch increased the frequency of the problem by
        2x.  Although this didn't solve the problem, I think it offers a
        clue.<br>
        <br>
        Touching all of the memory pages on startup will leave fewer
        pages on the free list, no?  Could fewer pages on the free list
        trigger an OS behavior that interferes with an activity in the
        GC prologue?  All that comes to mind is the linux page scanner
        (recycling pages or pruning the page cache) - not sure how this
        could interfere with the GC prologue.<br>
        <br>
        According to the free command, swap usage remained 0 throughout
        the test ... so I'm not suspecting swap at this time.<br>
        <br>
        FYI, I ran the application/test on a Solaris 10 host and was
        unable to recreate the problem.  So the problem appears to be
        linux-specific (RHEL).<br>
        <br>
        <br>
        Any additional thoughts would be appreciated.<br>
        <br>
        Shane<br>
        <br>
        <br>
        <div class="gmail_quote">On Wed, Apr 27, 2011 at 12:34 AM, Jon
          Masamitsu <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:jon.masamitsu@oracle.com">jon.masamitsu@oracle.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
            0.8ex; border-left: 1px solid rgb(204, 204, 204);
            padding-left: 1ex;">Shane,<br>
            <br>
            Have you tried running with -XX:+AlwaysPreTouch ?  We've
            occasionally<br>
            seen  intermittent<br>
            long pauses as the heap grows into newly committed pages.
             This flag<br>
            causes pages<br>
            to be touched as they are committed.  I don't know how this
            fits into<br>
            Ramki's<br>
            observation but  it might be worth a shot.<br>
            <font color="#888888"><br>
              Jon<br>
            </font>
            <div>
              <div class="h5"><br>
                <br>
                On 4/26/2011 12:40 PM, Y. S. Ramakrishna wrote:<br>
                > Well-spotted; it's a version of the same problem as
                near as<br>
                > i can tell. Please make sure to include a sizable
                GC log with<br>
                > your bug report (starting from VM start-up, so we
                can see if<br>
                > there is any clue in when the problem first starts
                during<br>
                > the life of the VM).<br>
                ><br>
                > thanks.<br>
                > -- ramki<br>
                ><br>
                > On 04/26/11 11:29, Shane Cox wrote:<br>
                >> Below is an example from a Remark.  Of the
                total 1.3 seconds of elapsed<br>
                >> time, 1.2 seconds is found between the first
                two timestamps.  However,<br>
                >> I'm not savvy enough to know whether this is
                the same problem or simply<br>
                >> the result of a long scavenge that occurs as
                part of the Remark.  Is<br>
                >> there any way to tell?<br>
                >><br>
                >> 2011-04-25T14:38:40.215-0400: 9466.139: [GC[YG
                occupancy: 712500 K<br>
                >> (943744 K)]9467.353: [Rescan (parallel) ,
                0.0106370 secs]9467.374: [weak<br>
                >> refs processing, 0.0159250 secs]9467.390:
                [class unloading, 0.0180420<br>
                >> secs]9467.408: [scrub symbol&  string
                tables, 0.0458500 secs] [1<br>
                >> CMS-remark: 12520949K(24117248K)]
                13233450K(25060992K), 0.1052950 secs]<br>
                >> [Times: user=0.13 sys=0.01, real=1.32 secs]<br>
                >><br>
                >><br>
                >> On Tue, Apr 26, 2011 at 2:17 PM, Y. S.
                Ramakrishna<br>
                >> <<a moz-do-not-send="true"
                  href="mailto:y.s.ramakrishna@oracle.com">y.s.ramakrishna@oracle.com</a><mailto:<a
                  moz-do-not-send="true"
                  href="mailto:y.s.ramakrishna@oracle.com">y.s.ramakrishna@oracle.com</a>>>

                 wrote:<br>
                >><br>
                >>      I had a quick look and all i could find
                was the GC prologue<br>
                >>      code (although i didn't look all that
                carefully).<br>
                >>      Bascially, GC is invoked, it prints this
                timestamp,<br>
                >>      does a bit of global book-keeping and some
                initialization,<br>
                >>      and then goes over each generation in the
                heap and<br>
                >>      says "i am going to do a collection, do
                whatever you need<br>
                >>      to do before i do the collection", and the
                generations each do a bit of<br>
                >>      book-keeping and any relevant
                initialization.<br>
                >><br>
                >>      The only thing i can see in the gc
                prologues other than a bit<br>
                >>      of lightweight book-keeping is some
                reporting code that could<br>
                >>      potentially be heavyweight. But you do not
                have any of those<br>
                >>      enabled in your option set, so there
                should not be anything<br>
                >>      obviously heavyweight going on.<br>
                >><br>
                >>      I'd suggest filing a bug under the
                category of<br>
                >>      jvm/hotspot/garbage_collector<br>
                >>      so someone in support can work with you to
                get this diagnosed...<br>
                >><br>
                >>      Three questions when you file the bug:<br>
                >>      (1) have you seen this start happening
                recently? (version?)<br>
                >>      (2) can you check if the longer pauses are
                "random" or do<br>
                >>         they always happen "during" CMS
                concurrent cycles or<br>
                >>         always outside of such cycles?<br>
                >>      (3) test set-up.<br>
                >><br>
                >>      -- ramki<br>
                >><br>
                >><br>
                >>      On 04/26/11 10:45, Y. S. Ramakrishna
                wrote:<br>
                >><br>
                >>          The pause is definitely in the
                beginning, before GC collection code<br>
                >>          itself runs; witness the timestamps:-<br>
                >><br>
                >>          2011-04-26T12:50:45.251-0400:
                2120.686: [GC 2123.075: [ParNew:<br>
                >>          943744K->79296K(943744K), 0.0559560
                secs]<br>
                >>          4989811K->4187520K(25060992K),
                0.0563970 secs] [Times: user=0.31<br>
                >>          sys=0.09, real=2.45 secs]<br>
                >><br>
                >>          The first timestamp is 2120.686 and
                the next one is 2123.075, so<br>
                >>          we have<br>
                >>          about 2.389 s between those two. If
                you add to that the GC time<br>
                >>          of 0.056 s,<br>
                >>          you get 2.445 which is close enough to
                the 2.45 s reported.<br>
                >><br>
                >>          So we need to figure out what happens
                in the JVM between those two<br>
                >>          time-stamps and we can at least bound
                the culprit.<br>
                >><br>
                >>          -- ramki<br>
                >><br>
                >>          On 04/26/11 10:36, Shane Cox wrote:<br>
                >><br>
                >>              Periodically, our Java app on
                Linux experiences a long Minor<br>
                >>              GC pause that cannot be accounted
                for by the GC time in the<br>
                >>              log file.  Instead, the pause is
                captured as "real" (wall<br>
                >>              clock) time and is observable in
                our application logs.  An<br>
                >>              example is below.  The GC
                completed in 56ms, but the<br>
                >>              application was paused for 2.45
                seconds.<br>
                >><br>
                >>              2011-04-26T12:50:41.722-0400:
                2117.157: [GC 2117.157:<br>
                >>              [ParNew:
                943439K->104832K(943744K), 0.0481790 secs]<br>
                >>              4909998K->4086751K(25060992K),
                0.0485110 secs] [Times:<br>
                >>              user=0.34 sys=0.03, real=0.04
                secs]<br>
                >>              2011-04-26T12:50:43.882-0400:
                2119.317: [GC 2119.317:<br>
                >>              [ParNew:
                942852K->104832K(943744K), 0.0738000 secs]<br>
                >>              4924772K->4150899K(25060992K),
                0.0740980 secs] [Times:<br>
                >>              user=0.45 sys=0.12, real=0.07
                secs]<br>
                >>              2011-04-26T12:50:45.251-0400:
                2120.686: [GC 2123.075:<br>
                >>              [ParNew:
                943744K->79296K(943744K), 0.0559560 secs]<br>
                >>              4989811K->4187520K(25060992K),
                0.0563970 secs] [Times:<br>
                >>              user=0.31 sys=0.09, *real=2.45
                secs]*<br>
                >>              2011-04-26T12:50:48.493-0400:
                2123.928: [GC 2123.928:<br>
                >>              [ParNew:
                918208K->81040K(943744K), 0.0396620 secs]<br>
                >>              5026432K->4189265K(25060992K),
                0.0400030 secs] [Times:<br>
                >>              user=0.32 sys=0.00, real=0.04
                secs]<br>
                >>              2011-04-26T12:50:51.010-0400:
                2126.445: [GC 2126.445:<br>
                >>              [ParNew:
                919952K->104832K(943744K), 0.0845070 secs]<br>
                >>              5028177K->4268050K(25060992K),
                0.0848300 secs] [Times:<br>
                >>              user=0.52 sys=0.11, real=0.09
                secs]<br>
                >><br>
                >><br>
                >>              Initially I suspected swapping,
                but according to the free<br>
                >>              command, 0 bytes of swap are in
                use.<br>
                >>               >free -m<br>
                >>                          total       used      
                free     shared<br>
                >>               buffers     cached<br>
                >>              Mem:         32168      28118    
                  4050          0<br>
                >>               824      12652<br>
                >>              -/+ buffers/cache:      14641    
                 17527<br>
                >>              Swap:         8191          0    
                  8191<br>
                >><br>
                >><br>
                >>              Next, I read about a problem
                relating to mprotect() on Linux<br>
                >>              that can be worked around with
                -XX:+UseMember.  I tried<br>
                >>              that, but I still see the same
                unexplainable pauses.<br>
                >><br>
                >><br>
                >>              Any suggestions/ideas?  We've
                upgraded to the latest JDK,<br>
                >>              but no luck.<br>
                >><br>
                >>              Thanks,<br>
                >>              Shane<br>
                >><br>
                >><br>
                >>              java version "1.6.0_25"<br>
                >>              Java(TM) SE Runtime Environment
                (build 1.6.0_25-b06)<br>
                >>              Java HotSpot(TM) 64-Bit Server VM
                (build 20.0-b11, mixed mode)<br>
                >><br>
                >><br>
                >>              Linux 2.6.18-128.el5 #1 SMP Wed
                Jan 21 08:45:05 EST 2009<br>
                >>              x86_64 x86_64 x86_64 GNU/Linux<br>
                >><br>
                >><br>
                >>              -verbose:gc  -Xms24g -Xmx24g
                -Xmn1g -Xss256k<br>
                >>              -XX:PermSize=256m
                -XX:MaxPermSize=256m<br>
                >>              -XX:+PrintTenuringDistribution
                -XX:+UseConcMarkSweepGC<br>
                >>              -XX:+CMSParallelRemarkEnabled<br>
                >>            
                 -XX:CMSInitiatingOccupancyFraction=70<br>
                >>              -XX:+CMSClassUnloadingEnabled
                -XX:+PrintGCDetails<br>
                >>              -XX:+PrintGCDateStamps
                -XX:+PrintHeapAtGC<br>
                >>              -XX:+HeapDumpOnOutOfMemoryError
                -XX:+UseCompressedStrings<br>
                >>              -XX:+UseMembar<br>
                >><br>
                >><br>
                >>            
 ------------------------------------------------------------------------<br>
                >><br>
                >>            
                 _______________________________________________<br>
                >>              hotspot-gc-use mailing list<br>
                >>              <a moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>
                >>              <mailto:<a
                  moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a>><br>
                >>              <a moz-do-not-send="true"
                  href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use"
                  target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>
                >><br>
                >>        
                 _______________________________________________<br>
                >>          hotspot-gc-use mailing list<br>
                >>          <a moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>
                >>          <mailto:<a moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a>><br>
                >>          <a moz-do-not-send="true"
                  href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use"
                  target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>
                >><br>
                >><br>
                > _______________________________________________<br>
                > hotspot-gc-use mailing list<br>
                > <a moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>
                > <a moz-do-not-send="true"
                  href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use"
                  target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>
                _______________________________________________<br>
                hotspot-gc-use mailing list<br>
                <a moz-do-not-send="true"
                  href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a><br>
                <a moz-do-not-send="true"
                  href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use"
                  target="_blank">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a><br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <pre wrap=""><fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
hotspot-gc-use mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:hotspot-gc-use@openjdk.java.net">hotspot-gc-use@openjdk.java.net</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use">http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use</a>
</pre>
      </blockquote>
    </blockquote>
  </body>
</html>