<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Ok, thanks…<div><br></div><div>The 3 before the 1129 and the 6 before the 11459 are the node numbers</div><div><br></div><div>I’ll dig around in the source for any way of finding out what the cause of safe points are (I’m not aware of a better -XX: option) … frankly I’ll probably just report this in the user hotspot-gc-use thread (it isn’t causing any real issues) and see if other people report it too.</div><div><br></div><div>Thanks,</div><div><br></div><div>Graham</div><div><br><div><div>On Jun 19, 2014, at 12:23 PM, Jon Masamitsu <<a href="mailto:jon.masamitsu@oracle.com">jon.masamitsu@oracle.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">

  
    <meta content="text/html; charset=windows-1252" http-equiv="Content-Type">

  
  <div text="#000000" bgcolor="#FFFFFF">

    Graham,<br>

    <br>

    I don't have any guesses about what is causing the difference in<br>

    the number of safepoints.  As you note the messages you are<br>

    counting are not specifically GC pauses.    The differences you<br>

    are seeing are huge (1129 vs 11459, which, by the way, I don't <br>

    see those numbers in the tables; did you mean 31129 vs 611459?).<br>

    If those numbers are really GC related I would expect some<br>

    dramatic effect on the application behavior.<br>

    <br>

    Sometimes better GC means applications run faster so<br>

    maybe more work is getting done in nodes 4-6.<br>

    The scale of the difference is surprising though.<br>

    <br>

    Jon<br>

    <br>

    <div class="moz-cite-prefix">On 6/18/2014 8:19 PM, graham sanderson

      wrote:<br>

    </div>

    <blockquote cite="mid:D17A9AC7-A7D0-4AE5-B8F0-2917F7480244@vast.com" type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=windows-1252">

      The options are working great and as expected (faster initial

      mark, and no long pauses after abortable preclean timeout). One

      weird thing though which I’m curious about: I’m showing some data

      for six JVMs (calling them nodes - they are on separate machines)

      <div><br>

      </div>

      <div>all with :

        <div><br>

        </div>

        <div>Linux version 2.6.32-431.3.1.el6.x86_64 (<a moz-do-not-send="true" href="mailto:mockbuild@c6b10.bsys.dev.centos.org">mockbuild@c6b10.bsys.dev.centos.org</a>)

          (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP

          Fri Jan 3 21:39:27 UTC 2014</div>

        <div>JDK 1.7.0_60-b19</div>

        <div>16 gig old gen</div>

        <div>8 gig new (6.4 gig eden)</div>

        <div>-XX:+UseParNewGC</div>

        <div>-XX:+UseConcMarkSweepGC</div>

        <div>-XX:+CMSParallelRemarkEnabled</div>

        <div>256 gig RAM</div>

        <div>16 cores (sandy bridge)</div>

        <div><br>

        </div>

        <div>Nodes 4-6 also have </div>

        <div><br>

        </div>

        <div>-XX:+CMSEdenChunksRecordAlways</div>

        <div>-XX:+CMSParallelInitialMarkEnabled</div>

        <div><br>

        </div>

        <div>There are some application level config differences (which

          limit amount of certain objects kept in memory before flushing

          to disk) - 1&4 have the same app config, 2&5 have the

          same app config, 3&6 have the same app config</div>

        <div><br>

        </div>

        <div>This first dump, shows two days worth of total times

          application threads were stopped via grepping logs for <span style="background-color: rgb(0, 36, 2); color: rgb(0, 228,

            56); font-family: Menlo; font-size: 11px;">Total time for

            which application threads were stopped</span> and summing

          the values. worst case 4 minutes over the day is not too bad,

          so this isn’t a big issue</div>

        <div><br>

        </div>

        <div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 1 154.623</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 2 90.3006</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 3 75.3602</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 4 180.618</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 5 107.668</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 6 99.7783</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">-------</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 1 190.741</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 2 82.8865</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 3 90.0098</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 4 239.702</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 5 149.332</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 6 138.03</div>

        </div>

        <div><br>

        </div>

        <div>Notably however if you look via JMX/visualGC, the total GC

          time is actually lower on nodes 4 to 6 than the equivalent

          nodes 1 to 3. Now I know that biased lock revocation and other

          things cause safe point, so I figure something other than GC

          must be the cause… so I just did a count of log lines with  <span style="background-color: rgb(0, 36, 2); color: rgb(0, 228,

            56); font-family: Menlo; font-size: 11px;">Total time for

            which application threads were stopped</span> and got this:</div>

        <div><span style="background-color: rgb(0, 36, 2); color: rgb(0,

            228, 56); font-family: Menlo; font-size: 11px;"><br>

          </span></div>

        <div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 1 19282</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 2 6784</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 3 1275</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 4 26356</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 5 14491</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-17

            : 6 8402</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">-------</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 1 20943</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 2 1134</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 3 1129</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 4 30289</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 5 16508</div>

          <div style="margin: 0px; font-size: 11px; font-family: Menlo;

            color: rgb(0, 228, 56); background-color: rgb(0, 36, 2);">2014-06-18

            : 6 11459</div>

          <div>

            <div><br>

            </div>

            <div>I can’t cycle these nodes right now (to try each new

              parameter individually), but am curious whether you can

              think of why adding these parameters would have such a

              large effect on the number of safe point stops - e.g. 1129

              vs 11459 for otherwise identically configured nodes with

              very similar workload. Note the ratio is highest on nodes

              2 vs node 5 which spill the least into the old generation

              (so certainly fewer CMS cycles, and also fewer young gen

              collections) if that sparks any ideas.</div>

            <div><br>

            </div>

            <div>Thanks,</div>

            <div><br>

            </div>

            <div>Graham.</div>

            <div><br>

            </div>

            <div>P.S. It is entirely possible I don’t know exactly what <span style="background-color: rgb(0, 36, 2); color: rgb(0,

                228, 56); font-family: Menlo; font-size: 11px;">Total

                time for which application threads were stopped</span> refers

              to in all cases (I’m assuming it is a safe point stop)</div>

            <div><br>

            </div>

            <div>On Jun 16, 2014, at 1:54 PM, graham sanderson <<a moz-do-not-send="true" href="mailto:graham@vast.com">graham@vast.com</a>>

              wrote:</div>

            <br class="Apple-interchange-newline">

            <blockquote type="cite">

              <meta http-equiv="Content-Type" content="text/html;

                charset=windows-1252">

              <div style="word-wrap: break-word; -webkit-nbsp-mode:

                space; -webkit-line-break: after-white-space;">Thanks

                Jon; that’s exactly what i was hoping

                <div><br>

                  <div>

                    <div>On Jun 16, 2014, at 1:27 PM, Jon Masamitsu <<a moz-do-not-send="true" href="mailto:jon.masamitsu@oracle.com">jon.masamitsu@oracle.com</a>>

                      wrote:</div>

                    <br class="Apple-interchange-newline">

                    <blockquote type="cite">

                      <meta content="text/html; charset=windows-1252" http-equiv="Content-Type">

                      <div bgcolor="#FFFFFF" text="#000000"> <br>

                        <div class="moz-cite-prefix">On 06/12/2014 09:16

                          PM, graham sanderson wrote:<br>

                        </div>

                        <blockquote cite="mid:43FD20E2-774A-4F30-ACEA-F10395175C82@vast.com" type="cite">

                          <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

                          Hi, I hope this is the right list for this

                          question:

                          <div><br>

                          </div>

                          <div>I was investigating abortable preclean

                            timeouts in our app (and associated long

                            remark pause) so had a look at the old jdk6

                            code I had on my box, wondered about

                            recording eden chunks during certain eden

                            slow allocation paths (I wasn’t sure if TLAB

                            allocation is just a CAS bump), and saw what

                            looked perfect in the latest code, so was

                            excited to install 1.7.0_60-b19</div>

                          <div><br>

                          </div>

                          <div>I wanted to ask what you consider the

                            stability of these two options to be (I’m

                            pretty sure at least the first one is new in

                            this release)</div>

                          <div><br>

                          </div>

                          <div>I have just installed locally on my mac,

                            and am aware of <a moz-do-not-send="true" href="http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809">http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809</a> which


                            I could reproduce, however I wasn’t able to

                            reproduce it without -XX:-<span style="background-color: rgb(255, 255,

                              255); white-space: pre-wrap;">UseCMSCompactAtFullCollection

                              (is this your understanding too?)</span></div>

                        </blockquote>

                        <br>

                        Yes.<br>

                        <br>

                        <blockquote cite="mid:43FD20E2-774A-4F30-ACEA-F10395175C82@vast.com" type="cite">

                          <div><br>

                          </div>

                          <div>We are running our application with 8 gig

                            young generation (6.4g eden), on boxes with

                            32 cores… so parallelism is good for short

                            pauses</div>

                          <div><br>

                          </div>

                          <div>we already have</div>

                          <div><br>

                          </div>

                          <div>

                            <div>-XX:+UseParNewGC </div>

                            <div>-XX:+UseConcMarkSweepGC</div>

                            <div>-XX:+CMSParallelRemarkEnabled</div>

                          </div>

                          <div><br>

                          </div>

                          <div>we have seen a few long(isn) initial

                            marks, so </div>

                          <div><br>

                          </div>

                          <div>-XX:+CMSParallelInitialMarkEnabled sounds

                            good</div>

                          <div><br>

                          </div>

                          <div>as for </div>

                          <div><br>

                          </div>

                          <div>-XX:+CMSEdenChunksRecordAlways</div>

                          <div><br>

                          </div>

                          <div>my question is: what constitutes a slow

                            path such an eden chunk is potentially

                            recorded… TLAB allocation, or more horrific

                            things; basically (and I’ll test our app

                            with -XX:+CMSPrintEdenSurvivorChunks) is it

                            likely that I’ll actually get less samples

                            using -XX:+CMSEdenChunksRecordAlways in a

                            highly multithread app than I would with

                            sampling, or put another way… what sort of

                            app allocation patterns if any might avoid

                            the slow path altogether and might leave me

                            with just one chunk?</div>

                        </blockquote>

                        <br>

                        Fast path allocation is done from TLAB's.  If

                        you have to get<br>

                        a new TLAB, the call to get the new TLAB comes

                        from compiled<br>

                        code but the call is into the JVM and  that is

                        the slow path where<br>

                        the sampling is done.<br>

                        <br>

                        Jon<br>

                        <br>

                        <blockquote cite="mid:43FD20E2-774A-4F30-ACEA-F10395175C82@vast.com" type="cite">

                          <div><br>

                          </div>

                          <div>Thanks,</div>

                          <div><br>

                          </div>

                          <div>Graham</div>

                          <div><br>

                          </div>

                          <div>P.S. less relevant I think, but our old

                            generation is 16g</div>

                          <div>P.P.S. I suspect the abortable preclean

                            timeouts mostly happen after a burst of very

                            high allocation rate followed by an almost

                            complete lull… this is one of the patterns

                            that can happen in our application</div>

                        </blockquote>

                        <br>

                      </div>

                    </blockquote>

                  </div>

                  <br>

                </div>

              </div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

  </div>


</blockquote></div><br></div></body></html>