<div dir="ltr">We have a faster implementation for card table scan, and on our implementation ~20s pause goes down to 1~2s with above-mentioned experiments. It doesn't solve the problem but it alleviates it (and actually shows that card table scan is faster).<div>If someone can sponsor this, I will create a patch for the review.<br><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 13, 2015 at 6:36 PM, Srinivas Ramakrishna <span dir="ltr"><<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Jon --<div><br></div><div>thanks, just a comment inline below....<br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Wed, May 13, 2015 at 4:06 PM, Jon Masamitsu <span dir="ltr"><<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><span>
    <br>
    <br>
    <div>On 5/13/2015 11:40 AM, Srinivas
      Ramakrishna wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Hi Jon --
        <div><br>
        </div>
        <div>My expectation is that for any large block spanning
          multiple cards, if you walk the BOT entries for that block
          from lower addresses to higher addresses, the BOT entries
          should be monotone increasing (albeit not strictly monotone)
          until the last card for that block (but one). In the example
          you gave below, is D > C > N_words, with the BOT entries
          in some interior window of the same block/object going (low to
          high addresses in the window) like " .... DDDDCCCCDDDD...." ?
          In other words, there are lower-valued entries C within a
          range of higher-valued entries D, with both C and D denoting
          logarithmic entries for the block in question?</div>
      </div>
    </blockquote>
    <br></span>
    Yes, that's what I'm seeing.<span><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>If you look in the BOT methods, there was a parameter
          "reducing" passed to some of the mark_block() methods which
          was supposed to do some kind of assertion checking towards
          maintenance of this monotonicity, but looking at it now, that
          may not be sufficient and the assert could be strengthened at
          the "end" of the range being marked. More worryingly, I
          believe I had turned off even that assert before I left (ca
          6uXX?) because it was failing at that time, but the
          corresponding bug hadn't been fixed, and seems as though it
          still isn't. This might possibly  have some bearing on the
          non-monotonicity in the entries you see.</div>
      </div>
    </blockquote>
    <br></span>
    Thanks for that bit of information.   So what I'm seeing is not<br>
    necessarily the cause for the long pauses.  But see my next comment.<span><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>I'll hopefully get some time to go over this code again. I
          was looking at the code on the train ride into work this
          morning, and trying to remember the details, but have not
          quite delved again into all of its details yet or why the
          assert I mention above should be failing.</div>
      </div>
    </blockquote>
    <br></span>
    After the logarithmic stride back to find the block containing the<br>
    start of the object, there is a object by object walk forward to
    find the<br>
    to find the start of the object<br>
    <br>
    <blockquote type="cite">   558    while (n <= addr) {<br>
         559      debug_only(HeapWord* last = q);   // for debugging<br>
         560      q = n;<br>
         561      n += _sp->block_size(n);<br>
    </blockquote>
    [...]<br>
    <br>
    From instrumentation I think the long pauses originate in this
    forward walk<br>
    (mostly in the block_size() method).   So maybe the BOT
    "DDDDDCCCCC...CDDDD"<br>
    indicates that BOT at the start of an objects is being trashed and
    we're<br>
    going way too far back.  I'm trying to use VerifyBlockOffsetArray
    but it seems to<br>
    have bit rotted.    I'll keep trying.</div></blockquote><div><br></div></span><div>Ah, that's what I was afraid of. This must likely mean that when a large block is split, the suffix from the split</div><div>(which is typically returned to the free list) isn't getting fixed  up as was the intention, or when the sweeper</div><div>coalesces smaller blocks into a larger one, the new larger block isn't getting fixed correctly (although that seems</div><div>unlikely). Using VerifyBlockOffsetArray and asserting if the forward walk is more than a card's worth in length</div><div>would seem to be one way to catch that badness closer to when it happens...</div><div><br></div><div>Thanks for the update, Jon!</div><div>-- ramki</div><div><div class="h5"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span><font color="#888888"><br>
    <br>
    Jon</font></span><div><div><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>It might be worthwhile (if it is possible) to do a bisect
          on whatever the first known JDK (7uXX?) where the performance
          regression appears and try to narrow down the change that may
          have caused the regression, using James' test program. I have
          a sneaking suspicion (because of the elided assert) that it
          might well be a change I had made towards closing some
          BOT-walking/card-scanning cracks when initializing CMS objects
          ca 6uXX.</div>
        <div><br>
        </div>
        <div>James, in case you have read thus far, can you provide the
          exact version number of the 6uXX Oracle jdk you used that did
          _not_ show the performance regression for your test program?</div>
        <div><br>
        </div>
        <div>Thanks!</div>
        <div>-- ramki</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, May 12, 2015 at 1:50 PM, Jon
          Masamitsu <span dir="ltr"><<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Ramki,<br>
              <br>
              The logarithmic BOT is taking big steps back to<br>
              find the start of a large object.  But something is odd.<br>
              <br>
              Is there a reason why the BOT entries would decrease for a
              while<br>
              and then increase without a value that gives the start of
              an object?<br>
              <br>
              &_array->_offset_array[index] = 0xffffffff6550f600
              "DDDDDDDDDDDDDD ..."<br>
              <br>
              and then down the road.<br>
              <br>
               print &_array->_offset_array[0x110600U],
              _array->_offset_array[0x10f600U ><br>
              &_array->_offset_array[0x110600U] =
              0xffffffff65510600
              "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC<br>
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD<br>
              DDDDDDDDDDDDDDDDDDDDDDDDDD<br>
              <br>
              'D' is 0x44<br>
              <br>
              N_words is 0x40<span><font color="#888888"><br>
                  <br>
                  Jon</font></span>
              <div>
                <div><br>
                  <br>
                  <div>On 05/12/2015 08:46 AM, Srinivas Ramakrishna
                    wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div>Thanks for that background Jon; I'll try and
                      look at the 8uxx code and run James's example when
                      I get a few spare cycles (which won't be soon,
                      unfortunately).</div>
                    <div><br>
                    </div>
                    <div>-- ramki<br>
                      <br>
                      ysr1729</div>
                    <div><br>
                      On May 8, 2015, at 14:25, Jon Masamitsu <<a href="mailto:jon.masamitsu@oracle.com" target="_blank">jon.masamitsu@oracle.com</a>>
                      wrote:<br>
                      <br>
                    </div>
                    <blockquote type="cite">
                      <div> Ramki,<br>
                        <br>
                        I compared the memory/blockOffsetTable.* files
                        between 6u35 and 7 and there <br>
                        were essentially no differences.  The .cpp file
                        had some changes in assertions.<br>
                        The other two were the same except for comments.<br>
                        <br>
                        Between 6 and 6u35 (didn't pin point it), we
                        changed from the rather<br>
                        weak N_powers=3  :-) to N_powers=14.<br>
                        <br>
                        There seem to be two problematic cases.<br>
                        <br>
                        1)  Spike in ParNew times that lasts for 2 (or a
                        few) GC's<br>
                        2) Continually growing ParNew times<br>
                        <br>
                        From just reading the CR<br>
                        <br>
                        <a href="https://bugs.openjdk.java.net/browse/JDK-8079274" target="_blank">https://bugs.openjdk.java.net/browse/JDK-8079274</a><br>
                        <br>
                        the test case seems to reproduce 1).  The test
                        case<br>
                        has a large object that lives till the
                        application exits.<br>
                        If it were a broken logarithmic BOT, I would not
                        expect<br>
                        the ParNew pauses to shrink again.  <br>
                        <br>
                        Jon<br>
                        <br>
                        <br>
                        <br>
                        <div>On 5/8/2015 11:49 AM, Srinivas Ramakrishna
                          wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">This is indeed interesting...
                            <div><br>
                            </div>
                            <div>Jesper, did you figure out why (as
                              James seems to note) the problem is seen
                              in JDK 8 but not in JDK 6 (he's using
                              ParNew in both cases). Also, isn't the BOT
                              logarithmic, which will allow threads to
                              "quickly" traverse back to the start of
                              even a large object? Did something change
                              in the BOT that might have led to what
                              James sees as a regression, or did the
                              range that he characterizes as a "blind
                              spot" merely move from one place to
                              another?</div>
                            <div><br>
                            </div>
                            <div>Unfortunately, I have not been keeping
                              up with the changes in this area and am
                              not working with JDK 6 code much, so can't
                              offer any clues, but am surprised that
                              this would be a regression. (I would
                              expect the same performance pathologies in
                              this code to exist in 6 and forward -- has
                              it been established conclusively that this
                              is indeed a regression?)</div>
                            <div><br>
                            </div>
                            <div>thanks!</div>
                            <div>-- ramki</div>
                          </div>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">On Fri, May 8, 2015
                              at 6:40 AM, James Lent <span dir="ltr"><<a href="mailto:jlent@tivo.com" target="_blank">jlent@tivo.com</a>></span>
                              wrote:<br>
                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                <div bgcolor="#FFFFFF" text="#000000">
                                  The explanation of the problem added
                                  by Jesper Wilhelmsson on <span> <span title="2015-05-06 08:59">2015-05-06
                                      08:59 </span></span>to
                                  JDK-8079274 is interesting, but, not
                                  very satisfying. <br>
                                  <br>
                                  1) The problem affects more than this
                                  one specific breakpoint.  It impacts
                                  objects in at least the range of about
                                  480M to 512M.  The comment "A larger
                                  object probably triggers some higher
                                  level code that are able to skip
                                  scanning of this part of the memory"
                                  does not inspire confidence. It makes
                                  even more concerned that we do not
                                  understand this issue and that there
                                  could be other triggers.  It seems
                                  like "the higher level code" has a 32M
                                  blind spot.  Further I can create
                                  scenarios where the GC code takes
                                  progressively longer to clear the
                                  issue.  Presumably it completes the
                                  lower block scan during each GC
                                  cycle.  Why does it take longer and
                                  longer each time (my guess is it has
                                  something to do with the number of
                                  objects moved to survivor space).  If
                                  I cache 1 in 100 instead of 1 in 10
                                  then there are 11 progressively slower
                                  ParNewGC (after several fast one) and
                                  the issue takes minutes to clear: <br>
                                  <br>
                                  $ /usr/lib/jvm/java-7-oracle/bin/java
                                  -verbose:gc -XX:+PrintGCTimeStamps
                                  -XX:+PrintGCDetails -XX:+UseParNewGC
                                  -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G
                                  LongParNewPause 536870384 100 100 <br>
                                  0.308: [GC0.308: [ParNew:
                                  272640K->3029K(306688K), 0.0113180
                                  secs] 796927K->527316K(4160256K),
                                  0.0114040 secs] [Times: user=0.05
                                  sys=0.00, real=0.02 secs] <br>
                                  0.367: [GC0.367: [ParNew:
                                  275669K->7766K(306688K), 0.0140840
                                  secs] 799956K->532053K(4160256K),
                                  0.0141630 secs] [Times: user=0.04
                                  sys=0.01, real=0.02 secs] <br>
                                  0.430: [GC0.430: [ParNew:
                                  280406K->10057K(306688K), 0.0120810
                                  secs] 804693K->534344K(4160256K),
                                  0.0121690 secs] [Times: user=0.05
                                  sys=0.00, real=0.02 secs] <br>
                                  0.490: [GC0.490: [ParNew:
                                  282697K->15383K(306688K), 0.0128540
                                  secs] 806984K->539670K(4160256K),
                                  0.0129390 secs] [Times: user=0.05
                                  sys=0.00, real=0.02 secs] <br>
                                  0.551: [GC0.551: [ParNew:
                                  288023K->21006K(306688K), 0.0146480
                                  secs] 812310K->545294K(4160256K),
                                  0.0147650 secs] [Times: user=0.05
                                  sys=0.01, real=0.01 secs] <br>
                                  0.613: [GC0.613: [ParNew:
                                  293646K->26805K(306688K), 0.0141250
                                  secs] 817934K->551092K(4160256K),
                                  0.0142050 secs] [Times: user=0.05
                                  sys=0.00, real=0.01 secs] <br>
                                  0.675: [GC0.675: [ParNew:
                                  299445K->24258K(306688K), 0.4816840
                                  secs] 823732K->551436K(4160256K),
                                  0.4817620 secs] [Times: user=1.90
                                  sys=0.00, real=0.48 secs] <br>
                                  1.205: [GC1.205: [ParNew:
                                  296898K->17914K(306688K), 3.4216550
                                  secs] 824076K->547788K(4160256K),
                                  3.4217180 secs] [Times: user=13.50
                                  sys=0.00, real=3.42 secs] <br>
                                  4.674: [GC4.674: [ParNew:
                                  290554K->17433K(306688K), 6.3406820
                                  secs] 820428K->550031K(4160256K),
                                  6.3407410 secs] [Times: user=25.00
                                  sys=0.02, real=6.34 secs] <br>
                                  11.062: [GC11.062: [ParNew:
                                  290073K->17315K(306688K),
                                  10.9493130 secs]
                                  822671K->552636K(4160256K),
                                  10.9493890 secs] [Times: user=42.98
                                  sys=0.03, real=10.95 secs] <br>
                                  22.059: [GC22.059: [ParNew:
                                  289955K->17758K(306688K),
                                  16.0539650 secs]
                                  825276K->555798K(4160256K),
                                  16.0540250 secs] [Times: user=62.44
                                  sys=0.09, real=16.05 secs] <br>
                                  38.161: [GC38.161: [ParNew:
                                  290398K->17745K(306688K),
                                  20.6900820 secs]
                                  828438K->558512K(4160256K),
                                  20.6901420 secs] [Times: user=80.79
                                  sys=0.08, real=20.69 secs] <br>
                                  58.898: [GC58.898: [ParNew:
                                  290385K->21826K(306688K),
                                  24.6781700 secs]
                                  831152K->565310K(4160256K),
                                  24.6782710 secs] [Times: user=95.58
                                  sys=0.05, real=24.68 secs] <br>
                                  83.625: [GC83.625: [ParNew:
                                  294466K->21834K(306688K),
                                  30.3501800 secs]
                                  837950K->568044K(4160256K),
                                  30.3502610 secs] [Times: user=115.80
                                  sys=0.11, real=30.35 secs] <br>
                                  114.022: [GC114.022: [ParNew:
                                  294474K->21836K(306688K),
                                  33.6648800 secs]
                                  840684K->570772K(4160256K),
                                  33.6649390 secs] [Times: user=127.38
                                  sys=0.14, real=33.67 secs] <br>
                                  147.736: [GC147.736: [ParNew:
                                  294476K->18454K(306688K),
                                  38.9828430 secs]
                                  843412K->570117K(4160256K),
                                  38.9829040 secs] [Times: user=149.14
                                  sys=0.15, real=38.99 secs] <br>
                                  186.766: [GC186.766: [ParNew:
                                  291094K->26602K(306688K),
                                  39.8726730 secs]
                                  842757K->580977K(4160256K),
                                  39.8727330 secs] [Times: user=156.35
                                  sys=0.11, real=39.88 secs] <br>
                                  226.686: [GC226.686: [ParNew:
                                  299242K->26612K(306688K), 0.0174470
                                  secs] 853617K->583714K(4160256K),
                                  0.0175300 secs] [Times: user=0.06
                                  sys=0.00, real=0.02 secs] <br>
                                  <br>
                                  2) The suggestion to use only 1 thread
                                  is not acceptable to me.  Use parallel
                                  GC with one thread?  Further it did
                                  not work for me (in fact it made
                                  things worse).  "user" time does now
                                  equal "real" time (and the new size is
                                  1/4 as big) so I think the parameter
                                  took effect: <br>
                                  <br>
                                  $ /usr/lib/jvm/java-8-oracle/bin/java
                                  -verbose:gc -XX:+PrintGCTimeStamps
                                  -XX:+PrintGCDetails -XX:+UseParNewGC
                                  -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G
                                  -XX:ParallelGCThreads=1
                                  LongParNewPause 536870384 100 10 <br>
                                  0.226: [GC (Allocation Failure) 0.226:
                                  [ParNew: 68160K->6980K(76672K),
                                  0.0489756 secs]
                                  592447K->531267K(4185792K),
                                  0.0490718 secs] [Times: user=0.05
                                  sys=0.00, real=0.05 secs] <br>
                                  0.291: [GC (Allocation Failure) 0.291:
                                  [ParNew: 75140K->8512K(76672K),
                                  29.8707281 secs]
                                  599427K->539479K(4185792K),
                                  29.8707954 secs] [Times: user=29.80
                                  sys=0.00, real=29.88 secs] <br>
                                  30.180: [GC (Allocation Failure)
                                  30.180: [ParNew:
                                  76672K->8093K(76672K), 30.6357699
                                  secs] 607639K->545772K(4185792K),
                                  30.6358257 secs] [Times: user=30.57
                                  sys=0.00, real=30.63 secs] <br>
                                  60.837: [GC (Allocation Failure)
                                  60.837: [ParNew:
                                  76253K->6830K(76672K), 69.2115255
                                  secs] 613932K->552169K(4185792K),
                                  69.2115783 secs] [Times: user=69.05
                                  sys=0.01, real=69.22 secs] <br>
                                  130.068: [GC (Allocation Failure)
                                  130.068: [ParNew:
                                  74990K->8226K(76672K), 132.5118903
                                  secs] 620329K->560381K(4185792K),
                                  132.5119459 secs] [Times: user=132.21
                                  sys=0.00, real=132.52 secs] <br>
                                  262.603: [GC (Allocation Failure)
                                  262.603: [ParNew:
                                  76386K->8512K(76672K), 0.0400138
                                  secs] 628541K->568748K(4185792K),
                                  0.0400687 secs] [Times: user=0.04
                                  sys=0.00, real=0.04 secs] <br>
                                  <br>
                                  3) Applications should not be expected
                                  to tune the GC settings to account for
                                  the size of one static object.  The
                                  connection between the size of this
                                  object and the GC performance was not
                                  easy to even find.  Further the
                                  parameter ParGCCardsPerStrideChunk is
                                  pretty low level and not one I think
                                  many application developers are
                                  familiar with.<br>
                                  <br>
                                  Error: VM option
                                  'ParGCCardsPerStrideChunk' is
                                  diagnostic and must be enabled via
                                  -XX:+UnlockDiagnosticVMOptions.<span><br>
                                    Error: Could not create the Java
                                    Virtual Machine.<br>
                                    Error: A fatal exception has
                                    occurred. Program will exit.<br>
                                    <br>
                                  </span> 4) I can not reproduce this
                                  issue in Oracle 6.  This appears to be
                                  a regression.  How can fixing a
                                  regression be called an enhancement
                                  request? <br>
                                  <span> <br>
                                    <br>
                                    <br>
                                    <br>
                                    <br>
                                    <br>
                                    <br>
                                    <hr> <font color="Gray" face="Arial" size="1"><br>
                                      This email and any attachments may
                                      contain confidential and
                                      privileged material for the sole
                                      use of the intended recipient. Any
                                      review, copying, or distribution
                                      of this email (or any attachments)
                                      by others is prohibited. If you
                                      are not the intended recipient,
                                      please contact the sender
                                      immediately and permanently delete
                                      this email and any attachments. No
                                      employee or agent of TiVo Inc. is
                                      authorized to conclude any binding
                                      agreement on behalf of TiVo Inc.
                                      by email. Binding agreements with
                                      TiVo Inc. may only be made by a
                                      signed written agreement.<br>
                                    </font> </span></div>
                              </blockquote>
                            </div>
                            <br>
                          </div>
                        </blockquote>
                        <br>
                      </div>
                    </blockquote>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div></div></div><br></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><span style="font-size:12.8000001907349px">Jungwoo Ha | Java Platform Team | <a href="mailto:jwha@google.com" target="_blank">jwha@google.com</a></span><br></div><div><br></div></div></div>
</div>