<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    Final(?) version, only difference is the removed assert in sample():<br>
    <br>
    <a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~mikael/7158457/webrev.02/">http://cr.openjdk.java.net/~mikael/7158457/webrev.02/</a><br>
    <br>
    I'm also looking for a sponsor for this fix - any takers?<br>
    <br>
    Thanks,<br>
    Mikael<br>
    <br>
    On 2012-05-05 05:57, Srinivas Ramakrishna wrote:
    <blockquote
cite="mid:CABzyjym-WtPkzokKACr1Arid3kiJXze2VX-8JxsC7hUmF4eY-A@mail.gmail.com"
      type="cite">Mikael -- I agree that this fix would be good to get
      in, at least to address the immediate crash.<br>
      <br>
      reviewed.<br>
      -- ramki<br>
      <br>
      <div class="gmail_quote">On Fri, May 4, 2012 at 3:03 AM, Mikael
        Vidstedt <span dir="ltr"><<a moz-do-not-send="true"
            href="mailto:mikael.vidstedt@oracle.com" target="_blank">mikael.vidstedt@oracle.com</a>></span>
        wrote:<br>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000"> <br>
            Update:<br>
            <br>
            I'd like to continue working on this issue, and as part of
            that look at how the promoted and pretenured counters are
            used today and potentially update the logic given the
            insights around bytes vs. words etc.<br>
            <br>
            I think it would be helpful to address the crash problem
            short term though, and so I'd like to get feedback on the
            updated webrev:<br>
            <br>
            <a moz-do-not-send="true"
              href="http://cr.openjdk.java.net/%7Emikael/7158457/webrev.01/"
              target="_blank">http://cr.openjdk.java.net/~mikael/7158457/webrev.01/</a><br>
            <br>
            It's essentially the same as the previous one with Igor's
            suggested _is_old check improvement. Would it be ok to make
            this fix and clone the bug to track the follow-up work
            cleaning up the actual logic/heuristics?<br>
            <br>
            Thanks,<br>
            Mikael
            <div>
              <div class="h5"><br>
                <br>
                On 2012-04-26 03:38, Mikael Vidstedt wrote:
                <blockquote type="cite"> <br>
                  On 2012-04-24 23:50, Igor Veresov wrote:
                  <blockquote type="cite">
                    <div>Wasn't it the case that _avg_pretenured value
                      is formed incorrectly? I don't think it should be
                      sampled at every allocation, I think it should
                      measure the amount of data allocated in the old
                      gen between young collections, also if you
                      remember we agreed that the dimensions are wrong.
                      Or were you able to find a better explanation as
                      opposed to what we discussed before?</div>
                  </blockquote>
                  <br>
                  Thanks for reminding me - I believe you're absolutely
                  right. For a while I was thinking it actually did make
                  sense after all, but I just did an experiment to see
                  what actually happens at runtime:<br>
                  <br>
                  The _avg_pretenured counter is indeed sampled every
                  time an allocation is made directly in old gen. The
                  actual value reflects the average size *in words* of
                  the object that was allocated. Put differently, it's
                  the average size of pretenured objects in words.<br>
                  <br>
                  The counter is used in
                  PSAdaptiveSizePolicy::update_averages to calculate the
                  average amount of promoted data:<br>
                  <br>
                  <font face="Courier New, Courier, monospace">avg_promoted()->sample(promoted


                    + _avg_pretenured->padded_average());<br>
                  </font><br>
                  The promoted parameter is the number of *bytes* that
                  were just promoted. To sum it up, that appears to
                  imply that there are two problems with the above
                  computation:<br>
                  <br>
                  1. It's taking the size of everything that was just
                  promoted and adds the size of an average prenured
                  object (as opposed to the total size of all recently
                  pretenured objects)<br>
                  2. It's adding variables with different dimensions -
                  promoted is in bytes, and the _avg_pretenured padded
                  average is in words<br>
                  <br>
                  Since that effectively means _avg_pretenured it's off
                  by a factor (word_size * number_of_objects) I'm
                  guessing it will in the common case not really matter
                  to the calculation of avg_promoted...<br>
                  <br>
                  <br>
                  <blockquote type="cite">
                    <div><br>
                    </div>
                    <div>As for the treatment of the symptom, the code
                      looks good to me. Do you think it might be
                      beneficial to check the old value of _is_old
                      before assigning to it? Would cause less memory
                      traffic, if increment_count() is called
                      frequently.</div>
                    <div>
                      <pre style="font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;word-spacing:0px"><span style="color:blue">  60   void  increment_count() {</span>
<span style="color:blue">  61     _sample_count++;</span><font color="#0000ff">
</font></pre>
                      <pre style="font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;word-spacing:0px"><span style="font-family:Helvetica;white-space:normal"><pre style="font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;word-spacing:0px"><span style="color:blue">  </span><span><font color="#ff0000">62     if (!_is_old && _sample_count > OLD_THRESHOLD) {</font></span></pre></span><span style="color:blue">  63       _is_old = true;</span>
<span style="color:blue">  64     }</span>
<span style="color:blue">  65   }</span></pre>
                    </div>
                  </blockquote>
                  <br>
                  Good suggestion. I'll update my change with this in
                  case we need something urgently, but given the above
                  issues it's likely a good idea to take another pass at
                  this.<br>
                  <br>
                  Thanks,<br>
                  Mikael<br>
                  <br>
                  <blockquote type="cite">
                    <div>
                      <div><br>
                      </div>
                    </div>
                    <div>igor</div>
                    <br>
                    <div>
                      <div>On Apr 24, 2012, at 8:01 AM, Mikael Vidstedt
                        wrote:</div>
                      <br>
                      <blockquote type="cite">
                        <div><br>
                          Hi all,<br>
                          <br>
                          The statistical counters in gcUtil are used to
                          keep track of historical information about
                          various key metrics in the garbage collectors.
                          Built in to the core AdaptiveWeightedAverage
                          base class is the concept of aging the values,
                          essentially treating the first 100 values
                          differently and putting more weight on them
                          since there's not yet enough historical data
                          built up.<br>
                          <br>
                          In the class there is a 32-bit counter
                          (_sample_count) that incremented for every
                          sample and used to compute scale the weight of
                          the added value (see
                          compute_adaptive_average), and the scaling
                          logic divides 100 by the count. In the normal
                          case this is not a problem - the counters are
                          reset every once in a while and/or grow very
                          slowly. In some pathological cases the counter
                          will however continue to increment and
                          eventually overflow/wrap, meaning the 32-bit
                          count will go back to zero and the division in
                          compute_adaptive_average will lead to a
                          div-by-zero crash.<br>
                          <br>
                          The test case where this is observed is a test
                          that stress tests allocation in combination
                          with the GC locker. Specifically, the test is
                          multi-threaded which pounds on
                          java.util.zip.Deflater.deflate, which
                          internally uses the GetPrimitiveArrayCritical
                          JNI function to temporarily lock out the GC
                          (using the GC locker). The garbage collector
                          used is in this case the parallel scavenger
                          and the the counter that overflows is
                          _avg_pretenured. _avg_pretenured is
                          incremented/sampled every time an allocation
                          is made directly in the old gen, which I
                          believe is more likely when the GC locker is
                          active.<br>
                          <br>
                          <br>
                          The suggested fix is to only perform the
                          division in compute_adaptive_average when it
                          is relevant, which currently is for the first
                          100 values. Once there are more than 100
                          samples there is no longer a need to scale the
                          weight.<br>
                          <br>
                          This problem is tracked in 7158457 (stress:
                          jdk7 u4 core dumps during megacart stress test
                          run).<br>
                          <br>
                          Please review and comment on the webrev below:<br>
                          <br>
                          <a moz-do-not-send="true"
                            href="http://cr.openjdk.java.net/%7Emikael/7158457/webrev.00"
                            target="_blank">http://cr.openjdk.java.net/~mikael/7158457/webrev.00</a><br>
                          <br>
                          Thanks,<br>
                          Mikael<br>
                          <br>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </blockquote>
                  <br>
                </blockquote>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>