<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    Hi Jungwoo,<br>
    <br>
    <div class="moz-cite-prefix">On 2013-11-21 18:37, Jungwoo Ha wrote:<br>
    </div>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra"><br>
          <br>
          <div class="gmail_quote">On Thu, Nov 21, 2013 at 2:11 AM,
            Bengt Rutisson <span dir="ltr"><<a
                moz-do-not-send="true"
                href="mailto:bengt.rutisson@oracle.com" target="_blank">bengt.rutisson@oracle.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div><br>
                  Hi Jungwoo,<br>
                  <br>
                  Thanks for contributing this change. Nice work.<br>
                  <br>
                  First a couple general question.<br>
                  <br>
                  Why did you only make this change for CMS? Shouldn't
                  the other collectors benefit from this as well? You
                  said you tried the other GCs as well. What were the
                  results for them?<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>The change will affect all collectors because it is
              setting ParalellGCThreads and ConcGCThreads. <br>
            </div>
            <div>I mostly tested on CMS for convenience of evaluation
              and needs.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Right. Sorry, I missed that ParallelGCThreads is initialized by
    Abstract_VM_Version::parallel_worker_threads() that you now
    override. So, you are right, this change will affect all collectors.<br>
    <br>
    But doesn't that mean that you need to do the same changes for
    ConcGCThreads for G1 as you do for CMS?<br>
    <br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> Do we need the flag AdjustGCThreadsToCores? It
                  seems like this is a good change and it is always nice
                  to reduce the number of flags we have. If you feel
                  unsure about the change it may be good to be able to
                  disable it, but in general I think it would be better
                  to decide that this is the default behavior and go
                  with it.<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I have no strong opinion. If you guys can confirm the
              improvements and feel like you don't need the flag, that
              would be ideal.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes, this is an interesting dilemma. Personally I don't think I have
    the cycles right now to do any performance testing of this fix. But
    I assume that you don't have access to enough machines and
    benchmarks to test this properly on all supported platforms. I'll
    bring this up internally to see if there is any process for handling
    this type of situation...<br>
    <br>
    Since your change is for Linux only and you have done the
    measurements there, I'm fine with your results. I would prefer
    skipping the AdjustGCThreadsToCores flag based on the data you
    provided. Especially since we can still set ParallelGCThreads
    explicitly on the command line if we need to work around a
    regression somewhere.<br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> <br>
                  os_linux_x86.cpp<br>
                  <br>
                  This is where the important part of your change is. I
                  don't know enough about Linux internals to validate
                  the changes you have made there. I will leave that out
                  of my review and let others (Runtime team?) with more
                  Linux knowledge review it.<br>
                  <br>
                  <br>
                  Some code comments:<br>
                  <br>
                  Since core_count() returns 0 for all platforms except
                  linux_x86, couldn't the following code be moved to a
                  platform independent place?<br>
                  <br>
                   758 int os::active_core_count() {<br>
                   759   if (os::core_count() <= 0) {<br>
                   760    
                  os::set_core_count(os::active_processor_count());<br>
                   761   }<br>
                   762   return os::core_count();<br>
                   763 }<br>
                  <br>
                  That way this code would not have to be duplicated in
                  all other platforms:<br>
                  <br>
                   726 int os::active_core_count() {<br>
                   727   return os::active_processor_count();<br>
                   728 }<br>
                  <br>
                  We could just always use os::active_core_count() and
                  rely on the value it returns.<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Sounds good to me.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Good.<br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> <br>
                  <br>
                  VM_Version::calc_parallel_worker_threads()<br>
                  <br>
                   862     unsigned int ncpus = (unsigned int)
                  os::active_core_count();<br>
                   863     return (ncpus <= 8) ? ncpus : ncpus / 2;<br>
                  <br>
                  Isn't it strange that if we have 8 cores we get 8
                  parallel threads, but if we have 12 cores we only get
                  6? Wouldn't it make more sense to treat this limit as
                  a max value, so:<br>
                  <br>
                  return (ncpus <= 8) ? ncpus : MAX2(8, ncpus / 2);<br>
                  <br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I think that's better.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Good. :)<br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> <br>
                  concurrentMarkSweepGeneration.cpp<br>
                  <br>
                   639       if (AdjustGCThreadsToCores) {<br>
                   640         FLAG_SET_DEFAULT(ConcGCThreads,
                  ParallelGCThreads / 2);<br>
                   641       } else {<br>
                   642         FLAG_SET_DEFAULT(ConcGCThreads, (3 +
                  ParallelGCThreads) / 4);<br>
                   643       }<br>
                  <br>
                  This looks a little odd to me. Why do we add 3? That
                  does not seem to scale when we have a large number of
                  parallel threads. Did you mean to multiply by 3?<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>(3 + ParallelGCThreads) / 4 is the upstream default,
              and I didn't change it.</div>
            <div>ParallelGCThreads can be much smaller with the change,
              so I divided into 2 instead of 4.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Sorry, I looked too quickly at the diff. You're right of course...<br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div> Also, do we really need to use a different
                  calculation for the ConcGCThreads? It would have been
                  nice to be able to use the same logic for both cases.
                  Did you measure any performance difference in the two
                  cases?<br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Maybe not. Most of the pause time is on the young
              generation in CMS. I have seen pause time reduction on old
              generation as well, but anyway that is small.</div>
            <div>As I mentioned earlier, ParallelGCThreads has changed
              when Hyperthreading or CPU mask is turned on. Hence using
              the same heuristics can cause too small ConcGCThreads.</div>
            <div>I'd like to use something like
              "os::active_core_counts() / 4" and make it independent of
              using ParallelGCThreads.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes, I think it sounds like a good idea to decouple
    ParallelGCThreads and ConcGCThreads. Maybe we should do that as a
    separate change and just leave any chagnes to ConcGCThreads out of
    this patch?<br>
    <br>
    Thanks,<br>
    Bengt<br>
    <br>
    <blockquote
cite="mid:CA+n_jhiHyRoJrwvgXKV5X6vSsZWgGLJMmNYmMSLCwqmcyBtmzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Jungwoo</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>