<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix"><br>

      Hi Jungwoo,<br>

      <br>

      Thanks for contributing this change. Nice work.<br>

      <br>

      First a couple general question.<br>

      <br>

      Why did you only make this change for CMS? Shouldn't the other

      collectors benefit from this as well? You said you tried the other

      GCs as well. What were the results for them?<br>

      <br>

      Do we need the flag AdjustGCThreadsToCores? It seems like this is

      a good change and it is always nice to reduce the number of flags

      we have. If you feel unsure about the change it may be good to be

      able to disable it, but in general I think it would be better to

      decide that this is the default behavior and go with it.<br>

      <br>

      os_linux_x86.cpp<br>

      <br>

      This is where the important part of your change is. I don't know

      enough about Linux internals to validate the changes you have made

      there. I will leave that out of my review and let others (Runtime

      team?) with more Linux knowledge review it.<br>

      <br>

      <br>

      Some code comments:<br>

      <br>

      Since core_count() returns 0 for all platforms except linux_x86,

      couldn't the following code be moved to a platform independent

      place?<br>

      <br>

       758 int os::active_core_count() {<br>

       759   if (os::core_count() <= 0) {<br>

       760     os::set_core_count(os::active_processor_count());<br>

       761   }<br>

       762   return os::core_count();<br>

       763 }<br>

      <br>

      That way this code would not have to be duplicated in all other

      platforms:<br>

      <br>

       726 int os::active_core_count() {<br>

       727   return os::active_processor_count();<br>

       728 }<br>

      <br>

      We could just always use os::active_core_count() and rely on the

      value it returns.<br>

      <br>

      <br>

      VM_Version::calc_parallel_worker_threads()<br>

      <br>

       862     unsigned int ncpus = (unsigned int)

      os::active_core_count();<br>

       863     return (ncpus <= 8) ? ncpus : ncpus / 2;<br>

      <br>

      Isn't it strange that if we have 8 cores we get 8 parallel

      threads, but if we have 12 cores we only get 6? Wouldn't it make

      more sense to treat this limit as a max value, so:<br>

      <br>

      return (ncpus <= 8) ? ncpus : MAX2(8, ncpus / 2);<br>

      <br>

      <br>

      <br>

      concurrentMarkSweepGeneration.cpp<br>

      <br>

       639       if (AdjustGCThreadsToCores) {<br>

       640         FLAG_SET_DEFAULT(ConcGCThreads, ParallelGCThreads /

      2);<br>

       641       } else {<br>

       642         FLAG_SET_DEFAULT(ConcGCThreads, (3 +

      ParallelGCThreads) / 4);<br>

       643       }<br>

      <br>

      This looks a little odd to me. Why do we add 3? That does not seem

      to scale when we have a large number of parallel threads. Did you

      mean to multiply by 3?<br>

      <br>

      Also, do we really need to use a different calculation for the

      ConcGCThreads? It would have been nice to be able to use the same

      logic for both cases. Did you measure any performance difference

      in the two cases?<br>

      <br>

      <br>

      Thanks,<br>

      Bengt<br>

      <br>

      <br>

      On 11/20/13 12:35 AM, Jungwoo Ha wrote:<br>

    </div>

    <blockquote

cite="mid:CA+n_jhiXF1tqA2AoiT0XDUH6zPwd=ePwrd-aSgTMakGB7Y5ckg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><span style="font-family:arial,sans-serif;font-size:13px">Hi, </span>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">I am

            sending this webrev for the review.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">(On

            behalf of Jon Masamitsu, it is upload here)</div>

          <div style="font-family:arial,sans-serif;font-size:13px"><a

              moz-do-not-send="true"

              href="http://cr.openjdk.java.net/%7Ejmasa/8028554/webrev.00/"

              target="_blank">http://cr.openjdk.java.net/~jmasa/8028554/webrev.00/</a><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">The

            feature is a new heuristics to calculate the default

            ParallelGCThreads and ConGCThreads.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">In

            x86, hyperthreading is generally bad for GC because of the

            cache contention.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">Hence,

            using all the hyper-threaded cores will slow down the

            overall GC performance.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">Current

            hotspot reads the number of processors that the Linux

            reports,</div>

          <div style="font-family:arial,sans-serif;font-size:13px">which

            treats all hyper-threaded cores equally.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">Second

            problem is that when cpu mask is set, not all the cores are

            available for the GC.</div>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">

            The patch improves the heuristics by evaluating the actual

            available physical cores</div>

          <div style="font-family:arial,sans-serif;font-size:13px">from

            the proc filesystem and the CPU mask, and use that as the

            basis for calculating the ParallelGCThreads and

            ConcGCThreads.</div>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">The

            improvements of GC pause time is significant. We evaluated

            on Nehalem, Westmere, Sandybridge as well as several AMD

            processors. We also evaluated on various CPU mask

            configuration and single/dual socket configurations. </div>

          <div style="font-family:arial,sans-serif;font-size:13px">In

            almost all cases, there were speed up in GC pause time by

            10~50%.</div>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">

            We primarily use CMS collector for the evaluation, but we

            also tested on other GCs as well.</div>

          <div style="font-family:arial,sans-serif;font-size:13px">Please

            take a look and let me know if this patch can be accepted.</div>

          <div style="font-family:arial,sans-serif;font-size:13px"><br>

          </div>

          <div style="font-family:arial,sans-serif;font-size:13px">Thanks,</div>

          <div style="font-family:arial,sans-serif;font-size:13px">Jungwoo

            Ha</div>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>