<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Stefan,<br>
    <br>
    <div class="moz-cite-prefix">On 2015-07-02 17:03, Jon Masamitsu
      wrote:<br>
    </div>
    <blockquote cite="mid:559552A9.8030008@oracle.com" type="cite">
      <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
      <br>
      <br>
      <div class="moz-cite-prefix">On 07/02/2015 04:48 AM, Stefan
        Karlsson wrote:<br>
      </div>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite">
        <meta content="text/html; charset=UTF-8"
          http-equiv="Content-Type">
        <br>
        <br>
        <div class="moz-cite-prefix">On 2015-07-01 18:31, Jon Masamitsu
          wrote:<br>
        </div>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <meta content="text/html; charset=UTF-8"
            http-equiv="Content-Type">
          <br>
          <br>
          <div class="moz-cite-prefix">On 6/12/2015 7:52 AM, Stefan
            Karlsson wrote:<br>
          </div>
          <blockquote cite="mid:557AF21B.2090102@oracle.com" type="cite">
            <meta http-equiv="content-type" content="text/html;
              charset=UTF-8">
            Hi all,<br>
            <br>
            The current implementation to distribute tasks to GC worker
            threads often cause long latencies (multiple milliseconds)
            when the threads are started and stopped. <br>
            <br>
            The main reason is that the worker threads have to fight
            over the Monitor lock when they are woken up from the call
            to Monitor::wait. Another reason is that all worker threads
            call notify_all when they finish a task and there wakes all
            all sleeping worker threads, which will yet again force the
            worker threads to fight over the lock. <br>
            <br>
            I propose that we use semaphores instead, so that the worker
            threads don't have to fight over a lock when they are woken
            up.<br>
            <br>
            <br>
            The patches build upon the following patch which introduces
            a Semaphore utility class. This patch will sent out for
            review on the hotspot-dev, since it affects non-GC parts of
            the code:<br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Estefank/8087322/webrev.00/">http://cr.openjdk.java.net/~stefank/8087322/webrev.00/</a><br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="https://bugs.openjdk.java.net/browse/JDK-8087322">https://bugs.openjdk.java.net/browse/JDK-8087322</a><br>
            <br>
            <br>
            The first patch that I would like to get reviewed is:<br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Estefank/8087323/webrev.00/">http://cr.openjdk.java.net/~stefank/8087323/webrev.00/</a><br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="https://bugs.openjdk.java.net/browse/JDK-8087323">https://bugs.openjdk.java.net/browse/JDK-8087323</a>
            - Unify and split the work gang classes <br>
            <br>
            It prepares for JDK-8087324, by separating the generic
            WorkGang implementation from the more elaborate
            YieldingFlexibleWorkGang (CMS) implementation. By having
            this part as a separate patch, I hope it will be easier to
            review JDK-8087324. The patch changes the work gang
            inheritance from:<br>
            <br>
            AbstractWorkGang<br>
             WorkGang<br>
              FlexibleWorkGang<br>
               YieldingFlexibleWorkGang<br>
            <br>
            to:<br>
            <br>
            AbstractWorkGang<br>
             WorkGang<br>
             YieldingFlexibleWorkGang<br>
            <br>
            Parts of the FlexibleWorkGang and WorkGang code that is
            going to be used by both concrete work gang classes, has
            been moved into AbstractWorkGang. I've duplicated some code
            in WorkGang and YieldingFlexibleWorkGang, but that code will
            be removed from WorkGang in the following patch.<br>
          </blockquote>
        </blockquote>
      </blockquote>
    </blockquote>
    The changes for 8087323 looks good. I agree with Jons comments below
    about removing the function is_Yielding... and the cast in
    YieldingFlexibleGangTask.<br>
    <br>
    Thanks,<br>
    Stefan<br>
    <br>
    <blockquote cite="mid:559552A9.8030008@oracle.com" type="cite">
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite">
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <blockquote cite="mid:557AF21B.2090102@oracle.com" type="cite">
          </blockquote>
          <br>
          <a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Estefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html">http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.hpp.frames.html</a><br>
          <br>
          There seems to be only one definition of
          is_YieldingFlexibleGang_task() now.  Is that right?  Is that
          useful?<br>
          <br>
          <pre> 131   NOT_PRODUCT(virtual bool is_YieldingFlexibleGang_task() const {
 132     return true;
 133   })</pre>
        </blockquote>
        <br>
        I agree. I don't think we need it anymore.<br>
        <br>
      </blockquote>
      Thanks.<br>
      <br>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite"> <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <br>
          Not a change  in your patch but<br>
          <br>
          <pre><span class="changed">  86   AbstractWorkGang(const char* name, uint workers, bool are_GC_task_threads, bool are_ConcurrentGC_threads) :</span>
<span class="changed">  87       _name(name),</span>
<span class="changed">  88       _total_workers(workers),</span>
<span class="changed">  89       _active_workers(UseDynamicNumberOfGCThreads ? 1U : workers),</span>
<span class="changed">  90       _are_GC_task_threads(are_GC_task_threads),</span>
<span class="changed">  91       _are_Concurren</span></pre>
          <br>
          _active_workers is always calculated as >= 2 unless
          _total_workers is only 1.<br>
          So line 89 should be<br>
          <br>
          <span class="changed">_active_workers(UseDynamicNumberOfGCThreads


            ? </span>MIN2(2, workers) : workers)<br>
          <br>
          Should I file a CR for that?  Or do you want to include it.<br>
        </blockquote>
        <br>
        I'm not sure that what is proposed above is correct. I see that
        AdaptiveSizePolicy::calc_active_workers returns 2 as a minimum,
        but both ConcurrentMark::calc_parallel_marking_threads and
        AdaptiveSizePolicy::calc_active_conc_workers can return 1.<br>
        <br>
        I also don't think it should be AbstractWorkGang's
        responsibility to have the knowledge about the minimum number of
        worker threads that are used when UseDynamicNumberOfGCThreads
        are turned on. Maybe we should set it to 0, and let the
        calc_*_active_workers setup the default value.<br>
      </blockquote>
      <br>
      I think that at one time I had tried to set the default to 0 and
      something failed.  I can see the point though.<br>
      <br>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite"> <br>
        I would prefer to handle any changes, to this part of the code,
        as separate RFEs.<br>
      </blockquote>
      <br>
      Fair enough.  If I think it's worth doing, I'll file and RFE. 
      Probably something more than just setting<br>
      it to 2 (maybe picking a default value with the help of
      AdaptiveSizePolicy).<br>
      <br>
      <br>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite"> <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          Have you considered (maybe for a later patch) changing
          YieldingFlexibleWorkGang to<br>
          simply YieldingWorkGang?  The "Flexible" attribute of
          YieldingFlexibleWorkGang having<br>
          been moved into AbstractWorkGang.<br>
        </blockquote>
        <br>
        I thought about it, but didn't think it was important enough to
        warrant that change in this patch. I wouldn't mind if a RFE was
        created to change the name.<br>
      </blockquote>
      <br>
      I'll file the RFE  if you agree with the point that "Flexible"
      describes what AbstractWorkGang does now.<br>
      <br>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite"> <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <br>
          <a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://cr.openjdk.java.net/%7Estefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html">http://cr.openjdk.java.net/~stefank/8087323/webrev.00/src/share/vm/gc/cms/yieldingWorkgroup.cpp.frames.html</a><br>
          <br>
          Is the cast at 53 necessary? I see it in the original code
          too.<br>
          <br>
          <pre><span class="changed">  50 AbstractGangWorker* YieldingFlexibleWorkGang::allocate_worker(uint which) {</span>
  51   YieldingFlexibleGangWorker* new_member =
  52       new YieldingFlexibleGangWorker(this, which);
  53   return (YieldingFlexibleGangWorker*) new_member;
  54 }</pre>
        </blockquote>
        <br>
        Yes, this is unnecessary.<br>
      </blockquote>
      <br>
      Thanks.<br>
      <br>
      Jon<br>
      <br>
      <blockquote cite="mid:55952520.6030609@oracle.com" type="cite"> <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <br>
          The rest looks good.<br>
        </blockquote>
        <br>
        Thanks.<br>
        <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <br>
          I'll do the second patch next.<br>
        </blockquote>
        <br>
        Great.<br>
        <br>
        StefanK<br>
        <br>
        <blockquote cite="mid:559415DD.7030604@oracle.com" type="cite">
          <br>
          Jon<br>
          <br>
          <blockquote cite="mid:557AF21B.2090102@oracle.com" type="cite">
            <br>
            <br>
            The second patch I'd like to get reviewed is:<br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Estefank/8087324/webrev.00/">http://cr.openjdk.java.net/~stefank/8087324/webrev.00/</a><br>
             <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="https://bugs.openjdk.java.net/browse/JDK-8087324">https://bugs.openjdk.java.net/browse/JDK-8087324</a>
            - Use semaphores when starting and stopping GC task threads
            <br>
            <br>
            It first simplifies the way we distribute the tasks to the
            GC worker threads. For example, the coordinator thread
            dispatches a task to a specific number of workers, and then
            waits for all work to be completed. There's no risk that
            multiple tasks will be scheduled simultaneously, so there's
            no need for the sequences number that is used in the current
            implementation.<br>
            <br>
            The patch contains two task dispatch / thread
            synchronization implementations:<br>
            <br>
            The first implementation uses Monitors, similar to what we
            did before the patch, but with a slightly lower overhead
            since the code calls notify_all less often. It still suffers
            from the "thundering heard" problem. When the coordinator
            thread signals that the worker threads should start, they
            all wake up from Monitor::wait and they all try to lock the
            Monitor.<br>
            <br>
            The second, and the more interesting, implementation uses
            semaphores. When the worker threads wake up from the
            semaphore wait, they don't have to serialize the execution
            by taking a lock. This greatly decreases the time it takes
            to start and stop the worker threads.<br>
            <br>
            The semaphore implementation is used on all platforms where
            the Semaphore class has been implemented in JDK-8087322. So,
            on some OS:es the code will revert to the Monitor-based
            solution until a Semaphore class has been implemented for
            that OS. So, porters might want to consider implementing the
            Sempahore class.<br>
            <br>
            There's also a diagnostic vm option
            (-XX:+/-UseSemaphoreGCThreadsSynchronization) to turn off
            the Semaphore-based implementation, which can be used to
            debug this new code. It's mainly targeted towards support
            and sustaining engineering.<br>
            <br>
            <br>
            The patches have been performance tested on Linux, Solaris,
            OSX, and Windows.<br>
            <br>
            The effects of the patch can be seen by running benchmarks
            with small young gen sizes, which triggers frequent and
            short GCs.<br>
            <br>
            For example, here are runs from the SPECjvm2008
            xml.transform benchmark with:<br>
            -Xmx1g -Xms1g -Xmn64m -XX:+PrintGC -XX:+UseG1GC -jar
            SPECjvm2008.jar -ikv xml.transform -it 30 -wt 30<br>
            <br>
            I got the following GC times:<br>
            <br>
            <tt>            Average    Median    99.9 percentile   Max</tt><tt><br>
            </tt><tt>Baseline:   </tt>
            <meta http-equiv="content-type" content="text/html;
              charset=UTF-8">
            <title></title>
            <meta name="generator" content="LibreOffice 4.4.0.3 (Linux)">
            <style type="text/css">
                body,div,table,thead,tbody,tfoot,tr,th,td,p { font-family:"Liberation Sans"; font-size:x-small }
        </style><tt> </tt>
            <meta http-equiv="content-type" content="text/html;
              charset=UTF-8">
            <title></title>
            <meta name="generator" content="LibreOffice 4.4.0.3 (Linux)">
            <style type="text/css">
                body,div,table,thead,tbody,tfoot,tr,th,td,p { font-family:"Liberation Sans"; font-size:x-small } </style><tt>8.76</tt><tt>
              ms    8.44 ms   25.9 ms           </tt><tt>3</tt><tt>4.7
              ms</tt><tt><br>
            </tt><tt>Monitor:</tt><tt>    6.1</tt><tt>7 </tt><tt>ms   
            </tt><tt>5.88 ms   26.0 ms           </tt><tt>49.1 ms</tt><tt><br>
            </tt><tt> </tt>
            <meta http-equiv="content-type" content="text/html;
              charset=UTF-8">
            <title></title>
            <meta name="generator" content="LibreOffice 4.4.0.3 (Linux)">
            <style type="text/css">
                body,div,table,thead,tbody,tfoot,tr,th,td,p { font-family:"Liberation Sans"; font-size:x-small }
        </style><tt>Semaphore:  </tt><tt>3.</tt><tt>43 ms    </tt><tt>3.26 </tt><tt>ms




                13.4 ms           33.4 ms</tt><tt><br>
            </tt><br>
            If I run an empty GC task 10 times per GC, by running the
            following code:<br>
            <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Estefank/8087324/timedTask/">http://cr.openjdk.java.net/~stefank/8087324/timedTask/</a><br>
            <br>
            I get the following numbers to complete the empty GC tasks:<br>
            <br>
            <tt>            Average    Median    99.9 percentile   Max</tt><tt><br>
            </tt><tt>Baseline:   </tt><tt>1.43 ms    0.92 ms   3.43
              ms           9.30</tt><tt> ms</tt><tt><br>
              Monitor</tt><tt>:</tt><tt>    0.75</tt><tt> </tt><tt>ms   

            </tt><tt>0.72 ms   1.74 ms           2.78</tt><tt> ms</tt><tt><br>
            </tt><tt> </tt><tt>Semaphore:  </tt><tt>0.</tt><tt>07
              ms    </tt><tt>0.07 </tt><tt>ms   0.17 ms           0.26
              ms</tt><tt><br>
            </tt><br>
            <br>
            <br>
            The code has been tested with JPRT and our nightly testing
            suites. <br>
            <br>
            I've created a unit test to run a small test with both the
            semaphore implementation and the monitor implementation:<br>
            <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="http://cr.openjdk.java.net/%7Estefank/8087324/workgangTest/">http://cr.openjdk.java.net/~stefank/8087324/workgangTest/</a><br>
            <br>
            But since we currently don't have code to shutdown worker
            threads after they have been started, I don't want to push
            this test (or clean it up) until we have that in place. I
            created this bug for that:<br>
            <a moz-do-not-send="true" class="moz-txt-link-freetext"
              href="https://bugs.openjdk.java.net/browse/JDK-8087340">https://bugs.openjdk.java.net/browse/JDK-8087340</a><br>
            <br>
            Thanks,<br>
            StefanK<br>
            <br>
            <br>
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>