<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>On 11. 08. 22 15:09, Jordan Zimmerman wrote:<br>

    </p>

    <blockquote type="cite" cite="mid:50AB1F4A-E259-4A79-B028-4DA177713535@jordanzimmerman.com">

      

      Hi Jan,

      <div class=""><br class="">

      </div>

      <div class="">Thanks for the detailed reply. TBH I didn't spend

        much time on the test so your comments are appropriate. I wrote

        the test after JFR reported <span style="caret-color: rgb(36,

          41, 47); color: rgb(36, 41, 47); font-family: ui-monospace,

          SFMono-Regular, "SF Mono", Menlo, Consolas,

          "Liberation Mono", monospace; font-size:

          11.899999618530273px; background-color: rgba(175, 184, 193,

          0.2);" class="">SwitchBootstrap.typeSwitch</span> as a hotspot

        in a project I'm working on. I think different tests getting

        different lengths doesn't really poison the tests as both

        implementations have the same chances for list sizes and

        content.</div>

    </blockquote>

    <p><br>

    </p>

    <p>I think the length of the data has a fairly big effect. Each
      time the whole benchmark is executed, it generates one set of
      data for testEnhancedSwitch and another set of data for
      testManualSwitch, and then performs the measurement on this (now
      static) data. So the data is not regenerated many times to
      average out the random differences.</p>

    <p><br>

    </p>

    <p>As a particular example (with '.threads(1)' + logging of the data
      size + improved PR 9779, but otherwise unmodified benchmark), I
      ran the whole benchmark several times. Once I got:</p>

    <p>testEnhancedSwitch - data size: 1117</p>

    <p>testManualSwitch - data size: 1510</p>

    <p>results:</p>

    <p>TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  85437.814 ±

      7840.590  ops/s<br>

      TestEnhancedSwitch.testManualSwitch    thrpt    5  56473.669 ± 

      632.442  ops/s<br>

      <br>

    </p>

    <p><br>

    </p>

    <p>And another time, I got:</p>

    <p>testEnhancedSwitch - data size: 1988</p>

    <p>testManualSwitch - data size: 1735</p>

    <p>results:</p>

    <p>TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  43699.620 ±

      6157.698  ops/s<br>

      TestEnhancedSwitch.testManualSwitch    thrpt    5  50338.482 ±

      6817.907  ops/s<br>

    </p>

    <p><br>

    </p>

    <p>So the (random) data size apparently has quite a significant
      impact on the results.</p>
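    <p>One rough way to check this is to normalize each score by the
      data size it was measured on. The sketch below assumes that the
      cost of one benchmark operation grows roughly linearly with the
      list length, which is an assumption and not something measured
      here; the class and method names are made up for illustration,
      and the numbers are copied from the two runs above.</p>

```java
// Sketch: normalize JMH throughput by input size, assuming the cost of one
// benchmark operation grows roughly linearly with the list length.
class NormalizedThroughput {
    // Elements processed per second = operations per second * elements per operation.
    static double elementsPerSecond(double opsPerSecond, int dataSize) {
        return opsPerSecond * dataSize;
    }

    public static void main(String[] args) {
        // Scores and data sizes copied from the two runs reported above.
        double run1Enhanced = elementsPerSecond(85437.814, 1117);
        double run1Manual = elementsPerSecond(56473.669, 1510);
        double run2Enhanced = elementsPerSecond(43699.620, 1988);
        double run2Manual = elementsPerSecond(50338.482, 1735);

        // Ratio of testEnhancedSwitch to testManualSwitch, before and after normalizing.
        System.out.printf("run 1: raw ratio %.3f, normalized ratio %.3f%n",
                85437.814 / 56473.669, run1Enhanced / run1Manual);
        System.out.printf("run 2: raw ratio %.3f, normalized ratio %.3f%n",
                43699.620 / 50338.482, run2Enhanced / run2Manual);
    }
}
```

    <p>Under that assumption, the enhanced/manual ratio moves from
      roughly 1.51 and 0.87 in the raw scores to roughly 1.12 and 0.99
      after normalization, so much of the apparent swing between the
      two runs would be explained by the differing data sizes.</p>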

    <p><br>

    </p>

    <blockquote type="cite" cite="mid:50AB1F4A-E259-4A79-B028-4DA177713535@jordanzimmerman.com">

      <div class="">

        <div><br class="">

        </div>

        <div>&gt; I wonder how much effect the use of
          ConcurrentHashMap has</div>

        <div><br class="">

        </div>

        <div>I tried the test with both a simple HashMap and

          ConcurrentHashMap and the delta was similar as I recall.</div>

      </div>

    </blockquote>

    <p><br>

    </p>

    <p>Looking at the image from JFR, I see that the test is spending

      significantly more time in ConcurrentHashMap.get than in

      doTypeSwitch. So while that should not affect the relative order,

      it probably has an effect on the precision of the benchmark.</p>
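    <p>As a rough illustration of why this matters, assume the time per
      operation splits into a part shared by both variants (such as the
      map lookup) and the switch itself. The smaller the share spent in
      the switch, the smaller the visible difference between the
      variants, and the more easily it is hidden by noise. The sketch
      below uses a hypothetical helper, and the fractions and the 2x
      slowdown are made up for illustration; they are not read off the
      JFR image.</p>

```java
// Sketch: how overhead shared by both variants shrinks the visible difference.
class Dilution {
    // switchFraction: share of the baseline time spent in the switch itself (0..1).
    // switchSlowdown: how many times slower the switch is in the other variant.
    // Returns the slowdown of the whole benchmark operation.
    static double observedSlowdown(double switchFraction, double switchSlowdown) {
        return (1.0 - switchFraction) + switchFraction * switchSlowdown;
    }

    public static void main(String[] args) {
        double switchSlowdown = 2.0; // illustrative value, not a measurement
        for (double fraction : new double[] {0.1, 0.5, 0.9}) {
            System.out.printf("switch share %.1f: whole operation %.2fx slower%n",
                    fraction, observedSlowdown(fraction, switchSlowdown));
        }
    }
}
```

    <p>With only a tenth of the time spent in the switch, a 2x slower
      switch shows up as a 1.1x difference overall, which is within
      reach of the error margins reported in this thread.</p>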

    <p><br>

    </p>

    <p>Jan<br>

    </p>

    <p><br>

    </p>

    <blockquote type="cite" cite="mid:50AB1F4A-E259-4A79-B028-4DA177713535@jordanzimmerman.com">

      <div class="">

        <div><br class="">

        </div>

        <div>PR 9779 looks promising. Anyway, FWIW, as a Java user I
          would expect that the compiler can write better code than I
          can manually.</div>

      </div>

    </blockquote>

    <blockquote type="cite" cite="mid:50AB1F4A-E259-4A79-B028-4DA177713535@jordanzimmerman.com">

      <div class="">

        <div><br class="">

        </div>

        <div>Cheers.</div>

        <div><br class="">

        </div>

        <div>-Jordan</div>

        <div><br class="">

        </div>

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">On Aug 11, 2022, at 1:26 PM, Jan Lahoda &lt;<a href="mailto:jan.lahoda@oracle.com" class="moz-txt-link-freetext" moz-do-not-send="true">jan.lahoda@oracle.com</a>&gt;
              wrote:</div>

            <br class="Apple-interchange-newline">

            <div class="">

              <div class="">

                <p class="">Hi Jordan,</p>

                <p class=""><br class="">

                </p>

                <p class="">Thanks for the report. Yes, the performance

                  of various pattern matching switches is something that

                  we'd like to improve, which is a task that will

                  probably take a while. Currently, one PR relevant to

                  your benchmark is:</p>

                <p class=""><a class="moz-txt-link-freetext" href="https://github.com/openjdk/jdk/pull/9779" moz-do-not-send="true">https://github.com/openjdk/jdk/pull/9779</a></p>

                <p class=""><br class="">

                </p>

                <p class="">Looking at the benchmark, I have a few

                  comments/questions:</p>

                <p class="">1. I see that "Data" generates the test List
                  with a random length between 1000 and 2000, but as far
                  as I can tell, different testcases will get a List of
                  a different length. So the testcases are not really
                  the same, as their inputs have different lengths. Am I
                  missing something here?</p>

                <p class="">2. The actual content of the List is also
                  random, but, again, the content is not the same for
                  all the testcases, which I believe could skew the
                  results (consider input data with a majority of
                  Fruit.Apple, and a different set of data with a
                  majority of Fruit.Pear: the task to solve is not the
                  same). The effect of this is probably limited,
                  though.<br class="">

                </p>

                <p class="">3. The test uses 4 threads, but when I run
                  it with this setting, the error margins are very wide,
                  making the results much less reliable (per my
                  understanding). This may be a consequence of the
                  limited number of cores (4 physical) available on my
                  laptop.</p>

                <p class=""><br class="">

                </p>

                <p class="">I've tweaked the test to use input data of

                  length 1000 for all cases, and new Random(0) to

                  generate the data.</p>
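                <p class="">A minimal sketch of that kind of tweak,
                  assuming hypothetical Fruit.Apple and Fruit.Pear
                  records and a hypothetical FixedData helper, and using
                  an array rather than a List for brevity. The actual
                  "Data" class in the gist is not reproduced here and
                  may look different; the point is only that a fixed
                  seed and a fixed length give every testcase the same
                  input.</p>

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-ins for the benchmark's types; the real definitions
// live in the gist and are not reproduced here.
interface Fruit {
    record Apple(int weight) implements Fruit {}
    record Pear(int weight) implements Fruit {}
}

class FixedData {
    static final int LENGTH = 1000; // same length for every testcase
    static final long SEED = 0L;    // same seed, so the same content

    // Builds the input deterministically: equal seed and length give equal arrays.
    static Fruit[] generate(long seed, int length) {
        Random random = new Random(seed);
        Fruit[] data = new Fruit[length];
        for (int i = 0; i < length; i++) {
            int weight = random.nextInt(100);
            data[i] = random.nextBoolean() ? new Fruit.Apple(weight) : new Fruit.Pear(weight);
        }
        return data;
    }

    public static void main(String[] args) {
        Fruit[] forEnhanced = generate(SEED, LENGTH);
        Fruit[] forManual = generate(SEED, LENGTH);
        // Records compare by value, so the two inputs are element-wise equal.
        System.out.println(Arrays.equals(forEnhanced, forManual)); // prints "true"
    }
}
```

                <p class="">Generating the data once per testcase from
                  the same seed keeps the inputs identical without
                  sharing mutable state between the testcases.</p>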

                <p class=""><br class="">

                </p>

                <p class="">The results for one thread
                  (testEnhancedSwitch uses the code from PR 9779,
                  testEnhancedSwitchLegacy uses the code currently in
                  the mainline, testManualSwitch is the same as in your
                  testcase):</p>

                <p class="">TestEnhancedSwitch.testEnhancedSwitch       

                  thrpt    5   95020.310 ±  689.833  ops/s<br class="">

                  TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt   

                  5   68175.714 ± 2245.512  ops/s<br class="">

                  TestEnhancedSwitch.testManualSwitch          thrpt   

                  5  102640.203 ± 2384.880  ops/s<br class="">

                </p>

                <p class="">And for two threads:</p>

                <p class="">TestEnhancedSwitch.testEnhancedSwitch       

                  thrpt    5  47714.842 ± 2206.843  ops/s<br class="">

                  TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt   

                  5  47080.128 ± 1679.960  ops/s<br class="">

                  TestEnhancedSwitch.testManualSwitch          thrpt   

                  5  41116.334 ± 4938.590  ops/s<br class="">

                </p>

                <p class=""><br class="">

                </p>

                <p class="">(In the multi-threaded mode, I wonder how
                  much effect the use of ConcurrentHashMap has.)</p>

                <p class=""><br class="">

                </p>

                <p class="">Thanks,</p>

                <p class="">    Jan</p>

                <p class=""><br class="">

                </p>

                <div class="moz-cite-prefix">On 10. 08. 22 12:04, Jordan

                  Zimmerman wrote:<br class="">

                </div>

                <blockquote type="cite" cite="mid:D4C8E4D0-3FEB-4F4B-A863-0B44CFC0B13F@jordanzimmerman.com" class=""> Hi Folks,

                  <div class=""><br class="">

                  </div>

                  <div class="">I've been experimenting with Pattern
                    Matching for switch (Third Preview). I noticed that
                    the performance of these enhanced switches is far
                    worse than manual matching. Is this because the
                    feature is only a preview and optimizations have yet
                    to be done? Anyway, I thought I'd mention what I
                    found as an FYI.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Here's the jmh benchmark I used:</div>

                  <div class=""><span class="Apple-tab-span" style="white-space:pre"> </span></div>

                  <div class=""><span class="Apple-tab-span" style="white-space:pre"> </span><a href="https://gist.github.com/Randgalt/a68ceee62cd8127431cbe6e7afbfdf44" class="moz-txt-link-freetext" moz-do-not-send="true">https://gist.github.com/Randgalt/a68ceee62cd8127431cbe6e7afbfdf44</a></div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Here are the results:</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">

                    <div class=""><font class="" face="Courier New">Benchmark

                                                      Mode  Cnt    

                         Score       Error  Units</font></div>

                    <div class=""><font class="" face="Courier New">TestEnhancedSwitch.testEnhancedSwitch

                         thrpt    5  30789.482 ± 17667.365  ops/s</font></div>

                    <div class=""><font class="" face="Courier New">TestEnhancedSwitch.testManualSwitch

                           thrpt    5  44651.612 ±  5135.641  ops/s</font></div>

                  </div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Cheers.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">-Jordan</div>

                </blockquote>

              </div>

            </div>

          </blockquote>

        </div>

        <br class="">

      </div>

    </blockquote>

  </body>

</html>