Re: Performance of Pattern Matching for switch (Third Preview)

Thu Aug 11 16:25:10 UTC 2022

On 11. 08. 22 15:09, Jordan Zimmerman wrote:

> Hi Jan,
>
> Thanks for the detailed reply. TBH I didn't spend much time on the 
> test so your comments are appropriate. I wrote the test after JFR 
> reported SwitchBootstraps.typeSwitch as a hotspot in a project I'm 
> working on. I think different tests getting different lengths doesn't 
> really poison the tests as both implementations have the same chances 
> for list sizes and content.

I think the length of the data has a fairly big effect, because each
time the whole benchmark is executed, it generates one set of data
for testEnhancedSwitch and another set of data for testManualSwitch,
and performs the measurement on this (now static) data. So the data is
not re-generated many times to average out the random differences.

As a particular example (with '.threads(1)' + logging of the data size +
improved PR 9779, but otherwise unmodified benchmark), I ran the whole
benchmark several times. Once I got:

testEnhancedSwitch - data size: 1117

testManualSwitch - data size: 1510

results:

TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  85437.814 ± 7840.590  ops/s
TestEnhancedSwitch.testManualSwitch    thrpt    5  56473.669 ±  632.442  ops/s

And another time, I got:

testEnhancedSwitch - data size: 1988

testManualSwitch - data size: 1735

results:

TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  43699.620 ± 6157.698  ops/s
TestEnhancedSwitch.testManualSwitch    thrpt    5  50338.482 ± 6817.907  ops/s

So the (random) data size apparently has quite a significant impact on
the results.
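
Pinning both the length and the seed would remove this source of
variation. A minimal sketch, not the actual benchmark code: the Fruit,
Apple and Pear types, the weight field and the generate helper are all
placeholders made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class FixedData {
    // Hypothetical stand-ins for the benchmark's element types.
    public interface Fruit {}
    public record Apple(int weight) implements Fruit {}
    public record Pear(int weight) implements Fruit {}

    // Same size and same seed always produce the same list, so every
    // testcase can be handed identical input.
    public static List<Fruit> generate(int size, long seed) {
        Random random = new Random(seed);
        List<Fruit> data = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            int weight = random.nextInt(100);
            data.add(random.nextBoolean() ? new Apple(weight) : new Pear(weight));
        }
        return data;
    }

    public static void main(String[] args) {
        List<Fruit> forEnhanced = generate(1000, 0L);
        List<Fruit> forManual = generate(1000, 0L);
        // Prints "1000 true": both lists have the same length and content.
        System.out.println(forEnhanced.size() + " " + forEnhanced.equals(forManual));
    }
}
```

With input like this, any remaining difference between the testcases
comes from the code under test rather than from the data.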

>
> > I wonder how much effect the use of ConcurrentHashMap has
>
> I tried the test with both a simple HashMap and ConcurrentHashMap and 
> the delta was similar as I recall.

Looking at the image from JFR, I see that the test is spending 
significantly more time in ConcurrentHashMap.get than in doTypeSwitch. 
So while that should not affect the relative order, it probably has an 
effect on the precision of the benchmark.
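
To take ConcurrentHashMap.get out of the picture, one option would be
to measure only the dispatch itself. A minimal sketch, not taken from
the benchmark: the Fruit hierarchy, the weight field and the method
names are hypothetical, and it assumes a JDK where pattern matching for
switch is available without preview flags (21 or later).

```java
import java.util.List;
import java.util.function.ToIntFunction;

class DispatchOnly {
    // Hypothetical element types, for illustration only.
    public sealed interface Fruit permits Apple, Pear {}
    public record Apple(int weight) implements Fruit {}
    public record Pear(int weight) implements Fruit {}

    // Pattern matching switch; exhaustive because Fruit is sealed.
    public static int enhanced(Fruit f) {
        return switch (f) {
            case Apple a -> a.weight();
            case Pear p -> -p.weight();
        };
    }

    // Hand-written equivalent using an instanceof chain.
    public static int manual(Fruit f) {
        if (f instanceof Apple a) return a.weight();
        if (f instanceof Pear p) return -p.weight();
        throw new IllegalArgumentException("unexpected: " + f);
    }

    // The loop does nothing but dispatch and accumulate, with no map lookups.
    public static long total(List<Fruit> data, ToIntFunction<Fruit> dispatch) {
        long sum = 0;
        for (Fruit f : data) {
            sum += dispatch.applyAsInt(f);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Fruit> data = List.of(new Apple(3), new Pear(4), new Apple(5));
        // Prints "true": both variants compute the same total.
        System.out.println(total(data, DispatchOnly::enhanced)
                == total(data, DispatchOnly::manual));
    }
}
```

In a real JMH run, the result of total would still have to be returned
from the benchmark method or consumed, so that the loop is not
optimized away.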

Jan

>
> PR 9779 looks promising. Anyway, as a Java user I would expect that 
> the compiler can write better code than I can manually FWIW.
>
> Cheers.
>
> -Jordan
>
>
>> On Aug 11, 2022, at 1:26 PM, Jan Lahoda <jan.lahoda at oracle.com> wrote:
>>
>> Hi Jordan,
>>
>>
>> Thanks for the report. Yes, the performance of various pattern 
>> matching switches is something that we'd like to improve, which is a 
>> task that will probably take a while. Currently, one PR relevant to 
>> your benchmark is:
>>
>> https://github.com/openjdk/jdk/pull/9779
>>
>>
>> Looking at the benchmark, I have a few comments/questions:
>>
>> 1. I see that "Data" generates the test List of a random length between 
>> 1000 and 2000, but as far as I can tell, different testcases will get 
>> a List of a different length. So the testcases are not really the 
>> same, as their input has a different length. Am I missing something here?
>>
>> 2. The actual content of the List is also random, but, again, the 
>> content is not the same for all the testcases, which I believe could 
>> skew the results (consider input data with a majority of 
>> Fruit.Apple, and a different set of data with a majority of 
>> Fruit.Pear: the tasks to solve are not the same). The effect 
>> of this is probably limited, though.
>>
>> 3. The test uses 4 threads, but when I run it with this setting, the 
>> error margins are very wide, making the results much less reliable 
>> (per my understanding). This may be a consequence of the limited 
>> number of cores (4 physical) available on my laptop.
>>
>>
>> I've tweaked the test to use input data of length 1000 for all cases, 
>> and new Random(0) to generate the data.
>>
>>
>> Then, for one thread (testEnhancedSwitch uses the code from PR 9779, 
>> testEnhancedSwitchLegacy uses the code currently in the mainline, 
>> testManualSwitch is the same as in your testcase):
>>
>> TestEnhancedSwitch.testEnhancedSwitch        thrpt    5   95020.310 ±  689.833  ops/s
>> TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt    5   68175.714 ± 2245.512  ops/s
>> TestEnhancedSwitch.testManualSwitch          thrpt    5  102640.203 ± 2384.880  ops/s
>>
>> And for two threads:
>>
>> TestEnhancedSwitch.testEnhancedSwitch        thrpt    5  47714.842 ± 2206.843  ops/s
>> TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt    5  47080.128 ± 1679.960  ops/s
>> TestEnhancedSwitch.testManualSwitch          thrpt    5  41116.334 ± 4938.590  ops/s
>>
>>
>> (In the multi-threaded mode, I wonder how much effect the use of 
>> ConcurrentHashMap has.)
>>
>>
>> Thanks,
>>
>>     Jan
>>
>>
>> On 10. 08. 22 12:04, Jordan Zimmerman wrote:
>>> Hi Folks,
>>>
>>> I've been experimenting with Pattern Matching for switch (Third 
>>> Preview). I noticed that the performance of these enhanced switches 
>>> is far worse than manual matching. Is this because it is only a 
>>> preview and optimizations have yet to be done? Anyway, I thought I'd 
>>> mention what I found as an FYI.
>>>
>>> Here's the jmh benchmark I used:
>>> https://gist.github.com/Randgalt/a68ceee62cd8127431cbe6e7afbfdf44
>>>
>>> Here are the results:
>>>
>>> Benchmark                               Mode  Cnt      Score       Error  Units
>>> TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  30789.482 ± 17667.365  ops/s
>>> TestEnhancedSwitch.testManualSwitch    thrpt    5  44651.612 ±  5135.641  ops/s
>>>
>>> Cheers.
>>>
>>> -Jordan
>