[External] : Re: Performance of Pattern Matching for switch (Third Preview)
Jan Lahoda
jan.lahoda at oracle.com
Thu Aug 11 16:25:10 UTC 2022
On 11. 08. 22 15:09, Jordan Zimmerman wrote:
> Hi Jan,
>
> Thanks for the detailed reply. TBH I didn't spend much time on the
> test so your comments are appropriate. I wrote the test after JFR
> reported SwitchBootstrap.typeSwitch as a hotspot in a project I'm
> working on. I think different tests getting different lengths doesn't
> really poison the tests as both implementations have the same chances
> for list sizes and content.
I think the length of the data has a fairly big effect, because each
time the whole benchmark is executed, it generates one set of data
for testEnhancedSwitch and another set of data for testManualSwitch,
and performs the measurement on this (now static) data. So the data is
not re-generated many times to average out the random differences.
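
To illustrate the point, here is a minimal sketch of how both testcases could be given identical input. It mirrors the tweak described in the quoted message further down (length 1000, new Random(0)). The class name, the Fruit enum, and the generate() helper are hypothetical stand-ins, not the types used in the actual benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FixedBenchmarkData {
    // Hypothetical stand-in for the element types used by the real benchmark.
    public enum Fruit { APPLE, PEAR, ORANGE }

    public static final int SIZE = 1000; // fixed length, same for every testcase
    public static final long SEED = 0L;  // fixed seed, so the content is reproducible

    // Builds the same list on every call, because the Random is re-seeded each time.
    public static List<Fruit> generate() {
        Random random = new Random(SEED);
        Fruit[] values = Fruit.values();
        List<Fruit> data = new ArrayList<>(SIZE);
        for (int i = 0; i < SIZE; i++) {
            data.add(values[random.nextInt(values.length)]);
        }
        return Collections.unmodifiableList(data);
    }

    public static void main(String[] args) {
        List<Fruit> forEnhanced = generate();
        List<Fruit> forManual = generate();
        // Both testcases now see input of the same length and content.
        System.out.println(forEnhanced.size() + " " + forEnhanced.equals(forManual));
    }
}
```

With input built this way, a difference between two scores can no longer come from one testcase having drawn a longer or differently mixed list.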
As a particular example (with '.thread(1)' + logging of the data size +
improved PR 9779, but otherwise unmodified benchmark), I ran the whole
benchmark several times. One time I got:
testEnhancedSwitch - data size: 1117
testManualSwitch - data size: 1510
results:
TestEnhancedSwitch.testEnhancedSwitch  thrpt  5  85437.814 ± 7840.590  ops/s
TestEnhancedSwitch.testManualSwitch    thrpt  5  56473.669 ±  632.442  ops/s
And another time, I got:
testEnhancedSwitch - data size: 1988
testManualSwitch - data size: 1735
results:
TestEnhancedSwitch.testEnhancedSwitch  thrpt  5  43699.620 ± 6157.698  ops/s
TestEnhancedSwitch.testManualSwitch    thrpt  5  50338.482 ± 6817.907  ops/s
So the (random) data size apparently has quite a significant impact on
the results.
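
One rough way to see this from the numbers above is to scale each score by the logged data size. This is only a sketch. It ASSUMES that one benchmark operation is a single pass over the whole list, which is not confirmed by this message, and it ignores the reported error margins. The class and method names are hypothetical.

```java
public class PerElementThroughput {
    // Elements processed per second, ASSUMING one op is one full pass over the list.
    public static double elementsPerSecond(double opsPerSecond, int dataSize) {
        return opsPerSecond * dataSize;
    }

    public static void main(String[] args) {
        // Columns: enhanced score, enhanced data size, manual score, manual data size.
        // Values are copied from the two runs reported above.
        double[][] runs = {
            {85437.814, 1117, 56473.669, 1510},
            {43699.620, 1988, 50338.482, 1735},
        };
        for (double[] r : runs) {
            double enhanced = elementsPerSecond(r[0], (int) r[1]);
            double manual = elementsPerSecond(r[2], (int) r[3]);
            System.out.printf("enhanced/manual: raw %.2f, per element %.2f%n",
                    r[0] / r[2], enhanced / manual);
        }
    }
}
```

Under that assumption, the raw enhanced/manual ratios of roughly 1.51 and 0.87 become roughly 1.12 and 0.99 per element, so much of the swing between the two runs tracks the list lengths.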
>
> > I wonder how much effect the use of ConcurrentHashMap has
>
> I tried the test with both a simple HashMap and ConcurrentHashMap and
> the delta was similar as I recall.
Looking at the image from JFR, I see that the test is spending
significantly more time in ConcurrentHashMap.get than in doTypeSwitch.
So while that should not affect the relative order, it probably has an
effect on the precision of the benchmark.
Jan
>
> PR 9779 looks promising. Anyway, as a Java user I would expect that
> the compiler can write better code than I can manually FWIW.
>
> Cheers.
>
> -Jordan
>
>
>> On Aug 11, 2022, at 1:26 PM, Jan Lahoda <jan.lahoda at oracle.com> wrote:
>>
>> Hi Jordan,
>>
>>
>> Thanks for the report. Yes, the performance of various pattern
>> matching switches is something that we'd like to improve, which is a
>> task that will probably take a while. Currently, one PR relevant to
>> your benchmark is:
>>
>> https://github.com/openjdk/jdk/pull/9779
>>
>>
>> Looking at the benchmark, I have a few comments/questions:
>>
>> 1. I see that "Data" generates the test List with a random length
>> between 1000 and 2000, but as far as I can tell, different testcases
>> will get a List of a different length. So the testcases are not really
>> the same, as their input has a different length. Am I missing
>> something here?
>>
>> 2. The actual content of the List is also random, but, again, the
>> content is not the same for all the testcases, which I believe could
>> skew the results (consider input data which could have a majority of
>> Fruit.Apple, and a different set of data which would have a majority
>> of Fruit.Pear - the task to solve is not the same in the two cases).
>> The effect of this is probably limited, though.
>>
>> 3. The test uses 4 threads, but when I run it with this setting, the
>> error margins are very wide, making the results much less reliable
>> (per my understanding). This may be a consequence of the limited
>> number of cores (4 physical) available on my laptop.
>>
>>
>> I've tweaked the test to use input data of length 1000 for all cases,
>> and new Random(0) to generate the data.
>>
>>
>> Then for one thread (testEnhancedSwitch uses the code from PR 9779,
>> testEnhancedSwitchLegacy uses the code currently in the mainline,
>> testManualSwitch is the same as in your testcase):
>>
>> TestEnhancedSwitch.testEnhancedSwitch        thrpt  5   95020.310 ±  689.833  ops/s
>> TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt  5   68175.714 ± 2245.512  ops/s
>> TestEnhancedSwitch.testManualSwitch          thrpt  5  102640.203 ± 2384.880  ops/s
>>
>> And for two threads:
>>
>> TestEnhancedSwitch.testEnhancedSwitch        thrpt  5  47714.842 ± 2206.843  ops/s
>> TestEnhancedSwitch.testEnhancedSwitchLegacy  thrpt  5  47080.128 ± 1679.960  ops/s
>> TestEnhancedSwitch.testManualSwitch          thrpt  5  41116.334 ± 4938.590  ops/s
>>
>>
>> (In the multi-threaded mode, I wonder how much effect the use of
>> ConcurrentHashMap has.)
>>
>>
>> Thanks,
>>
>> Jan
>>
>>
>> On 10. 08. 22 12:04, Jordan Zimmerman wrote:
>>> Hi Folks,
>>>
>>> I've been experimenting with Pattern Matching for switch (Third
>>> Preview). I noticed that the performance of these enhanced switches
>>> is far worse than manual matching. Is this because it is only a
>>> preview and optimizations have yet to be done? Anyway, I thought I'd
>>> mention what I found as an FYI.
>>>
>>> Here's the jmh benchmark I used:
>>> https://gist.github.com/Randgalt/a68ceee62cd8127431cbe6e7afbfdf44
>>>
>>> Here are the results:
>>>
>>> Benchmark                              Mode   Cnt      Score       Error  Units
>>> TestEnhancedSwitch.testEnhancedSwitch  thrpt    5  30789.482 ± 17667.365  ops/s
>>> TestEnhancedSwitch.testManualSwitch    thrpt    5  44651.612 ±  5135.641  ops/s
>>>
>>> Cheers.
>>>
>>> -Jordan
>