[intrinsics]: performance before after (String::format)

Mon Feb 25 13:14:47 UTC 2019

Hi Claes,

On 2/24/19 8:13 AM, Claes Redestad wrote:
> On 2019-02-23 01:36, Vicente Romero wrote:
>>
>>
>> On 2/22/19 4:59 PM, Alex Buckley wrote:
>>> On 2/22/2019 1:46 PM, Vicente Romero wrote:
>>>> To complete the picture, please find attached the performance results
>>>> for Objects.hash for a number of experiments. In general they don't
>>>> look as good as the ones for String::format: there is not much gain
>>>> unless the number of parameters is large and all the parameters are
>>>> constants. This is understandable because in that case the compiler
>>>> generates an LDC of the result. In all other cases the performance is
>>>> just a bit better or a lot worse.
>>>
>>>                  Intrinsified  Vanilla  Speedup
>>> testHash1IntVariable    42564    42799       1x
>>> testHash2IntVariables   41573     9019       5x
>>> testHash100IntVariables     4       27       0.15x
>>>
>>> With a large number of parameters, you might hope that avoiding 
>>> double boxing (int -> Integer -> array store) gives us some win, 
>>> even for non-constant arguments. But something is happening that 
>>> kills the speedup, do you know what it is?
>>
>> I'm doing some research on this. My assumption is that HS was able to
>> recognize the old pattern but has issues with the MH graph being
>> generated now. It could be that some nodes in the graph are more
>> opaque, but this is just my opinion.
>
> If I were to guess, you're hitting some JIT limit - likely inlining-
> related - which causes a miscompilation at some point. I've been
> mulling over whether we in general need to build heuristics into our
> BSMs to generate simpler shapes once the number of arguments grows,
> e.g., only specialize for the first N arguments and emit a call to
> Objects.hash(Object[]) for the remainder.

Right, that is probably playing a large part here. Choosing an N as a
threshold could be a good compromise. The issue is that even for small
N the improvement is almost negligible, and it is type dependent.
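
To make the threshold idea concrete, here is a minimal hand-written sketch,
assuming N = 2 and int arguments. It is not what the BSM actually emits; the
class and method names (HashThresholdSketch, hashPrefix2, hashRemainder) are
hypothetical. It only shows that the first N arguments can be folded without
boxing while the remainder goes through a generic Object[] loop, and that the
result still matches Objects.hash.

```java
import java.util.Objects;

// Hypothetical illustration only: names and the threshold N = 2 are assumptions.
class HashThresholdSketch {

    // Specialized prefix for the first two int arguments: no boxing, no array.
    // Uses the same recurrence as Objects.hash (start at 1, multiply by 31).
    static int hashPrefix2(int a, int b) {
        int result = 1;
        result = 31 * result + Integer.hashCode(a);
        result = 31 * result + Integer.hashCode(b);
        return result;
    }

    // Generic fallback for the remaining arguments: continues the recurrence
    // from the prefix value, paying for boxing and the varargs array.
    static int hashRemainder(int seed, Object... rest) {
        int result = seed;
        for (Object o : rest) {
            result = 31 * result + Objects.hashCode(o); // 0 for null
        }
        return result;
    }

    public static void main(String[] args) {
        int split = hashRemainder(hashPrefix2(1, 2), 3, "x", null);
        int plain = Objects.hash(1, 2, 3, "x", null);
        System.out.println(split == plain); // prints true
    }
}
```

Note that the fallback cannot simply call Objects.hash on the remainder and
combine the two values, since Objects.hash always restarts the recurrence
at 1; the sketch threads the prefix value through as a seed instead.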

>
> Which value of N to pick, and how to gracefully downgrade to a simpler
> implementation, is implementation dependent, and might even be chosen
> differently depending on whether you're optimizing for peak performance
> or startup/footprint, as a smaller N could reduce the potential for a
> BSM to emit combinatorially explosive MH graphs.

Thanks for your evaluation, very helpful!

>
> /Claes
Vicente