[intrinsics] performance improvements for the intrinsified version of Objects::hash
Vicente Romero
vicente.romero at oracle.com
Thu Feb 28 13:12:11 UTC 2019
On 2/27/19 8:18 PM, Alex Buckley wrote:
> Believing that the second column is intended to be "Intrinsics_02_26",
> not "Intrinsics_02_22":
That's correct, sorry for the mistake in the column naming.
>
> The speedups for reference variables get worse with more arguments
> (though they may still be faster than vanilla invocation for a good
> while), and the speedups for primitive variables get better with more
> arguments.
>
> One metric is how many variables can be passed and still have
> intrinsification offer a speedup relative to vanilla invocation. (The
> cliff between 60 and 70.) Another metric is how many variables can be
> passed before the speedup stops growing, even if intrinsification is
> always faster than vanilla invocation. (The global maximum of
> performance, between 10 and 40.) Presumably, each metric is governed
> by a different factor.
Right, good analysis. I will do some more research to see where the
execution time is going.
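
The two metrics described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, speedup is assumed to mean vanilla time divided by intrinsified time (so values above 1.0 mean intrinsification wins), and the numbers in `main` are invented placeholders shaped like the description above, not the attached measurements.

```java
public class SpeedupMetricsSketch {
    // Metric 1: the largest argument count reached before the speedup first
    // drops to 1.0 or below (the "cliff"). Returns -1 if there is never a speedup.
    static int lastCountBeforeCliff(int[] counts, double[] speedups) {
        int last = -1;
        for (int i = 0; i < counts.length; i++) {
            if (speedups[i] <= 1.0) {
                break;
            }
            last = counts[i];
        }
        return last;
    }

    // Metric 2: the argument count at which the speedup is largest.
    static int countAtPeak(int[] counts, double[] speedups) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (speedups[i] > speedups[best]) {
                best = i;
            }
        }
        return counts[best];
    }

    public static void main(String[] args) {
        // Placeholder data only: NOT the real benchmark results.
        int[] counts = {10, 20, 30, 40, 50, 60, 70, 80};
        double[] speedups = {1.8, 2.1, 1.9, 1.5, 1.3, 1.1, 0.9, 0.8};
        System.out.println("last count before cliff: " + lastCountBeforeCliff(counts, speedups));
        System.out.println("count at peak speedup:   " + countAtPeak(counts, speedups));
    }
}
```

Keeping the two computations separate makes it easier to track them independently across patches, since each may be governed by a different factor.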
>
> Alex
Vicente
>
> On 2/26/2019 8:28 PM, Vicente Romero wrote:
>> Hi all,
>>
>> I have investigated further the degradation of the intrinsified
>> version of Objects::hash for reference types. I have taken performance
>> measurements for different numbers of arguments. Please see the results
>> attached. At least on my PC, there seems to be a cliff between 60 and 70
>> arguments. Up to 60, the intrinsified version is faster than vanilla
>> JDK13, but from 70 onward the intrinsified version starts being slower.
>> Interestingly, if the current implementation only becomes worse than
>> vanilla starting at 70 non-primitive arguments, that seems like a very
>> good compromise.
>>
>> Thanks,
>> Vicente
>>
>> On 2/26/19 8:49 PM, Vicente Romero wrote:
>>> Hi all,
>>>
>>> I have just pushed [1], which improves the performance of the
>>> intrinsified version of Objects::hash in almost all of our performance
>>> test cases. This is a big improvement compared to the previous state,
>>> but there is still work to be done. Please find attached a file with
>>> the benchmark results. It includes the performance numbers obtained
>>> with the intrinsics repo as of 02/22 plus the ones obtained, almost
>>> now :), after pushing [1]. As can be seen, there is a noticeable
>>> improvement in performance. In the last performance measurement we
>>> found a noticeable degradation in performance for a large number of
>>> arguments (~100), even for primitive types. Patch [1] improves the
>>> performance for both primitive and reference types, with the difference
>>> that the performance is now much better than vanilla JDK13 for
>>> primitive types but still worse than vanilla for reference
>>> types. Even so, we are in better shape now compared to the state as of
>>> 02/22. Stay tuned :)
>>>
>>> Thanks,
>>> Vicente
>>>
>>> [1] http://hg.openjdk.java.net/amber/amber/rev/0f40d5752eb9
>>>
>>> On 2/22/19 4:46 PM, Vicente Romero wrote:
>>>> Hi,
>>>>
>>>> To complete the picture, please find attached the performance results
>>>> for Objects.hash for a number of experiments. In general they don't
>>>> look as good as the ones for String::format. There is not much gain
>>>> unless the number of parameters is large and all the parameters are
>>>> constants. This is understandable because in that case the
>>>> compiler generates an LDC of the result. In all other cases the
>>>> performance is either just a bit better or a lot worse.
>>>>
>>>> Thanks,
>>>> Vicente
>>>>
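
To illustrate why the all-constants case is cheap, here is a minimal sketch of what loading a precomputed result amounts to. The class and method names are hypothetical, and the folded method is written by hand; it is not the code the intrinsics branch actually emits. It relies only on the documented behavior that `Objects.hash(a, b, c)` returns `Arrays.hashCode(new Object[] {a, b, c})`.

```java
import java.util.Objects;

public class ConstantHashSketch {
    // Vanilla call site: each int is boxed and an Object[] is allocated
    // for the varargs call before the hash is computed at run time.
    static int vanilla() {
        return Objects.hash(1, 2, 3);
    }

    // Hand-folded equivalent: the result is known at compile time, so the
    // call site can reduce to loading a constant.
    // 30817 = 31 * (31 * (31 * 1 + 1) + 2) + 3
    static int folded() {
        return 30817;
    }

    public static void main(String[] args) {
        System.out.println(vanilla() == folded()); // prints "true"
    }
}
```

Folding like this is only possible when every argument's hash code is fixed by specification, as it is for boxed integers and strings, which is consistent with the gain disappearing once any argument is not a constant.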
>>>> On 2/22/19 12:33 PM, Vicente Romero wrote:
>>>>> Hi,
>>>>>
>>>>> I have executed some performance tests on the intrinsics code to
>>>>> compare the before and after. Please find the benchmark results and
>>>>> the JMH-based benchmark attached. This benchmark is based on a
>>>>> previous one written by Hannes. The benchmark compares the execution
>>>>> between the JDK built from [1], referred to here as JDK13, and [2],
>>>>> which is the amber repo, branch `intrinsics-project`.
>>>>>
>>>>> Some conclusions from the benchmark results:
>>>>>
>>>>> * in all cases for which intrinsified code is produced, the
>>>>> intrinsified code is faster than the vanilla JDK13
>>>>> code
>>>>> * there are wide variations in the size of the gain, though
>>>>>
>>>>> For example, for the test `testStringFormatBoxedArray`, which is
>>>>> basically benchmarking the performance of `String.format("%s: %d ",
>>>>> args);` where args is `static final Object[] args = { "Bob", i23
>>>>> };`, there is basically no visible gain, as in this case the
>>>>> intrinsification bails out and produces the same code as vanilla
>>>>> JDK13. This result is expected. The test with the next smallest gain
>>>>> is `testStringFormat1ConstantFloat`, which tests:
>>>>>
>>>>> `String.format("%g", 1.0)`
>>>>>
>>>>> The execution is ~2.5 times faster in the intrinsified version, which
>>>>> is nothing compared to `testStringFormat1ConstantStr`, which is ~40
>>>>> times faster. Another interesting conclusion is that the improvement
>>>>> fades out with the number of parameters for some cases but stays
>>>>> constant for others. For example, the speedup is the same when
>>>>> concatenating 1 or 100 strings, but formatting one primitive int is
>>>>> ~45 times faster versus only ~3.5 times faster when formatting a hundred.
>>>>>
>>>>> I have also attached the table I used to play with the numbers.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vicente
>>>>>
>>>>> [1] http://hg.openjdk.java.net/jdk/jdk
>>>>>
>>>>> [2] http://hg.openjdk.java.net/amber/amber
>>>>>
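
The call shapes discussed in this message can be compared side by side in a small sketch. Several things here are assumptions: the class and method names are hypothetical, `i23` is taken to be an `Integer` holding 23, `Locale.ROOT` is passed only to keep the output deterministic (the benchmark uses the overload without a locale), and the concatenation method is a hand-written stand-in for a cheaper equivalent, not the code the intrinsics branch generates.

```java
import java.util.Locale;

public class FormatShapesSketch {
    // Assumption: i23 in the benchmark is an Integer holding 23.
    static final Integer i23 = 23;
    static final Object[] args = { "Bob", i23 };

    // Arguments hidden behind an Object[]: the call site does not show the
    // individual argument types, which is the shape reported to bail out.
    static String viaArray() {
        return String.format(Locale.ROOT, "%s: %d ", args);
    }

    // Arguments spelled out at the call site: each argument's static type
    // is visible next to the constant format string.
    static String viaExplicitArgs() {
        return String.format(Locale.ROOT, "%s: %d ", "Bob", i23);
    }

    // Hand-written concatenation producing the same text without parsing
    // the format string at run time.
    static String viaConcat() {
        return "Bob" + ": " + i23 + " ";
    }

    public static void main(String[] unused) {
        System.out.println("[" + viaArray() + "]");
        System.out.println("[" + viaExplicitArgs() + "]");
        System.out.println("[" + viaConcat() + "]");
    }
}
```

All three methods return the same string; they differ only in how much information about the arguments is available at the call site.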
>>>>
>>>
>>