[intrinsics] performance improvements for the intrinsified version of Objects::hash

Vicente Romero vicente.romero at oracle.com
Wed Feb 27 04:28:15 UTC 2019


Hi all,

I have investigated further the degradation of the intrinsified 
version of Objects::hash for reference types. I have measured the 
performance for different numbers of arguments. Please see the results 
attached. At least on my PC there seems to be a cliff between 60 and 70 
arguments. Up to 60 the intrinsified version is faster than vanilla 
JDK13, but from 70 on the intrinsified version starts being slower. 
Interestingly, if the current implementation only starts being worse 
at 70 non-primitive arguments, that seems like a very good 
compromise.
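
For readers following along, here is a minimal sketch of what is being measured. Objects.hash(Object...) is documented to return Arrays.hashCode of its argument array, so each call allocates a varargs array. The hand-unrolled method below (`unrolledHash3` is a hypothetical name) only illustrates the general shape of code that avoids that array. It is an assumption for illustration, not the code the intrinsic actually emits.

```java
import java.util.Arrays;
import java.util.Objects;

public class HashSketch {
    // Hypothetical hand-unrolled equivalent of Objects.hash(a, b, c):
    // same 31-based polynomial as Arrays.hashCode, but no varargs array.
    static int unrolledHash3(Object a, Object b, Object c) {
        int result = 1;
        result = 31 * result + Objects.hashCode(a); // Objects.hashCode(null) is 0
        result = 31 * result + Objects.hashCode(b);
        result = 31 * result + Objects.hashCode(c);
        return result;
    }

    public static void main(String[] args) {
        Object a = "Bob", b = 23, c = null;
        // Both comparisons print true: all three computations agree.
        System.out.println(Objects.hash(a, b, c) == unrolledHash3(a, b, c));
        System.out.println(Objects.hash(a, b, c)
                == Arrays.hashCode(new Object[] {a, b, c}));
    }
}
```

With 70 or more arguments the unrolled form becomes a very long straight-line method, which may be related to the cliff described above, although the thread does not establish the cause.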

Thanks,
Vicente

On 2/26/19 8:49 PM, Vicente Romero wrote:
> Hi all,
>
> I have just pushed [1] which improves the performance of the 
> intrinsified version of Objects::hash in almost all of our performance 
> test cases. This is a big improvement compared to the previous state 
> but there is still work to be done. Please find attached a file with 
> the benchmark results. It includes the performance numbers obtained 
> with the intrinsics repo as of 02/22 plus the ones obtained, almost 
> now :), after pushing [1]. As can be seen, there is a noticeable 
> improvement in the performance. In the last performance measurement we 
> found a noticeable degradation in performance for a large number of 
> arguments (~100), even for primitive types. Patch [1] improves the 
> performance for both primitive and reference types, with the difference 
> that the performance is now much better than vanilla JDK13 for 
> primitive types but still worse than vanilla for reference 
> types. Even so, we are in better shape now compared to the state as of 
> 02/22. Stay tuned :)
>
> Thanks,
> Vicente
>
> [1] http://hg.openjdk.java.net/amber/amber/rev/0f40d5752eb9
>
> On 2/22/19 4:46 PM, Vicente Romero wrote:
>> Hi,
>>
>> To complete the picture, please find attached the performance results 
>> for Objects.hash for a number of experiments. In general they don't 
>> look as good as the ones for String::format. It seems like 
>> there is not much gain unless the number of parameters is large and 
>> all the parameters are constants. This is understandable because the 
>> compiler generates an LDC of the result. In all other cases the 
>> performance is just a bit better or a lot worse.
>>
>> Thanks,
>> Vicente
>>
>> On 2/22/19 12:33 PM, Vicente Romero wrote:
>>> Hi,
>>>
>>> I have executed some performance tests on the intrinsics code to 
>>> compare the before and after. Please find the benchmark results and 
>>> the JMH based benchmark attached. This benchmark is based on a 
>>> previous one written by Hannes. The benchmark compares the execution 
>>> between the JDK built from [1], referred here as JDK13, and [2] 
>>> which is the amber repo, branch `intrinsics-project`.
>>>
>>> Some conclusions from the benchmark results:
>>>
>>>   * the intrinsified code is faster in all cases, for which
>>>     intrinsified code is produced, compared to the legit (JDK13
>>>     vanilla) code
>>>   * there are wide variations though
>>>
>>> For example for the test: `testStringFormatBoxedArray` which is 
>>> basically benchmarking the performance of: `String.format("%s: %d ", 
>>> args);` where args is: `static final Object[] args = { "Bob", i23 
>>> };`, there is basically no visible gain as in this case the 
>>> intrinsification is bailing out and producing same code as vanilla 
>>> JDK13. This result is expected. The next test with not so much gain 
>>> is: `testStringFormat1ConstantFloat` which is testing:
>>>
>>>     `String.format("%g", 1.0)`
>>>
>>> the execution is ~2.5 times faster in the intrinsified version but 
>>> nothing compared to: `testStringFormat1ConstantStr` which is ~40 
>>> times faster. Another interesting conclusion is that the improvement 
>>> fades out with the number of parameters for some cases but keeps 
>>> constant for others. For example it is as fast to concatenate 1 or 
>>> 100 strings, but formatting one primitive int is ~45 times faster vs a 
>>> ~3.5 times improvement when formatting a hundred.
>>>
>>> I have also attached the table I used to play with the numbers.
>>>
>>> Thanks,
>>>
>>> Vicente
>>>
>>> [1] http://hg.openjdk.java.net/jdk/jdk
>>>
>>> [2] http://hg.openjdk.java.net/amber/amber
>>>
>>
>
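
As a side note on the quoted remark that the all-constants case wins because the compiler generates an LDC of the result: the sketch below shows why that is possible at all. For constant int arguments the value of Objects.hash is fully determined ahead of time, so it can be replaced by a single constant load. The class and field names are hypothetical, and the constant was worked out by hand from the 31-based polynomial, not taken from the intrinsics branch.

```java
import java.util.Objects;

public class ConstantHashSketch {
    // ((1 * 31 + 1) * 31 + 2) * 31 + 3 = 30817, computed by hand.
    // A compiler that folds the call could load this value directly.
    static final int PRECOMPUTED = 30817;

    // What vanilla code does on every call: box the ints, allocate
    // the varargs array, and walk it.
    static int viaLibrary() {
        return Objects.hash(1, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(viaLibrary() == PRECOMPUTED); // prints true
    }
}
```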

-------------- next part --------------
Test                                      Intrinsics_02_22  JDK13 Vanilla  Intrinsics_02_26 / Vanilla
FormatterBenchmark.testHash1Reference     35672.102         38295.161      0.9315041658
FormatterBenchmark.testHash2References    16526.471         12370.46       1.3359625269
FormatterBenchmark.testHash10References   9419.447          4195.901       2.2449164077
FormatterBenchmark.testHash40References   1566.458          1033.043       1.5163531431
FormatterBenchmark.testHash50References   1288.541          864.578        1.4903698683
FormatterBenchmark.testHash60References   874.804           815.057        1.0733040757
FormatterBenchmark.testHash70References   34.44             626.605        0.0549628554
FormatterBenchmark.testHash80References   29.651            534.358        0.0554890167
FormatterBenchmark.testHash90References   36.607            479.613        0.0763261213
FormatterBenchmark.testHash100References  32.021            423.948        0.0755304896
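
The last column is the first numeric column divided by the second. The following sketch recomputes it from the figures in the table and flags each row as faster or slower than vanilla. The class and method names are hypothetical. It assumes the scores are throughput, so that higher is better, which matches how the thread reads them.

```java
public class RatioSketch {
    // Ratio of the intrinsified score to the vanilla score.
    static double ratio(double intrinsics, double vanilla) {
        return intrinsics / vanilla;
    }

    public static void main(String[] args) {
        // Argument counts and scores copied from the table above.
        int[] refs = {1, 2, 10, 40, 50, 60, 70, 80, 90, 100};
        double[] intrinsics = {35672.102, 16526.471, 9419.447, 1566.458,
                1288.541, 874.804, 34.44, 29.651, 36.607, 32.021};
        double[] vanilla = {38295.161, 12370.46, 4195.901, 1033.043,
                864.578, 815.057, 626.605, 534.358, 479.613, 423.948};

        for (int i = 0; i < refs.length; i++) {
            double r = ratio(intrinsics[i], vanilla[i]);
            // A ratio above 1 means the intrinsified version scored higher.
            System.out.printf("%3d refs: %.4f (%s)%n",
                    refs[i], r, r > 1.0 ? "faster" : "slower");
        }
    }
}
```

The drop from a ratio slightly above 1 at 60 references to roughly 0.05 at 70 is the cliff described in the message.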


