[intrinsics] performance improvements for the intrinsified version of Objects::hash

Vicente Romero vicente.romero at oracle.com
Tue Mar 5 15:31:14 UTC 2019



On 3/5/19 9:02 AM, Hannes Wallnöfer wrote:
> Vicente,
>
> could it be that your change in the Objects::hash bootstraps [1] mostly benefits invocations with very large numbers of parameters (like your original 100-parameter tests) but hurts performance with small-to-medium numbers of parameters? I don’t have your latest benchmark sources, but I did some quick tests, such as a testHash5Ints5Strings, that suggest that may be the case.

That could be, but the intrinsified version is still faster in those
cases with a small number of arguments. That's probably why I have focused
on the case with a larger number of arguments, but we can change priorities
or even have different call sites depending on the number of arguments.
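
To make the "different call sites depending on the number of arguments" idea concrete, here is a minimal sketch of a bootstrap method that picks a target by arity. It is not the code in the amber repository: the class name ArityHashBootstrap, the cutoff SMALL_ARITY_LIMIT and the two strategies hashSmall and hashLarge are all hypothetical placeholders, and both strategies deliberately compute the same value as Objects::hash.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class ArityHashBootstrap {
    // Hypothetical cutoff; a real value would have to come from benchmarks.
    static final int SMALL_ARITY_LIMIT = 4;

    // Placeholder strategy for call sites with few arguments: a plain loop.
    static int hashSmall(Object[] values) {
        int result = 1;
        for (Object v : values) {
            result = 31 * result + (v == null ? 0 : v.hashCode());
        }
        return result;
    }

    // Placeholder strategy for call sites with many arguments.
    static int hashLarge(Object[] values) {
        return Arrays.hashCode(values);
    }

    // Bootstrap method: the call site's MethodType reveals the arity,
    // so the strategy is chosen once, at link time.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,
                                     MethodType type)
            throws ReflectiveOperationException {
        int arity = type.parameterCount();
        String strategy = arity <= SMALL_ARITY_LIMIT ? "hashSmall" : "hashLarge";
        MethodHandle target = MethodHandles.lookup().findStatic(
                ArityHashBootstrap.class,
                strategy,
                MethodType.methodType(int.class, Object[].class));
        // Collect the individual arguments into an Object[] and adapt
        // (boxing primitives) to the exact type requested by the call site.
        MethodHandle adapted = target
                .asCollector(Object[].class, arity)
                .asType(type);
        return new ConstantCallSite(adapted);
    }
}
```

The choice is made once per call site, so no arity check remains on the hot path; whether that pays off for small arities is exactly what the benchmarks would need to show.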

>
> [1] http://hg.openjdk.java.net/amber/amber/rev/0f40d5752eb9
>
> Hannes

Vicente

>
>
>> On 05.03.2019 at 03:52, Vicente Romero <vicente.romero at oracle.com> wrote:
>>
>>
>>
>> On 3/4/19 8:11 PM, Alex Buckley wrote:
>>> // Adopting a zero-decimal-places policy because precision to multiple decimal places is less important than accuracy and repeatability.
>>>
>>> On 3/4/2019 4:28 PM, Vicente Romero wrote:
>>>> I have uploaded another round of experiments for Objects::hash, see [1].
>>>> The main variation is that I have included a variant of most of the tests in
>>>> which instead of invoking Objects::hash 10 times sequentially, the same
>>>> invocation occurs inside a loop which is executed 10 times. This shows
>>>> that when the call site is reused, the execution time trumps vanilla
>>>> JDK13 most of the time.
>>> That's not really the story though :-) Yes, the *Int*StringsLoop10 tests run faster with intrinsified invocation than with vanilla invocation, but generally, the *Int*StringsLoop10 tests enjoy less impressive speedups than the *Int*Strings tests. (Example: 25Int25Strings gets a 21x speedup, but 25Int25StringsLoop10 only gets a 2x speedup.)
>>>
>>> This is because the *Int*StringsLoop10 tests already run faster on vanilla JDK 13 than the *Int*Strings tests, presumably thanks to inlining ("the call site is reused").
>>>
>>> I guess that 1IntLoop10, 2IntsLoop10, and 2Ints2StringsLoop10 would have such high throughput on vanilla JDK 13 that their speedups with intrinsification might be significantly <1.
>> Not in all cases, see [1]; the new information is highlighted in yellow.
>>> Alex
>> Vicente
>>
>> [1] http://cr.openjdk.java.net/~vromero/intrinsics_benchmark_results/benchmarkResults_intrinsics_all_data_v4.html


