[intrinsics] performance improvements for the intrinsified version of Objects::hash

Tue Mar 5 02:52:24 UTC 2019


On 3/4/19 8:11 PM, Alex Buckley wrote:
> // Adopting a zero-decimal-places policy because precision to multiple 
> decimal places is less important than accuracy and repeatability.
>
> On 3/4/2019 4:28 PM, Vicente Romero wrote:
>> I have uploaded another round of experiments for Objects::hash, see [1].
>> The main change is that I have included a variant of most of the tests in
>> which, instead of invoking Objects::hash 10 times sequentially, the same
>> invocation occurs inside a loop that is executed 10 times. This shows
>> that when the call site is reused, the intrinsified version beats vanilla
>> JDK 13 most of the time.
>
> That's not really the story though :-) Yes, the *Int*StringsLoop10 
> tests run faster with intrinsified invocation than with vanilla 
> invocation, but generally, the *Int*StringsLoop10 tests enjoy less 
> impressive speedups than the *Int*Strings tests. (Example: 
> 25Int25Strings gets a 21x speedup, but 25Int25StringsLoop10 only gets 
> a 2x speedup.)
>
> This is because the *Int*StringsLoop10 tests already run faster on 
> vanilla JDK 13 than the *Int*Strings tests, presumably thanks to 
> inlining ("the call site is reused").
>
> I guess that 1IntLoop10, 2IntsLoop10, and 2Ints2StringsLoop10 would 
> have such high throughput on vanilla JDK 13 that their speedups with 
> intrinsification might be significantly <1.

Not in all cases; see [1]. The new information is highlighted in yellow.
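
For readers who have not looked at the benchmark sources, the two shapes
being compared look roughly like the sketch below. This is only an
illustration, not the actual benchmark code: the class name, method names,
and argument lists are made up, and the measurement harness is omitted.
The point is that the sequential form has ten distinct Objects::hash call
sites, while the loop form reuses a single one.

```java
import java.util.Objects;

// Hypothetical illustration of the two benchmark shapes discussed above.
// Names and arguments are assumptions, not taken from the real benchmarks.
public class HashCallShapes {

    // "Sequential" shape: ten separate call sites, each executed once per call.
    static int sequential(int i, String s) {
        int acc = 0;
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        acc += Objects.hash(i, s);
        return acc;
    }

    // "Loop10" shape: one call site, reused ten times.
    static int looped(int i, String s) {
        int acc = 0;
        for (int n = 0; n < 10; n++) {
            acc += Objects.hash(i, s);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Both shapes compute the same value; only the call-site structure differs.
        System.out.println(sequential(1, "a") == looped(1, "a"));
    }
}
```

Both methods return the same result, so any difference in measured
throughput comes from how the call sites are compiled, not from the work
being done.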
>
> Alex

Vicente

[1] 
http://cr.openjdk.java.net/~vromero/intrinsics_benchmark_results/benchmarkResults_intrinsics_all_data_v4.html