[intrinsics] performance improvements for the intrinsified version of Objects::hash

Thu Mar 7 21:20:03 UTC 2019

Hi Aleksey,

Thanks for taking a look at this.

On 3/7/19 4:10 PM, Aleksey Shipilev wrote:
> On 3/6/19 6:18 PM, Vicente Romero wrote:
>> I have produced a new iteration of performance results with a reorganization of the data proposed by
>> Alex, see [1], I added a link to the benchmark source
>>
>> [1]
>> http://cr.openjdk.java.net/~vromero/intrinsics_benchmark_results/v5/benchmarkResults_intrinsics_all_data_v5.html
> Haven't read the entire thread, Alex asked me for comments about the benchmark itself:
>
> *) These two guys are unused?
>
>      static final Integer i23 = 23;
>      static final Object[] args = { "Bob", i23 };

Right, they are unused: dead code, I will remove it.

>
> *) This repetitive block can be put on the IntrinsicsBenchmarks class:
>
>      @BenchmarkMode(Mode.Throughput)
>      @Fork(value = 1)
>      @Warmup(iterations=3)
>      @Measurement(iterations=5)
>      @OutputTimeUnit(TimeUnit.MILLISECONDS)

OK, that makes it less verbose.

>
>   ...leaving only:
>
>      @Benchmark
>      public void testHash0001Int(Blackhole b) {
>         ...
>      }
>
> *) Is there a reason to do multiple ops per @Benchmark? Why can't it be just:
>
>      @Benchmark
>      public int testHash0001Int() {
>          int i = 1;
>          return Objects.hash(i);
>      }

Not really, I will simplify it.

>
> *) Come to think about it, are those benchmark try to specifically measure the ability to fold local
> vars? If so, it could be just inlined?
>
>      @Benchmark
>      public int testHash0001Int() {
>          return Objects.hash(1);
>      }

Not specifically local vars. This project is able to produce an ldc
if the arguments are constants, so we want to measure the case where
the arguments are not constants. The constant case is super fast; we don't
have performance issues there.
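
To make the two cases concrete, here is a minimal sketch. The class and
method names are made up for illustration and are not taken from the
benchmark source. The first call has a compile-time constant argument, the
second reads a field whose value javac cannot know at the call site.

```java
import java.util.Objects;

// Hypothetical names, for illustration only.
final class HashArgsSketch {
    int field = 1; // not a constant for javac: the value is only known at run time

    int constantArg() {
        // Constant argument: the case where a constant result could be loaded directly
        return Objects.hash(1);
    }

    int nonConstantArg() {
        // Non-constant argument: the hash has to be computed at run time
        return Objects.hash(field);
    }
}
```

Both methods return the same value here; only the work needed to get it
differs, and the second shape is the one the benchmarks target.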

>
> *) If local var is sensible, I guess there is a difference between local "final int i = 1;" and "int
> i = 1;" from javac perspective?

Yes: for javac, `final int i = 1;` is a constant variable, so it is treated
the same as a constant literal.
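
A rough sketch of that difference, using standard Java rules rather than
anything specific to this project (class and method names are made up for
illustration): a final local initialized with a constant expression is a
constant variable, so the implicit narrowing to byte compiles only in the
final case.

```java
// Hypothetical names, for illustration only.
final class ConstantLocalSketch {
    static byte fromFinalLocal() {
        final int i = 1; // constant variable: final, initialized with a constant expression
        byte b = i;      // compiles: implicit narrowing is allowed for constant expressions
        return b;
    }

    static int fromPlainLocal() {
        int i = 1;       // not a constant variable
        // byte b = i;   // would not compile without a cast: possible lossy conversion
        return i;
    }
}
```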

>
> *) I guess what you want to check for performance when arguments read from instance fields -- that
> would be the target test case that simulates Object.hashCode(), no?

Reading instance fields or local variables shouldn't make a big difference.

>   JMH would break load commoning
> across @Benchmark calls, which makes two families of tests sensible:
>
>     @BenchmarkMode(Mode.Throughput)
>     @Fork(value = 1)
>     @Warmup(iterations=3)
>     @Measurement(iterations=5)
>     @OutputTimeUnit(TimeUnit.MILLISECONDS)
>     @State(Scope.Benchmark)
>     public class IntrinsicsBenchmark {
>         int i1 = 1;
>
>         @Benchmark
>         public int testHash0001Int_Standalone() {
>             // Single read of field with unknown value
>             return Objects.hash(i1);
>         }
>
>         @Benchmark
>         public int testHash0001Int_Merge() {
>             // Second read should be able to reuse/fold the value first read got
>             return Objects.hash(i1) + Objects.hash(i1);
>         }
>     }
>
> *) Probably needs the test that accepts already boxed varargs via Object[]?

Yes, we have separate tests for that.
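
For illustration only, this is the shape of that case (hypothetical class
and method names, not the actual test source; the values are borrowed from
the snippet quoted above). An existing Object[] is passed through as the
varargs array, while separate arguments make javac build the array and box
the int at the call site.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical names, for illustration only.
final class BoxedVarargsSketch {
    static final Object[] ARGS = { "Bob", 23 }; // already boxed, array already built

    static int hashPrebuiltArray() {
        // The Object[] is used directly as the varargs array
        return Objects.hash(ARGS);
    }

    static int hashSeparateArgs() {
        // javac creates a new Object[] and boxes 23 for this call
        return Objects.hash("Bob", 23);
    }

    static int hashViaArrays() {
        // Objects.hash is specified to match Arrays.hashCode on the same values
        return Arrays.hashCode(ARGS);
    }
}
```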

>
> *) Is there -prof perfasm output for any of those benchmarks, to see how generated code is actually
> looking? It might look especially funny for cases that fall off the cliff, like testHash0080String.

Yep, it is a big indy (invokedynamic) invocation.

>
> -Aleksey
>
Thanks,
Vicente