[intrinsics] performance improvements for the intrinsified version of Objects::hash

Thu Mar 7 21:10:24 UTC 2019

On 3/6/19 6:18 PM, Vicente Romero wrote:
> I have produced a new iteration of performance results with a reorganization of the data proposed by
> Alex, see [1], I added a link to the benchmark source
> 
> [1]
> http://cr.openjdk.java.net/~vromero/intrinsics_benchmark_results/v5/benchmarkResults_intrinsics_all_data_v5.html

I haven't read the entire thread, but Alex asked me for comments about the benchmark itself:

*) These two guys are unused?

    static final Integer i23 = 23;
    static final Object[] args = { "Bob", i23 };

*) This repetitive block can be put on the IntrinsicsBenchmarks class:

    @BenchmarkMode(Mode.Throughput)
    @Fork(value = 1)
    @Warmup(iterations=3)
    @Measurement(iterations=5)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)

 ...leaving only:

    @Benchmark
    public void testHash0001Int(Blackhole b) {
       ...
    }

*) Is there a reason to do multiple ops per @Benchmark? Why can't it be just:

    @Benchmark
    public int testHash0001Int() {
        int i = 1;
        return Objects.hash(i);
    }

*) Come to think of it, are those benchmarks trying to specifically measure the ability to fold local
vars? If so, could the value just be inlined?

    @Benchmark
    public int testHash0001Int() {
        return Objects.hash(1);
    }

*) If a local var is sensible, I guess there is a difference between local "final int i = 1;" and "int
i = 1;" from javac's perspective?
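
A minimal sketch of that difference, not taken from the benchmark source: the class and method names
are hypothetical, and it uses String rather than int only because reference identity makes javac's
folding observable at run time. A final local initialized with a constant expression is a constant
variable, so javac folds uses of it; the non-final local is stored and loaded like any other variable.

```java
public class FinalLocalSketch {
    // Hypothetical helper: the local is a constant variable, so javac
    // folds s + "b" into the literal "ab" at compile time.
    static boolean foldedWithFinal() {
        final String s = "a";
        String joined = s + "b";
        // Both sides refer to the same interned literal.
        return joined == "ab";
    }

    // Hypothetical helper: without final, s + "b" is not a constant
    // expression, so the concatenation happens at run time.
    static boolean foldedWithoutFinal() {
        String s = "a";
        String joined = s + "b";
        // A string newly created at run time is never the interned literal.
        return joined == "ab";
    }

    public static void main(String[] args) {
        System.out.println("final local folded:     " + foldedWithFinal());
        System.out.println("non-final local folded: " + foldedWithoutFinal());
    }
}
```

The same constant-variable rule applies to "final int i = 1;", so for Objects.hash(i) javac can emit
the constant directly instead of a store followed by a load.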

*) I guess what you want is to check performance when arguments are read from instance fields --
that would be the target test case that simulates Object.hashCode(), no? JMH would break load
commoning across @Benchmark calls, which makes two families of tests sensible:

   @BenchmarkMode(Mode.Throughput)
   @Fork(value = 1)
   @Warmup(iterations=3)
   @Measurement(iterations=5)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   @State(Scope.Benchmark)
   public class IntrinsicsBenchmark {
       int i1 = 1;

       @Benchmark
       public int testHash0001Int_Standalone() {
           // Single read of field with unknown value
           return Objects.hash(i1);
       }

       @Benchmark
       public int testHash0001Int_Merge() {
           // Second read should be able to reuse/fold the value first read got
           return Objects.hash(i1) + Objects.hash(i1);
       }
   }

*) Probably needs a test that passes already-boxed varargs via Object[]?
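
A rough sketch of the two call shapes such a test would compare, written as plain Java so it runs
without JMH. The class and method names are hypothetical; in the real benchmark these would be
@Benchmark methods, with the array held in a @State field. ARGS mirrors the unused "args" array
mentioned above.

```java
import java.util.Arrays;
import java.util.Objects;

public class BoxedVarargsSketch {
    // Hypothetical fixture: elements are boxed once, up front.
    static final Object[] ARGS = { "Bob", 23 };

    // The Object[] is passed through as the varargs array itself,
    // so the call site neither boxes nor allocates.
    static int hashPreBoxed() {
        return Objects.hash(ARGS);
    }

    // javac allocates a fresh Object[2] at the call site and boxes 23
    // through Integer.valueOf before calling Objects.hash.
    static int hashVarargs() {
        return Objects.hash("Bob", 23);
    }

    public static void main(String[] args) {
        // Both shapes must agree on the result; only the call-site work differs.
        System.out.println(hashPreBoxed() == hashVarargs());
        System.out.println(hashPreBoxed() == Arrays.hashCode(ARGS));
    }
}
```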

*) Is there -prof perfasm output for any of those benchmarks, to see what the generated code actually
looks like? It might look especially funny for cases that fall off the cliff, like testHash0080String.

-Aleksey