Allocation of array copy can be eliminated in particular cases

Сергей Цыпанов sergei.tsypanov at yandex.ru
Wed Nov 20 21:30:48 UTC 2019


Hello Vladimir,

thank you for your response!

> Moreover, the transformation is already there:
>
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/opto/memnode.cpp#l2388

The comment on line 2389 seems confusing to me:

// This works even if the length is not constant (clone or newArray).

When we clone an array, isn't the length constant and equal to the length of the original array? I guess it cannot be different.
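
One possible reading of that comment, which is only an assumption and not something memnode.cpp states: "constant" may refer to a compile-time constant rather than to the clone's length matching its source. A minimal sketch of the two cases (the class and method names are hypothetical):

```java
public class CloneLengthSketch {

    // Length fixed in the source: the literal 10 is visible to the compiler.
    static int constantLength() {
        return new int[10].clone().length;
    }

    // Length known only at run time: the clone still has the same length
    // as the array it was cloned from, but that value is not a
    // compile-time constant.
    static int runtimeLength(int n) {
        return new int[n].clone().length;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 7;
        System.out.println(constantLength()); // prints 10
        System.out.println(runtimeLength(n)); // prints n (7 when no argument is given)
    }
}
```

In both cases the clone's length equals the original's length; the only difference is whether that value is known before the code runs.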

> I haven't looked into the benchmarks you mentioned, but it looks like
> cloned_array.length access is not the reason why cloned array is still
> there.

At first I thought that the cloned array was retained at run time because it is returned from a method in the original benchmark:

@Benchmark
public int getParameterTypes() { return method.getParameterTypes().length; }

To check whether this assumption is correct, I changed my benchmark to strip any additional logic from it [1]:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ArrayAllocationEliminationBenchmark {

  private int length = 10;

  //...

  @Benchmark
  public int baseline() {
    return new int[length].length;
  }

  @Benchmark
  public int baselineClone() {
    return new int[length].clone().length;
  }
  //...
}

Here I don't see any reason for the runtime to keep the cloned array:

1) an int is returned from the method
2) the cloned array doesn't escape the place where it's created
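
The two points above can be sketched as follows. This is only an illustration of the observable behaviour, not a description of what C2 actually emits; the class and the reduced() method are hypothetical. The one thing a reduced form has to preserve is the exception thrown for a negative length:

```java
public class IdealReductionSketch {

    // Same shape as the baselineClone benchmark method, with the length
    // passed in as a parameter instead of read from a field.
    static int viaClone(int length) {
        return new int[length].clone().length;
    }

    // A form with the same observable result and no allocation at all.
    // The negative-length check is kept because new int[length] throws
    // NegativeArraySizeException for length < 0.
    static int reduced(int length) {
        if (length < 0) {
            throw new NegativeArraySizeException(String.valueOf(length));
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(viaClone(10)); // prints 10
        System.out.println(reduced(10));  // prints 10
    }
}
```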

So the cloned array should be eliminated, but according to the benchmark results it is not:

JDK 11

                                                               Mode  Cnt     Score     Error   Units
baseline                                                       avgt   25    10,860 ±   0,604   ns/op
baseline:·gc.alloc.rate                                        avgt   25  4703,477 ± 215,986  MB/sec
baseline:·gc.alloc.rate.norm                                   avgt   25    56,000 ±   0,001    B/op
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'            avgt   25     0,002 ±   0,001  MB/sec
baseline:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm       avgt   25    ≈ 10⁻⁴              B/op
baseline:·gc.churn.G1_Old_Gen                                  avgt   25  4711,586 ± 218,439  MB/sec
baseline:·gc.churn.G1_Old_Gen.norm                             avgt   25    56,094 ±   0,084    B/op
baseline:·gc.count                                             avgt   25  5400,000            counts
baseline:·gc.time                                              avgt   25  3926,000                ms

baselineClone                                                  avgt   25    21,906 ±   1,234   ns/op
baselineClone:·gc.alloc.rate                                   avgt   25  4667,440 ± 248,731  MB/sec
baselineClone:·gc.alloc.rate.norm                              avgt   25   112,000 ±   0,001    B/op
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'       avgt   25     0,008 ±   0,002  MB/sec
baselineClone:·gc.churn.CodeHeap_'non-profiled_nmethods'.norm  avgt   25    ≈ 10⁻⁴              B/op
baselineClone:·gc.churn.G1_Old_Gen                             avgt   25  4675,250 ± 247,341  MB/sec
baselineClone:·gc.churn.G1_Old_Gen.norm                        avgt   25   112,192 ±   0,162    B/op
baselineClone:·gc.count                                        avgt   25  5489,000            counts
baselineClone:·gc.time                                         avgt   25  4042,000                ms

JDK 13

                                                Mode  Cnt     Score     Error   Units
baseline                                        avgt   25    10,014 ±   0,227   ns/op
baseline:·gc.alloc.rate                         avgt   25  5082,913 ± 110,593  MB/sec
baseline:·gc.alloc.rate.norm                    avgt   25    56,000 ±   0,001    B/op
baseline:·gc.churn.G1_Eden_Space                avgt   25  5092,013 ± 110,500  MB/sec
baseline:·gc.churn.G1_Eden_Space.norm           avgt   25    56,100 ±   0,076    B/op
baseline:·gc.churn.G1_Survivor_Space            avgt   25     0,005 ±   0,001  MB/sec
baseline:·gc.churn.G1_Survivor_Space.norm       avgt   25    ≈ 10⁻⁴              B/op
baseline:·gc.count                              avgt   25  5753,000            counts
baseline:·gc.time                               avgt   25  3733,000                ms

baselineClone                                   avgt   25    26,619 ±   1,405   ns/op
baselineClone:·gc.alloc.rate                    avgt   25  3837,924 ± 185,292  MB/sec
baselineClone:·gc.alloc.rate.norm               avgt   25   112,000 ±   0,001    B/op
baselineClone:·gc.churn.G1_Eden_Space           avgt   25  3844,010 ± 185,460  MB/sec
baselineClone:·gc.churn.G1_Eden_Space.norm      avgt   25   112,178 ±   0,168    B/op
baselineClone:·gc.churn.G1_Survivor_Space       avgt   25     0,008 ±   0,001  MB/sec
baselineClone:·gc.churn.G1_Survivor_Space.norm  avgt   25    ≈ 10⁻⁴              B/op
baselineClone:·gc.count                         avgt   25  4668,000            counts
baselineClone:·gc.time                          avgt   25  2923,000                ms
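
The gc.alloc.rate.norm values are consistent with one int[10] being allocated per call in baseline and two in baselineClone. A rough check of the arithmetic, assuming a 64-bit HotSpot JVM with compressed class pointers (16-byte array header) and 8-byte object alignment; both values are assumptions about the setup used, and the class below is hypothetical:

```java
public class ArraySizeSketch {

    // Assumed: 12-byte object header plus a 4-byte length field.
    static final int ARRAY_HEADER_BYTES = 16;
    // Assumed: default object alignment.
    static final int ALIGNMENT_BYTES = 8;

    // Estimated heap size of an int[] with the given length.
    static long intArrayBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + 4L * length;
        // Round up to the next multiple of the alignment.
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    public static void main(String[] args) {
        long one = intArrayBytes(10);
        System.out.println(one);     // prints 56, matching baseline
        System.out.println(2 * one); // prints 112, matching baselineClone
    }
}
```

Under these assumptions neither the original array nor its clone is eliminated in baselineClone, and the single array in baseline is allocated as well.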


From this output I conclude that either I am missing something in my understanding of how the compiler and the runtime work, or this is a bug.

I would be happy to learn which of the two is the case :)

There is also good news, though: the latest Graal can eliminate the allocation for the baseline method [2].

With best regards,

Sergey Tsypanov

[1] https://github.com/stsypanov/logeek-night-benchmark/blob/master/benchmark-runners/src/main/java/com/luxoft/logeek/benchmark/array/ArrayAllocationEliminationBenchmark.java

[2] https://github.com/oracle/graal/issues/1847


More information about the hotspot-compiler-dev mailing list