scalar replacement of arrays affected by minor changes to surrounding code

Mon Sep 16 21:40:37 UTC 2019

Hi Govind,
When you use ... (varargs) to pass parameters, an array must be created 
at the call site to hold them, so some allocation and GC activity is 
expected. You can see it in the bytecode for your loopSum:

   public void loopSum(org.openjdk.jmh.infra.Blackhole);
     descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
     Code:
        0: aload_1
        1: iconst_2
        2: newarray       int
        4: dup
        5: iconst_0
        6: invokestatic  #6                  // Method next:()I
        9: iastore
       10: dup
       11: iconst_1
       12: invokestatic  #6                  // Method next:()I
       15: iastore
       16: invokestatic  #2                  // Method loop:([I)I
       19: invokevirtual #7                  // Method org/openjdk/jmh/infra/Blackhole.consume:(I)V
       22: return
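
To make the bytecode above concrete, here is a minimal sketch showing that a
varargs call is compiled as if the array had been written out by hand. The
class name VarargsDesugar and the constant arguments are made up for
illustration; loop has the same shape as the method in the benchmark.

```java
// Illustrative sketch only; VarargsDesugar is a hypothetical class name.
class VarargsDesugar {
    // Same shape as loop(int...) in the benchmark.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            sum += arr[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // javac emits newarray plus two iastore instructions for this call.
        int viaVarargs = loop(3, 4);
        // The explicit form of the same call; both pass an int[2].
        int viaArray = loop(new int[] {3, 4});
        System.out.println(viaVarargs + " " + viaArray);
    }
}
```

Running javap -c on the compiled class should show the same newarray
sequence for both calls in main.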

If you want to reduce object allocation, you could change your code so 
that it does not pass the arguments via ... (varargs).
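
One possible way to do that, as a rough sketch rather than a tested
recommendation: add a fixed-arity overload for the common argument count, so
that no array is created at the call site for that case. The class name
FixedAritySum and the method name sum are hypothetical and not part of your
benchmark.

```java
// Illustrative sketch only; FixedAritySum and sum are hypothetical names.
class FixedAritySum {
    // Fixed-arity overload: a two-argument call passes plain ints, no array.
    static int sum(int a, int b) {
        return a + b;
    }

    // Varargs fallback for any other argument count; this still allocates an array.
    static int sum(int... arr) {
        int total = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            total += arr[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Overload resolution prefers the fixed-arity method when it applies.
        System.out.println(sum(3, 4));
        // Three arguments only match the varargs method.
        System.out.println(sum(1, 2, 3));
    }
}
```

The trade-off is one extra overload per argument count you care about.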
Regards,
Eric

On 9/16/19 11:19, Govind Jajoo wrote:
> Hi team,
> 
> We're seeing some unexpected behaviour with scalar replacement of arrays
> getting affected by subtle changes to surrounding code. If a newly created
> array is accessed in a loop or wrapped inside another object, the
> optimization gets disabled easily. For example, when we run the following
> benchmark in JMH (JDK 11/Linux)
> 
> public class ArrayLoop {
>      private static Random s_r = new Random();
>      private static int next() { return s_r.nextInt() % 1000; }
> 
>      private static int loop(int... arr) {
>          int sum = 0;
>          for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
>          return sum;
>      }
> 
>      @Benchmark
>      public void loopSum(Blackhole bh) {
>          bh.consume(loop(next(), next()));
>      }
> }
> 
> # JMH version: 1.21
> # VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
> ArrayLoop.loopSum                 avgt    3   26.124 ±   7.727   ns/op
> ArrayLoop.loopSum:·gc.alloc.rate  avgt    3  700.529 ± 208.524  MB/sec
> ArrayLoop.loopSum:·gc.count       avgt    3    5.000            counts
> 
> We see unexpected gc activity. When we avoid the loop by "unrolling" it and
> adding the following to the ArrayLoop class above
> 
>      // silly manually unrolled loop
>      private static int unrolled(int... arr) {
>          int sum = 0;
>          switch (arr.length) {
>              default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
>              case 4: sum += arr[3];
>              case 3: sum += arr[2];
>              case 2: sum += arr[1];
>              case 1: sum += arr[0];
>          }
>          return sum;
>      }
> 
>      @Benchmark
>      public void unrolledSum(Blackhole bh) {
>          bh.consume(unrolled(next(), next()));
>      }
> 
> #
> ArrayLoop.unrolledSum                 avgt    3   25.076 ±   1.711   ns/op
> ArrayLoop.unrolledSum:·gc.alloc.rate  avgt    3   ≈ 10⁻⁴            MB/sec
> ArrayLoop.unrolledSum:·gc.count       avgt    3   ≈ 0               counts
> 
> scalar replacement kicks in as expected. Then to try out a more realistic
> scenario representing our usage, we added the following wrapper and
> benchmarks
> 
>      private static class ArrayWrapper {
>          final int[] arr;
>          ArrayWrapper(int... many) { arr = many; }
>          int loopSum() { return loop(arr); }
>          int unrolledSum() { return unrolled(arr); }
>      }
> 
>      @Benchmark
>      public void wrappedUnrolledSum(Blackhole bh) {
>          bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
>      }
> 
>      @Benchmark
>      public void wrappedLoopSum(Blackhole bh) {
>          bh.consume(new ArrayWrapper(next(), next()).loopSum());
>      }
> 
> #
> ArrayLoop.wrappedLoopSum                     avgt    3   26.190 ±  18.853   ns/op
> ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt    3  699.433 ± 512.953  MB/sec
> ArrayLoop.wrappedLoopSum:·gc.count           avgt    3    6.000            counts
> ArrayLoop.wrappedUnrolledSum                 avgt    3   25.877 ±  13.348   ns/op
> ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt    3  707.440 ± 360.702  MB/sec
> ArrayLoop.wrappedUnrolledSum:·gc.count       avgt    3    6.000            counts
> 
> While the LoopSum behaviour is the same as before here, even the UnrolledSum
> benchmark starts to show GC activity. What gives?
> 
> Thanks,
> Govind
> PS: MCVE available at https://github.com/gjajoo/EA/
>