scalar replacement of arrays affected by minor changes to surrounding code

Mon Sep 16 15:19:10 UTC 2019

Hi team,

We're seeing scalar replacement of arrays being affected by subtle changes
to the surrounding code. If a newly created array is accessed in a loop or
wrapped inside another object, the optimization is easily disabled. For
example, we ran the following benchmark in JMH (JDK 11, Linux):

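import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class ArrayLoop {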
    private static Random s_r = new Random();
    private static int next() { return s_r.nextInt() % 1000; }

    private static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
        return sum;
    }

    @Benchmark
    public void loopSum(Blackhole bh) {
        bh.consume(loop(next(), next()));
    }
}

# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
ArrayLoop.loopSum                 avgt    3   26.124 ±   7.727   ns/op
ArrayLoop.loopSum:·gc.alloc.rate  avgt    3  700.529 ± 208.524  MB/sec
ArrayLoop.loopSum:·gc.count       avgt    3    5.000            counts

We see unexpected GC activity. When we avoid the loop by "unrolling" it,
adding the following to the ArrayLoop class above,

    // silly manually unrolled loop; as written, an empty array would fall
    // through from default to case 4 and throw
    private static int unrolled(int... arr) {
        int sum = 0;
        switch (arr.length) {
            default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
            case 4: sum += arr[3];
            case 3: sum += arr[2];
            case 2: sum += arr[1];
            case 1: sum += arr[0];
        }
        return sum;
    }

    @Benchmark
    public void unrolledSum(Blackhole bh) {
        bh.consume(unrolled(next(), next()));
    }

#
ArrayLoop.unrolledSum                 avgt    3  25.076 ± 1.711   ns/op
ArrayLoop.unrolledSum:·gc.alloc.rate  avgt    3  ≈ 10⁻⁴          MB/sec
ArrayLoop.unrolledSum:·gc.count       avgt    3     ≈ 0          counts

scalar replacement kicks in as expected. To try a more realistic scenario
representing our usage, we then added the following wrapper and
benchmarks:

    private static class ArrayWrapper {
        final int[] arr;
        ArrayWrapper(int... many) { arr = many; }
        int loopSum() { return loop(arr); }
        int unrolledSum() { return unrolled(arr); }
    }

    @Benchmark
    public void wrappedUnrolledSum(Blackhole bh) {
        bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
    }

    @Benchmark
    public void wrappedLoopSum(Blackhole bh) {
        bh.consume(new ArrayWrapper(next(), next()).loopSum());
    }

#
ArrayLoop.wrappedLoopSum                     avgt    3   26.190 ±  18.853   ns/op
ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt    3  699.433 ± 512.953  MB/sec
ArrayLoop.wrappedLoopSum:·gc.count           avgt    3    6.000            counts
ArrayLoop.wrappedUnrolledSum                 avgt    3   25.877 ±  13.348   ns/op
ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt    3  707.440 ± 360.702  MB/sec
ArrayLoop.wrappedUnrolledSum:·gc.count       avgt    3    6.000            counts

While the wrappedLoopSum behaviour is the same as loopSum before, even the
wrappedUnrolledSum benchmark now shows GC activity, although unrolledSum
without the wrapper did not. What gives?
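
A rough way to cross-check these allocation numbers outside JMH is to read
the per-thread allocation counter around each variant. The sketch below is
only an illustration and is not part of the MCVE: AllocCheck and
bytesPerCall are hypothetical names, it assumes a HotSpot JVM where the
thread MXBean implements com.sun.management.ThreadMXBean, and it has none of
JMH's warm-up or fork controls, so its figures are indicative at best.

```java
import java.lang.management.ManagementFactory;
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical stand-alone check; not part of the MCVE.
final class AllocCheck {
    private static final Random R = new Random();
    static int next() { return R.nextInt() % 1000; }

    // Same shape as loop() in the benchmark.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
        return sum;
    }

    // Same shape as unrolled() in the benchmark (non-empty arrays only).
    static int unrolled(int... arr) {
        int sum = 0;
        switch (arr.length) {
            default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
            case 4: sum += arr[3];
            case 3: sum += arr[2];
            case 2: sum += arr[1];
            case 1: sum += arr[0];
        }
        return sum;
    }

    static final class ArrayWrapper {
        final int[] arr;
        ArrayWrapper(int... many) { arr = many; }
        int loopSum() { return loop(arr); }
        int unrolledSum() { return unrolled(arr); }
    }

    // Average number of bytes the current thread allocates per call of op.
    static long bytesPerCall(IntSupplier op, int calls) {
        // Assumption: the platform bean is the HotSpot extension type.
        com.sun.management.ThreadMXBean mx =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        int sink = 0;
        // Crude warm-up so the JIT has a chance to compile op.
        for (int i = 0; i < 200_000; i++) sink += op.getAsInt();
        long before = mx.getThreadAllocatedBytes(id);
        for (int i = 0; i < calls; i++) sink += op.getAsInt();
        long after = mx.getThreadAllocatedBytes(id);
        // Use sink so the calls cannot be discarded as dead code.
        if (sink == Integer.MIN_VALUE) System.out.println(sink);
        return (after - before) / calls;
    }

    public static void main(String[] args) {
        int calls = 5_000_000;
        System.out.println("loop            "
            + bytesPerCall(() -> loop(next(), next()), calls) + " B/call");
        System.out.println("unrolled        "
            + bytesPerCall(() -> unrolled(next(), next()), calls) + " B/call");
        System.out.println("wrappedLoop     "
            + bytesPerCall(() -> new ArrayWrapper(next(), next()).loopSum(), calls)
            + " B/call");
        System.out.println("wrappedUnrolled "
            + bytesPerCall(() -> new ArrayWrapper(next(), next()).unrolledSum(), calls)
            + " B/call");
    }
}
```

A variant whose allocations are fully eliminated should report a value at
or near zero, while the others should report roughly the size of the
objects created per call. Running the same program with
-XX:-DoEscapeAnalysis gives a baseline with the optimization switched off.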

Thanks,
Govind
PS: MCVE available at https://github.com/gjajoo/EA/