scalar replacement of arrays affected by minor changes to surrounding code
Govind Jajoo
gjajoo+java at gmail.com
Mon Sep 16 15:19:10 UTC 2019
Hi team,
We're seeing some unexpected behaviour with scalar replacement of arrays
being affected by subtle changes to the surrounding code. If a newly created
array is accessed in a loop or wrapped inside another object, the
optimization is easily disabled. For example, when we run the following
benchmark in JMH (JDK 11, Linux):
import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class ArrayLoop {
    private static Random s_r = new Random();
    private static int next() { return s_r.nextInt() % 1000; }
    private static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
        return sum;
    }
    @Benchmark
    public void loopSum(Blackhole bh) {
        bh.consume(loop(next(), next()));
    }
}
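The array in question never appears in the source: it is the one javac creates for the varargs call. As a minimal sketch of that desugaring (VarargsDesugar and its helper names are made up for illustration and are not part of the MCVE), the two call forms below should be equivalent:

```java
import java.util.Random;

// Illustration only: VarargsDesugar is a hypothetical class, not part of the MCVE.
class VarargsDesugar {
    private static final Random RANDOM = new Random();

    static int next() { return RANDOM.nextInt() % 1000; }

    // Same loop as ArrayLoop.loop in the benchmark.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
        return sum;
    }

    // The call as it is written in the benchmark body.
    static int varargsCall(int a, int b) {
        return loop(a, b);
    }

    // The same call with the implicit two-element int[] spelled out.
    static int explicitArrayCall(int a, int b) {
        return loop(new int[] { a, b });
    }

    public static void main(String[] args) {
        int a = next();
        int b = next();
        System.out.println(varargsCall(a, b) == explicitArrayCall(a, b)); // prints "true"
    }
}
```

It is this short-lived int[2] that we would expect escape analysis to remove.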
# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
ArrayLoop.loopSum                 avgt  3   26.124 ±   7.727  ns/op
ArrayLoop.loopSum:·gc.alloc.rate  avgt  3  700.529 ± 208.524  MB/sec
ArrayLoop.loopSum:·gc.count       avgt  3    5.000            counts
We see unexpected GC activity, so the varargs array is evidently being
allocated on the heap. When we avoid the loop by manually "unrolling" it,
adding the following to the ArrayLoop class above,
    // silly manually unrolled loop
    private static int unrolled(int... arr) {
        int sum = 0;
        switch (arr.length) {
            default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
            case 4: sum += arr[3];
            case 3: sum += arr[2];
            case 2: sum += arr[1];
            case 1: sum += arr[0];
        }
        return sum;
    }
    @Benchmark
    public void unrolledSum(Blackhole bh) {
        bh.consume(unrolled(next(), next()));
    }
#
ArrayLoop.unrolledSum                 avgt  3  25.076 ± 1.711  ns/op
ArrayLoop.unrolledSum:·gc.alloc.rate  avgt  3  ≈ 10⁻⁴          MB/sec
ArrayLoop.unrolledSum:·gc.count       avgt  3  ≈ 0             counts
scalar replacement kicks in as expected. Then, to try out a more realistic
scenario representing our usage, we added the following wrapper and
benchmarks:
    private static class ArrayWrapper {
        final int[] arr;
        ArrayWrapper(int... many) { arr = many; }
        int loopSum() { return loop(arr); }
        int unrolledSum() { return unrolled(arr); }
    }
    @Benchmark
    public void wrappedUnrolledSum(Blackhole bh) {
        bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
    }
    @Benchmark
    public void wrappedLoopSum(Blackhole bh) {
        bh.consume(new ArrayWrapper(next(), next()).loopSum());
    }
#
ArrayLoop.wrappedLoopSum                     avgt  3   26.190 ±  18.853  ns/op
ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt  3  699.433 ± 512.953  MB/sec
ArrayLoop.wrappedLoopSum:·gc.count           avgt  3    6.000            counts
ArrayLoop.wrappedUnrolledSum                 avgt  3   25.877 ±  13.348  ns/op
ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt  3  707.440 ± 360.702  MB/sec
ArrayLoop.wrappedUnrolledSum:·gc.count       avgt  3    6.000            counts
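The gc.alloc.rate and gc.count rows are the kind of output JMH's GC profiler (-prof gc) produces. As a rough cross-check outside JMH, per-thread allocation can also be read through HotSpot's com.sun.management.ThreadMXBean. The sketch below is an assumption-laden illustration, not part of the MCVE: AllocCheck, allocatedBytes and the iteration counts are made up, and the numbers it prints depend on warm-up and on which JIT tier compiled the loop.

```java
import java.lang.management.ManagementFactory;

// Rough cross-check only; AllocCheck and its method names are hypothetical.
class AllocCheck {
    // Written to at the end of each measurement so the sums are not dead code.
    static volatile int sink;

    // Same loop as ArrayLoop.loop in the benchmark.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
        return sum;
    }

    // Bytes allocated by the current thread while calling loop(i, i + 1) `calls` times.
    static long allocatedBytes(int calls) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        int local = 0;
        for (int i = 0; i < calls; i++) {
            local += loop(i, i + 1);
        }
        long after = threads.getThreadAllocatedBytes(id);
        sink = local;
        return after - before;
    }

    public static void main(String[] args) {
        // Warm up so the measured run is more likely to execute compiled code.
        for (int i = 0; i < 20; i++) {
            allocatedBytes(100_000);
        }
        System.out.println("bytes allocated: " + allocatedBytes(1_000_000));
    }
}
```

A result close to zero would suggest the array was scalar-replaced; a result growing with the call count would suggest it was not.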
While the loopSum behaviour is the same as before, even the unrolledSum
variant now shows GC activity once wrapped. What gives?
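For reference, the effect we expected in every variant is roughly the following. This is only a hand-written picture of what scalar replacement amounts to for a two-element array, not actual C2 output; ScalarReplacementSketch and its method names are made up for illustration.

```java
// Illustration only: a hand-written picture of the effect, not actual C2 output.
class ScalarReplacementSketch {
    // Before: the array exists only to carry two values into the sum.
    static int withArray(int a, int b) {
        int[] arr = { a, b };
        int sum = 0;
        sum += arr[1];
        sum += arr[0];
        return sum;
    }

    // After: each element lives in a local variable and nothing is allocated.
    static int scalarReplaced(int a, int b) {
        int arr0 = a;
        int arr1 = b;
        int sum = 0;
        sum += arr1;
        sum += arr0;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(withArray(3, 4) == scalarReplaced(3, 4)); // prints "true"
    }
}
```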
Thanks,
Govind
PS: MCVE available at https://github.com/gjajoo/EA/