scalar replacement of arrays affected by minor changes to surrounding code
Govind Jajoo
gjajoo+java at gmail.com
Mon Sep 16 22:07:34 UTC 2019
Hi Eric,
We're operating well within the default limit of
-XX:EliminateAllocationArraySizeLimit
and, as shown in the tests, escape analysis is able to identify and elide
the array allocations for hand-unrolled loops. What we're trying to figure
out is why a loop or an object wrapper affects this optimization.
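For reference, here is a minimal sketch of how the effective value of that flag could be read from inside a running JVM. It assumes a HotSpot JVM that includes the C2 compiler and exposes com.sun.management.HotSpotDiagnosticMXBean; the class and method names (ArraySizeLimitCheck, vmOption) are made up for illustration and are not part of the benchmark repo.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper, not part of the MCVE: looks up a HotSpot VM option by name.
class ArraySizeLimitCheck {

    // Returns the option's current value, or empty if the JVM does not expose it
    // (for example a non-HotSpot VM, or a build without the C2 compiler).
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // Thrown when the option (or the MXBean itself) is unknown to this VM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("EliminateAllocationArraySizeLimit = "
                + vmOption("EliminateAllocationArraySizeLimit").orElse("<not available>"));
    }
}
```

The benchmarks allocate two-element arrays, so any value reported here that is at least 2 would confirm the limit is not what disables the optimization.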
We've tried with and without the ... args, by creating a temporary array
instead, and it makes no difference (examples are checked in to the GitHub
repo).
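To make the comparison concrete, here is a rough sketch of the two call shapes. It is not the code from the repo; the class name and the viaVarargs/viaTemporaryArray methods are hypothetical. Both paths allocate a two-element int[] before calling loop, so the compiler sees the same array creation either way.

```java
import java.util.Random;

// Hypothetical illustration, not the code checked in to the repo.
class VarargsVersusExplicitArray {
    private static final Random RANDOM = new Random();

    static int next() { return RANDOM.nextInt() % 1000; }

    // Same shape as the loop(...) method in the benchmark: sums the array backwards.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            sum += arr[i];
        }
        return sum;
    }

    // Varargs call site: javac emits the int[2] allocation for us.
    static int viaVarargs(int a, int b) {
        return loop(a, b);
    }

    // Explicit temporary array: the same allocation, written out by hand.
    static int viaTemporaryArray(int a, int b) {
        int[] tmp = { a, b };
        return loop(tmp);
    }

    public static void main(String[] args) {
        int a = next();
        int b = next();
        // Both call shapes compute the same sum from the same two values.
        System.out.println(viaVarargs(a, b) == viaTemporaryArray(a, b));
    }
}
```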
Are you suggesting that this optimization is not supported in the presence
of loops?
Thanks,
Govind
On Mon, Sep 16, 2019 at 11:40 PM Eric Caspole <eric.caspole at oracle.com>
wrote:
> Hi Govind,
> When you use ... to pass parameters and receive the array, the array
> must be created to pass the parameters, so it is expected to get some
> allocation and GCs. You can see it in the bytecode for your loopSum:
>
> public void loopSum(org.openjdk.jmh.infra.Blackhole);
> descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
> Code:
> 0: aload_1
> 1: iconst_2
> 2: newarray int
> 4: dup
> 5: iconst_0
> 6: invokestatic #6 // Method next:()I
> 9: iastore
> 10: dup
> 11: iconst_1
> 12: invokestatic #6 // Method next:()I
> 15: iastore
> 16: invokestatic #2 // Method loop:([I)I
> 19: invokevirtual #7 // Method org/openjdk/jmh/infra/Blackhole.consume:(I)V
> 22: return
>
> If you want to reduce the object allocation maybe you can tweak your
> code to not pass arguments by ...
> Regards,
> Eric
>
>
> On 9/16/19 11:19, Govind Jajoo wrote:
> > Hi team,
> >
> > We're seeing some unexpected behaviour with scalar replacement of arrays
> > getting affected by subtle changes to surrounding code. If a newly created
> > array is accessed in a loop or wrapped inside another object, the
> > optimization gets disabled easily. For example when we run the following
> > benchmark in jmh (jdk11/linux)
> >
> > public class ArrayLoop {
> >     private static Random s_r = new Random();
> >     private static int next() { return s_r.nextInt() % 1000; }
> >
> >     private static int loop(int... arr) {
> >         int sum = 0;
> >         for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
> >         return sum;
> >     }
> >
> >     @Benchmark
> >     public void loopSum(Blackhole bh) {
> >         bh.consume(loop(next(), next()));
> >     }
> > }
> >
> > # JMH version: 1.21
> > # VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
> > ArrayLoop.loopSum                 avgt  3   26.124 ±   7.727  ns/op
> > ArrayLoop.loopSum:·gc.alloc.rate  avgt  3  700.529 ± 208.524  MB/sec
> > ArrayLoop.loopSum:·gc.count       avgt  3    5.000            counts
> >
> > We see unexpected gc activity. When we avoid the loop by "unrolling" it and
> > adding the following to the ArrayLoop class above
> >
> >     // silly manually unrolled loop
> >     private static int unrolled(int... arr) {
> >         int sum = 0;
> >         switch (arr.length) {
> >             default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
> >             case 4: sum += arr[3];
> >             case 3: sum += arr[2];
> >             case 2: sum += arr[1];
> >             case 1: sum += arr[0];
> >         }
> >         return sum;
> >     }
> >
> >     @Benchmark
> >     public void unrolledSum(Blackhole bh) {
> >         bh.consume(unrolled(next(), next()));
> >     }
> >
> > #
> > ArrayLoop.unrolledSum                 avgt  3  25.076 ± 1.711  ns/op
> > ArrayLoop.unrolledSum:·gc.alloc.rate  avgt  3  ≈ 10⁻⁴          MB/sec
> > ArrayLoop.unrolledSum:·gc.count       avgt  3  ≈ 0             counts
> >
> > scalar replacement kicks in as expected. Then to try out a more realistic
> > scenario representing our usage, we added the following wrapper and
> > benchmarks
> >
> >     private static class ArrayWrapper {
> >         final int[] arr;
> >         ArrayWrapper(int... many) { arr = many; }
> >         int loopSum() { return loop(arr); }
> >         int unrolledSum() { return unrolled(arr); }
> >     }
> >
> >     @Benchmark
> >     public void wrappedUnrolledSum(Blackhole bh) {
> >         bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
> >     }
> >
> >     @Benchmark
> >     public void wrappedLoopSum(Blackhole bh) {
> >         bh.consume(new ArrayWrapper(next(), next()).loopSum());
> >     }
> >
> > #
> > ArrayLoop.wrappedLoopSum                     avgt  3   26.190 ±  18.853  ns/op
> > ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt  3  699.433 ± 512.953  MB/sec
> > ArrayLoop.wrappedLoopSum:·gc.count           avgt  3    6.000            counts
> > ArrayLoop.wrappedUnrolledSum                 avgt  3   25.877 ±  13.348  ns/op
> > ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt  3  707.440 ± 360.702  MB/sec
> > ArrayLoop.wrappedUnrolledSum:·gc.count       avgt  3    6.000            counts
> >
> > While the LoopSum behaviour is the same as before here, even the
> > UnrolledSum benchmark starts to show gc activity. What gives?
> >
> > Thanks,
> > Govind
> > PS: MCVE available at https://github.com/gjajoo/EA/
> >
>