scalar replacement of arrays affected by minor changes to surrounding code

dean.long at oracle.com
Tue Sep 17 21:33:18 UTC 2019


Hi Govind,

On 9/17/19 2:33 AM, Govind Jajoo wrote:
> dl -
>
> can you please elaborate on how the linked issue is similar?
>

Just that they both have a loop and a merge point.

> > limitation of the current EA implementation. Objects will not be
> > eliminated if there is merge point in which it is undefined which
> > object is referenced
>
> Specifically, where is the merge point in the sample code I've posted?
> From what I can tell, there's no ambiguity around which instance is
> being referenced, as there's only one.
You're right; in your case the merge is on the scalar "sum".  Have you
tried turning on -XX:+PrintEscapeAnalysis and
-XX:+PrintEliminateAllocations (both need a debug build of the JVM) to
see what's going wrong in your test?
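
To make the point about "sum" concrete, here is a minimal sketch of the two
shapes being compared. It is only an illustration: the class name
MergePointSketch and the method straightLine are made up for this mail and
are not in your repo, and the code says nothing about what C2 actually
decides. It just shows that in the loop form "sum" and "i" are carried
around the back edge (so their values merge at the loop header), while the
straight-line form has no such merge.

```java
final class MergePointSketch {
    // Loop form, as in the benchmark: `sum` and `i` are updated on every
    // iteration, so at the loop header each one merges its initial value
    // with the value coming around the back edge.
    static int loop(int... arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; i--) {
            sum += arr[i];
        }
        return sum;
    }

    // Straight-line form for exactly two elements (hypothetical helper):
    // constant indices, no loop-carried values, no merge.
    static int straightLine(int... arr) {
        return arr[1] + arr[0];
    }

    public static void main(String[] args) {
        System.out.println(loop(3, 4));         // prints 7
        System.out.println(straightLine(3, 4)); // prints 7
    }
}
```

Both methods return the same result for a two-element call, so any
difference in allocation between the two shapes would come from how the
compiler treats them, not from the semantics.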

dl

>
> Thanks,
> Govind
>
> On Tue, Sep 17, 2019 at 5:40 AM <dean.long at oracle.com> wrote:
>
>     The problem sounds similar to this issue:
>     https://bugs.openjdk.java.net/browse/JDK-6853701
>
>     dl
>
>     On 9/16/19 3:07 PM, Govind Jajoo wrote:
>     > Hi Eric,
>     >
>     > We're operating well within the default limit of
>     > -XX:EliminateAllocationArraySizeLimit and, as shown in the tests,
>     > escape analysis is able to identify and elide the array allocations
>     > for hand-unrolled loops. What we're trying to figure out is why a
>     > loop or an object wrapper affects this optimization. We've tried
>     > both with the ... args and with an explicitly created temporary
>     > array instead, and it makes no difference (examples checked in to
>     > the github repo).
>     >
>     > Are you suggesting that this optimization is not supported in the
>     > presence of loops?
>     >
>     > Thanks,
>     > Govind
>     >
>     >
>     > On Mon, Sep 16, 2019 at 11:40 PM Eric Caspole
>     > <eric.caspole at oracle.com> wrote:
>     >
>     >> Hi Govind,
>     >> When you use ... to pass parameters and receive the array, the array
>     >> must be created to pass the parameters, so it is expected to get some
>     >> allocation and GCs. You can see it in the bytecode for your loopSum:
>     >>
>     >>     public void loopSum(org.openjdk.jmh.infra.Blackhole);
>     >>       descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
>     >>       Code:
>     >>          0: aload_1
>     >>          1: iconst_2
>     >>          2: newarray       int
>     >>          4: dup
>     >>          5: iconst_0
>     >>          6: invokestatic  #6                  // Method next:()I
>     >>          9: iastore
>     >>         10: dup
>     >>         11: iconst_1
>     >>         12: invokestatic  #6                  // Method next:()I
>     >>         15: iastore
>     >>         16: invokestatic  #2                  // Method loop:([I)I
>     >>         19: invokevirtual #7                  // Method org/openjdk/jmh/infra/Blackhole.consume:(I)V
>     >>         22: return
>     >>
>     >> If you want to reduce the object allocation maybe you can tweak your
>     >> code to not pass arguments by ...
>     >> Regards,
>     >> Eric
>     >>
>     >>
>     >> On 9/16/19 11:19, Govind Jajoo wrote:
>     >>> Hi team,
>     >>>
>     >>> We're seeing some unexpected behaviour with scalar replacement of
>     >>> arrays getting affected by subtle changes to surrounding code. If a
>     >>> newly created array is accessed in a loop or wrapped inside another
>     >>> object, the optimization gets disabled easily. For example, when we
>     >>> run the following benchmark in jmh (jdk11/linux)
>     >>>
>     >>> public class ArrayLoop {
>     >>>       private static Random s_r = new Random();
>     >>>       private static int next() { return s_r.nextInt() % 1000; }
>     >>>
>     >>>       private static int loop(int... arr) {
>     >>>           int sum = 0;
>     >>>           for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
>     >>>           return sum;
>     >>>       }
>     >>>
>     >>>       @Benchmark
>     >>>       public void loopSum(Blackhole bh) {
>     >>>           bh.consume(loop(next(), next()));
>     >>>       }
>     >>> }
>     >>>
>     >>> # JMH version: 1.21
>     >>> # VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
>     >>> ArrayLoop.loopSum                  avgt    3   26.124 ±   7.727   ns/op
>     >>> ArrayLoop.loopSum:·gc.alloc.rate   avgt    3  700.529 ± 208.524  MB/sec
>     >>> ArrayLoop.loopSum:·gc.count        avgt    3    5.000            counts
>     >>>
>     >>> We see unexpected gc activity. When we avoid the loop by "unrolling"
>     >>> it and adding the following to the ArrayLoop class above
>     >>>
>     >>>       // silly manually unrolled loop
>     >>>       private static int unrolled(int... arr) {
>     >>>           int sum = 0;
>     >>>           switch (arr.length) {
>     >>>               default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
>     >>>               case 4: sum += arr[3];
>     >>>               case 3: sum += arr[2];
>     >>>               case 2: sum += arr[1];
>     >>>               case 1: sum += arr[0];
>     >>>           }
>     >>>           return sum;
>     >>>       }
>     >>>
>     >>>       @Benchmark
>     >>>       public void unrolledSum(Blackhole bh) {
>     >>>           bh.consume(unrolled(next(), next()));
>     >>>       }
>     >>>
>     >>> #
>     >>> ArrayLoop.unrolledSum                  avgt    3  25.076 ± 1.711   ns/op
>     >>> ArrayLoop.unrolledSum:·gc.alloc.rate   avgt    3  ≈ 10⁻⁴          MB/sec
>     >>> ArrayLoop.unrolledSum:·gc.count        avgt    3  ≈ 0             counts
>     >>>
>     >>> scalar replacement kicks in as expected. Then to try out a more
>     >>> realistic scenario representing our usage, we added the following
>     >>> wrapper and benchmarks
>     >>>
>     >>>       private static class ArrayWrapper {
>     >>>           final int[] arr;
>     >>>           ArrayWrapper(int... many) { arr = many; }
>     >>>           int loopSum() { return loop(arr); }
>     >>>           int unrolledSum() { return unrolled(arr); }
>     >>>       }
>     >>>
>     >>>       @Benchmark
>     >>>       public void wrappedUnrolledSum(Blackhole bh) {
>     >>>           bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
>     >>>       }
>     >>>
>     >>>       @Benchmark
>     >>>       public void wrappedLoopSum(Blackhole bh) {
>     >>>           bh.consume(new ArrayWrapper(next(), next()).loopSum());
>     >>>       }
>     >>>
>     >>> #
>     >>> ArrayLoop.wrappedLoopSum                      avgt    3   26.190 ±  18.853   ns/op
>     >>> ArrayLoop.wrappedLoopSum:·gc.alloc.rate       avgt    3  699.433 ± 512.953  MB/sec
>     >>> ArrayLoop.wrappedLoopSum:·gc.count            avgt    3    6.000            counts
>     >>> ArrayLoop.wrappedUnrolledSum                  avgt    3   25.877 ±  13.348   ns/op
>     >>> ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate   avgt    3  707.440 ± 360.702  MB/sec
>     >>> ArrayLoop.wrappedUnrolledSum:·gc.count        avgt    3    6.000            counts
>     >>>
>     >>> While the LoopSum behaviour is the same as before here, even the
>     >>> UnrolledSum benchmark starts to show gc activity. What gives?
>     >>>
>     >>> Thanks,
>     >>> Govind
>     >>> PS: MCVE available at https://github.com/gjajoo/EA/
>     >>>
>



More information about the hotspot-compiler-dev mailing list