scalar replacement of arrays affected by minor changes to surrounding code

Nils Eliasson nils.eliasson at oracle.com
Tue Sep 17 07:35:38 UTC 2019


We also have a problem where array allocations that have lost their uses 
don't get eliminated.

If we can't prove that the array length is non-negative, the allocation 
must be replaced by a guard that checks for negative values and throws 
NegativeArraySizeException.

I have an almost finished patch for this.
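
To illustrate the point about the guard, here is a minimal Java-level sketch. 
It is not the patch and not how the compiler represents this internally; the 
class and method names (DeadArrayAllocation, withAllocation, withGuard, 
throwsNase) are made up for the example. It only shows that an unused 
allocation and a bare length check behave the same from the caller's side, 
which is why the check has to survive when the allocation goes away.

```java
import java.util.function.IntConsumer;

public class DeadArrayAllocation {
    // Original shape: the array is allocated but its value is never used.
    static void withAllocation(int n) {
        int[] unused = new int[n]; // throws NegativeArraySizeException when n < 0
    }

    // Hypothetical equivalent once the allocation is removed:
    // only the length check (the guard) remains.
    static void withGuard(int n) {
        if (n < 0) {
            throw new NegativeArraySizeException(Integer.toString(n));
        }
    }

    // Helper: reports whether the given body throws NegativeArraySizeException for n.
    static boolean throwsNase(IntConsumer body, int n) {
        try {
            body.accept(n);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Both variants should agree for negative, zero and positive lengths.
        for (int n : new int[] {-1, 0, 2}) {
            boolean a = throwsNase(DeadArrayAllocation::withAllocation, n);
            boolean g = throwsNase(DeadArrayAllocation::withGuard, n);
            System.out.println(n + ": allocation throws=" + a + ", guard throws=" + g);
        }
    }
}
```

Note that a length of zero is legal, so the guard only needs to reject 
negative values.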

// Nils


On 2019-09-17 05:39, dean.long at oracle.com wrote:
> The problem sounds similar to this issue: 
> https://bugs.openjdk.java.net/browse/JDK-6853701
>
> dl
>
> On 9/16/19 3:07 PM, Govind Jajoo wrote:
>> hi Eric,
>>
>> We're operating well within the default limit of
>> -XX:EliminateAllocationArraySizeLimit
>> and as shown in the tests, escape analysis is able to identify and elide
>> the array allocations for hand-unrolled loops. What we're trying to figure
>> out is why a loop or an object wrapper affects this optimization.
>> We've tried with and without the ... args, creating a temporary array
>> instead, and it makes no difference (examples checked in to the GitHub
>> repo).
>>
>> Are you suggesting that this optimization is not supported in the presence
>> of loops?
>>
>> Thanks,
>> Govind
>>
>>
>> On Mon, Sep 16, 2019 at 11:40 PM Eric Caspole <eric.caspole at oracle.com>
>> wrote:
>>
>>> Hi Govind,
>>> When you use ... to pass parameters and receive the array, the array
>>> must be created to pass the parameters, so it is expected to get some
>>> allocation and GCs. You can see it in the bytecode for your loopSum:
>>>
>>>     public void loopSum(org.openjdk.jmh.infra.Blackhole);
>>>       descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
>>>       Code:
>>>          0: aload_1
>>>          1: iconst_2
>>>          2: newarray       int
>>>          4: dup
>>>          5: iconst_0
>>>          6: invokestatic  #6                  // Method next:()I
>>>          9: iastore
>>>         10: dup
>>>         11: iconst_1
>>>         12: invokestatic  #6                  // Method next:()I
>>>         15: iastore
>>>         16: invokestatic  #2                  // Method loop:([I)I
>>>         19: invokevirtual #7                  // Method org/openjdk/jmh/infra/Blackhole.consume:(I)V
>>>         22: return
>>>
>>> If you want to reduce the object allocation maybe you can tweak your
>>> code to not pass arguments by ...
>>> Regards,
>>> Eric
>>>
>>>
>>> On 9/16/19 11:19, Govind Jajoo wrote:
>>>> Hi team,
>>>>
>>>> We're seeing some unexpected behaviour with scalar replacement of arrays
>>>> getting affected by subtle changes to surrounding code. If a newly created
>>>> array is accessed in a loop or wrapped inside another object, the
>>>> optimization gets disabled easily. For example, when we run the following
>>>> benchmark in JMH (jdk11/linux)
>>>>
>>>> public class ArrayLoop {
>>>>       private static Random s_r = new Random();
>>>>       private static int next() { return s_r.nextInt() % 1000; }
>>>>
>>>>       private static int loop(int... arr) {
>>>>           int sum = 0;
>>>>           for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
>>>>           return sum;
>>>>       }
>>>>
>>>>       @Benchmark
>>>>       public void loopSum(Blackhole bh) {
>>>>           bh.consume(loop(next(), next()));
>>>>       }
>>>> }
>>>>
>>>> # JMH version: 1.21
>>>> # VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
>>>> ArrayLoop.loopSum                 avgt    3   26.124 ±   7.727   ns/op
>>>> ArrayLoop.loopSum:·gc.alloc.rate  avgt    3  700.529 ± 208.524  MB/sec
>>>> ArrayLoop.loopSum:·gc.count       avgt    3    5.000            counts
>>>>
>>>> We see unexpected gc activity. When we avoid the loop by "unrolling" it
>>>> and adding the following to the ArrayLoop class above
>>>>
>>>>       // silly manually unrolled loop
>>>>       private static int unrolled(int... arr) {
>>>>           int sum = 0;
>>>>           switch (arr.length) {
>>>>               default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
>>>>               case 4: sum += arr[3];
>>>>               case 3: sum += arr[2];
>>>>               case 2: sum += arr[1];
>>>>               case 1: sum += arr[0];
>>>>           }
>>>>           return sum;
>>>>       }
>>>>
>>>>       @Benchmark
>>>>       public void unrolledSum(Blackhole bh) {
>>>>           bh.consume(unrolled(next(), next()));
>>>>       }
>>>>
>>>> #
>>>> ArrayLoop.unrolledSum                 avgt    3   25.076 ±   1.711   ns/op
>>>> ArrayLoop.unrolledSum:·gc.alloc.rate  avgt    3   ≈ 10⁻⁴            MB/sec
>>>> ArrayLoop.unrolledSum:·gc.count       avgt    3      ≈ 0            counts
>>>>
>>>> scalar replacement kicks in as expected. Then, to try out a more realistic
>>>> scenario representing our usage, we added the following wrapper and
>>>> benchmarks
>>>>
>>>>       private static class ArrayWrapper {
>>>>           final int[] arr;
>>>>           ArrayWrapper(int... many) { arr = many; }
>>>>           int loopSum() { return loop(arr); }
>>>>           int unrolledSum() { return unrolled(arr); }
>>>>       }
>>>>
>>>>       @Benchmark
>>>>       public void wrappedUnrolledSum(Blackhole bh) {
>>>>           bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
>>>>       }
>>>>
>>>>       @Benchmark
>>>>       public void wrappedLoopSum(Blackhole bh) {
>>>>           bh.consume(new ArrayWrapper(next(), next()).loopSum());
>>>>       }
>>>>
>>>> #
>>>> ArrayLoop.wrappedLoopSum                     avgt    3   26.190 ±  18.853   ns/op
>>>> ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt    3  699.433 ± 512.953  MB/sec
>>>> ArrayLoop.wrappedLoopSum:·gc.count           avgt    3    6.000            counts
>>>> ArrayLoop.wrappedUnrolledSum                 avgt    3   25.877 ±  13.348   ns/op
>>>> ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt    3  707.440 ± 360.702  MB/sec
>>>> ArrayLoop.wrappedUnrolledSum:·gc.count       avgt    3    6.000            counts
>>>>
>>>> While the LoopSum behaviour is the same as before here, even the
>>>> UnrolledSum benchmark starts to show gc activity. What gives?
>>>>
>>>> Thanks,
>>>> Govind
>>>> PS: MCVE available at https://github.com/gjajoo/EA/
>>>>
>

