scalar replacement of arrays affected by minor changes to surrounding code
Nils Eliasson
nils.eliasson at oracle.com
Tue Sep 17 07:35:38 UTC 2019
We also have a problem where array allocations that have lost their uses
don't get eliminated.
If we can't prove that the array length is positive, the allocation must
be replaced by a guard, checking for negative values and throwing
NegativeArraySizeException.
I have an almost finished patch for this.
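
To illustrate why the guard is needed, here is a minimal source-level sketch.
It is not the patch and not what C2 does internally; the class and method
names are made up for the example. It only shows that an unused allocation
with a negative length still has an observable effect, so removing the
allocation has to leave an equivalent check behind.

```java
import java.util.function.IntConsumer;

// Hypothetical demo class; the names are illustrative, not from HotSpot.
public class DeadArrayAllocation {

    // The array is never read or written, so its storage is dead.
    // The length check is still observable: a negative length must throw.
    public static void allocateUnused(int length) {
        int[] unused = new int[length];
    }

    // Hand-written equivalent of what has to remain once the dead
    // allocation is removed: only the negative-length guard.
    public static void guardOnly(int length) {
        if (length < 0) {
            throw new NegativeArraySizeException(Integer.toString(length));
        }
    }

    // Runs body with the given length and reports whether it threw
    // NegativeArraySizeException.
    public static boolean throwsNegativeSize(IntConsumer body, int length) {
        try {
            body.accept(length);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int length : new int[] {2, 0, -1}) {
            boolean alloc = throwsNegativeSize(DeadArrayAllocation::allocateUnused, length);
            boolean guard = throwsNegativeSize(DeadArrayAllocation::guardOnly, length);
            System.out.println(length + ": allocation throws=" + alloc
                    + ", guard throws=" + guard);
        }
    }
}
```

Both variants behave the same for every length: nothing happens for 2 and 0,
and NegativeArraySizeException is thrown for -1. When the length is a
constant such as 2, the guard itself can be dropped as well.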
// Nils
On 2019-09-17 05:39, dean.long at oracle.com wrote:
> The problem sounds similar to this issue:
> https://bugs.openjdk.java.net/browse/JDK-6853701
>
> dl
>
> On 9/16/19 3:07 PM, Govind Jajoo wrote:
>> hi Eric,
>>
>> We're operating well within the default limit of
>> -XX:EliminateAllocationArraySizeLimit
>> and as shown in the tests, escape analysis is able to identify and elide
>> the array allocations for hand-unrolled loops. What we're trying to figure
>> out is why a loop or an object wrapper affects this optimization.
>> We've tried with and without the ... args, creating a temporary array
>> instead, and it makes no difference (examples checked in to the GitHub
>> repo).
>>
>> Are you suggesting that this optimization is not supported in the
>> presence of loops?
>>
>> Thanks,
>> Govind
>>
>>
>> On Mon, Sep 16, 2019 at 11:40 PM Eric Caspole <eric.caspole at oracle.com> wrote:
>>
>>> Hi Govind,
>>> When you use ... to pass parameters and receive the array, the array
>>> must be created to pass the parameters, so it is expected to get some
>>> allocation and GCs. You can see it in the bytecode for your loopSum:
>>>
>>> public void loopSum(org.openjdk.jmh.infra.Blackhole);
>>> descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
>>> Code:
>>> 0: aload_1
>>> 1: iconst_2
>>> 2: newarray int
>>> 4: dup
>>> 5: iconst_0
>>> 6: invokestatic #6 // Method next:()I
>>> 9: iastore
>>> 10: dup
>>> 11: iconst_1
>>> 12: invokestatic #6 // Method next:()I
>>> 15: iastore
>>> 16: invokestatic #2 // Method loop:([I)I
>>> 19: invokevirtual #7 // Method org/openjdk/jmh/infra/Blackhole.consume:(I)V
>>> 22: return
>>>
>>> If you want to reduce the object allocation maybe you can tweak your
>>> code to not pass arguments by ...
>>> Regards,
>>> Eric
>>>
>>>
>>> On 9/16/19 11:19, Govind Jajoo wrote:
>>>> Hi team,
>>>>
>>>> We're seeing some unexpected behaviour with scalar replacement of arrays
>>>> getting affected by subtle changes to surrounding code. If a newly created
>>>> array is accessed in a loop or wrapped inside another object, the
>>>> optimization gets disabled easily. For example when we run the following
>>>> benchmark in jmh (jdk11/linux)
>>>>
>>>> public class ArrayLoop {
>>>>     private static Random s_r = new Random();
>>>>     private static int next() { return s_r.nextInt() % 1000; }
>>>>
>>>>     private static int loop(int... arr) {
>>>>         int sum = 0;
>>>>         for (int i = arr.length - 1; i >= 0; sum += arr[i--]) { ; }
>>>>         return sum;
>>>>     }
>>>>
>>>>     @Benchmark
>>>>     public void loopSum(Blackhole bh) {
>>>>         bh.consume(loop(next(), next()));
>>>>     }
>>>> }
>>>>
>>>> # JMH version: 1.21
>>>> # VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4+11
>>>> ArrayLoop.loopSum                 avgt  3   26.124 ±   7.727  ns/op
>>>> ArrayLoop.loopSum:·gc.alloc.rate  avgt  3  700.529 ± 208.524  MB/sec
>>>> ArrayLoop.loopSum:·gc.count       avgt  3    5.000            counts
>>>>
>>>> We see unexpected gc activity. When we avoid the loop by "unrolling" it
>>>> and adding the following to the ArrayLoop class above
>>>>
>>>> // silly manually unrolled loop
>>>> private static int unrolled(int... arr) {
>>>>     int sum = 0;
>>>>     switch (arr.length) {
>>>>         default: for (int i = arr.length - 1; i >= 4; sum += arr[i--]) { ; }
>>>>         case 4: sum += arr[3];
>>>>         case 3: sum += arr[2];
>>>>         case 2: sum += arr[1];
>>>>         case 1: sum += arr[0];
>>>>     }
>>>>     return sum;
>>>> }
>>>>
>>>> @Benchmark
>>>> public void unrolledSum(Blackhole bh) {
>>>>     bh.consume(unrolled(next(), next()));
>>>> }
>>>>
>>>> #
>>>> ArrayLoop.unrolledSum                 avgt  3  25.076 ± 1.711  ns/op
>>>> ArrayLoop.unrolledSum:·gc.alloc.rate  avgt  3  ≈ 10⁻⁴          MB/sec
>>>> ArrayLoop.unrolledSum:·gc.count       avgt  3  ≈ 0             counts
>>>>
>>>> scalar replacement kicks in as expected. Then to try out a more realistic
>>>> scenario representing our usage, we added the following wrapper and
>>>> benchmarks
>>>>
>>>> private static class ArrayWrapper {
>>>>     final int[] arr;
>>>>     ArrayWrapper(int... many) { arr = many; }
>>>>     int loopSum() { return loop(arr); }
>>>>     int unrolledSum() { return unrolled(arr); }
>>>> }
>>>>
>>>> @Benchmark
>>>> public void wrappedUnrolledSum(Blackhole bh) {
>>>>     bh.consume(new ArrayWrapper(next(), next()).unrolledSum());
>>>> }
>>>>
>>>> @Benchmark
>>>> public void wrappedLoopSum(Blackhole bh) {
>>>>     bh.consume(new ArrayWrapper(next(), next()).loopSum());
>>>> }
>>>>
>>>> #
>>>> ArrayLoop.wrappedLoopSum                     avgt  3   26.190 ±  18.853  ns/op
>>>> ArrayLoop.wrappedLoopSum:·gc.alloc.rate      avgt  3  699.433 ± 512.953  MB/sec
>>>> ArrayLoop.wrappedLoopSum:·gc.count           avgt  3    6.000            counts
>>>> ArrayLoop.wrappedUnrolledSum                 avgt  3   25.877 ±  13.348  ns/op
>>>> ArrayLoop.wrappedUnrolledSum:·gc.alloc.rate  avgt  3  707.440 ± 360.702  MB/sec
>>>> ArrayLoop.wrappedUnrolledSum:·gc.count       avgt  3    6.000            counts
>>>>
>>>> While the LoopSum behaviour is the same as before here, even the
>>>> UnrolledSum benchmark starts to show gc activity. What gives?
>>>>
>>>> Thanks,
>>>> Govind
>>>> PS: MCVE available at https://github.com/gjajoo/EA/
>>>>
>