Enable optimization of arraycopy as loads/stores with Shenandoah
Aleksey Shipilev
shade at redhat.com
Thu Dec 8 16:37:44 UTC 2016
On 12/08/2016 04:38 PM, Aleksey Shipilev wrote:
> On 12/08/2016 03:55 PM, Roman Kennke wrote:
>> On Thursday, 08.12.2016 at 15:15 +0100, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/
>>>
>>> This re-enables an optimization that was disabled with shenandoah.
>>
>> Cool! I like that!
>>
>> Do we have any idea if it does improve performance? That would be arraycopy
>> on smallish arrays only right? Aleksey?
>
> Let me find the arraycopy tests (that I swear I did in OpenJDK for
> Roland's previous non-Shenandoah patch :) and run them.
Using this test:
http://icedtea.classpath.org/people/shade/gc-bench/file/6d332199876c/src/main/java/org/openjdk/gcbench/runtime/arraycopy/RefArray.java
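For readers who cannot open the link: judging only by the benchmark names in the results (nulls_N, objs_N), the test presumably copies reference arrays of N elements that hold either nulls or real objects. The class below is a hypothetical plain-Java stand-in under that assumption. It is not the actual RefArray.java and not a JMH harness, and every name in it is invented for illustration. It only shows the two semantically equivalent ways of copying that the patch is about.

```java
import java.util.Arrays;

// Hypothetical stand-in for the linked benchmark, NOT the real RefArray.java.
public class RefArrayCopySketch {

    // Source array of the given size that holds only nulls.
    static Object[] nulls(int size) {
        return new Object[size];
    }

    // Source array of the given size that holds distinct non-null objects.
    static Object[] objs(int size) {
        Object[] array = new Object[size];
        for (int i = 0; i < size; i++) {
            array[i] = new Object();
        }
        return array;
    }

    // The operation under test: a reference-array copy through System.arraycopy.
    static Object[] copyWithArraycopy(Object[] src) {
        Object[] dst = new Object[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // What "arraycopy as loads/stores" means semantically: one load and one
    // store per element, with no call into a copy stub.
    static Object[] copyWithLoadsAndStores(Object[] src) {
        Object[] dst = new Object[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            for (Object[] src : new Object[][] {nulls(size), objs(size)}) {
                Object[] viaArraycopy = copyWithArraycopy(src);
                Object[] viaLoop = copyWithLoadsAndStores(src);
                if (!Arrays.equals(viaArraycopy, viaLoop)) {
                    throw new AssertionError("copies differ at size " + size);
                }
            }
        }
        System.out.println("both copy strategies agree");
    }
}
```

This sketch checks equivalence only. Meaningful timings need a proper harness such as JMH, which is what the linked test uses.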
=== baseline
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt    5   3.987 ± 1.282  ns/op
RefArray.nulls_02  avgt    5   4.185 ± 0.145  ns/op
RefArray.nulls_04  avgt    5   5.022 ± 0.601  ns/op
RefArray.nulls_08  avgt    5   6.421 ± 0.252  ns/op
RefArray.nulls_16  avgt    5   8.344 ± 1.012  ns/op
RefArray.nulls_32  avgt    5  14.646 ± 1.486  ns/op
RefArray.nulls_64  avgt    5  28.125 ± 3.523  ns/op
RefArray.objs_01   avgt    5   3.905 ± 0.131  ns/op
RefArray.objs_02   avgt    5   4.267 ± 0.332  ns/op
RefArray.objs_04   avgt    5   4.838 ± 0.064  ns/op
RefArray.objs_08   avgt    5   6.459 ± 0.187  ns/op
RefArray.objs_16   avgt    5   8.610 ± 1.526  ns/op
RefArray.objs_32   avgt    5  14.269 ± 0.536  ns/op
RefArray.objs_64   avgt    5  27.225 ± 0.405  ns/op
=== baseline +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt    5  16.021 ± 0.379  ns/op
RefArray.nulls_02  avgt    5  15.997 ± 0.137  ns/op
RefArray.nulls_04  avgt    5  16.560 ± 0.342  ns/op
RefArray.nulls_08  avgt    5  16.103 ± 0.070  ns/op
RefArray.nulls_16  avgt    5  17.060 ± 0.285  ns/op
RefArray.nulls_32  avgt    5  18.654 ± 0.092  ns/op
RefArray.nulls_64  avgt    5  30.848 ± 0.948  ns/op
RefArray.objs_01   avgt    5  15.941 ± 0.015  ns/op
RefArray.objs_02   avgt    5  15.953 ± 0.041  ns/op
RefArray.objs_04   avgt    5  16.514 ± 0.059  ns/op
RefArray.objs_08   avgt    5  16.122 ± 0.032  ns/op
RefArray.objs_16   avgt    5  17.110 ± 0.146  ns/op
RefArray.objs_32   avgt    5  19.304 ± 0.622  ns/op
RefArray.objs_64   avgt    5  31.025 ± 0.806  ns/op
=== patched +UseShenandoahGC
Benchmark          Mode  Cnt   Score   Error  Units
RefArray.nulls_01  avgt    5   5.110 ± 0.033  ns/op
RefArray.nulls_02  avgt    5   5.293 ± 0.019  ns/op
RefArray.nulls_04  avgt    5   6.903 ± 0.065  ns/op
RefArray.nulls_08  avgt    5   9.627 ± 0.043  ns/op
RefArray.nulls_16  avgt    5  17.016 ± 0.134  ns/op
RefArray.nulls_32  avgt    5  19.466 ± 2.545  ns/op
RefArray.nulls_64  avgt    5  30.659 ± 0.147  ns/op
RefArray.objs_01   avgt    5   5.171 ± 0.106  ns/op
RefArray.objs_02   avgt    5   5.827 ± 0.013  ns/op
RefArray.objs_04   avgt    5   7.377 ± 0.046  ns/op
RefArray.objs_08   avgt    5   9.353 ± 0.099  ns/op
RefArray.objs_16   avgt    5  17.097 ± 0.434  ns/op
RefArray.objs_32   avgt    5  19.212 ± 0.792  ns/op
RefArray.objs_64   avgt    5  30.818 ± 0.301  ns/op
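To make the comparison between the two Shenandoah runs easier to read, the small sketch below divides the "baseline +UseShenandoahGC" scores by the "patched +UseShenandoahGC" scores for the RefArray.nulls_* rows. The numbers are copied from the tables above. The class and method names are invented for illustration, and the ratios ignore the reported error bars.

```java
public class ShenandoahArraycopySpeedup {

    // Array sizes encoded in the benchmark names (nulls_01 .. nulls_64).
    static final int[] SIZES = {1, 2, 4, 8, 16, 32, 64};

    // RefArray.nulls_* scores in ns/op, copied from the tables above.
    static final double[] BASELINE_SHENANDOAH =
            {16.021, 15.997, 16.560, 16.103, 17.060, 18.654, 30.848};
    static final double[] PATCHED_SHENANDOAH =
            {5.110, 5.293, 6.903, 9.627, 17.016, 19.466, 30.659};

    // Ratio above 1.0 means the patched build is faster.
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            double ratio = speedup(BASELINE_SHENANDOAH[i], PATCHED_SHENANDOAH[i]);
            System.out.printf("nulls_%02d: %.2fx%n", SIZES[i], ratio);
        }
    }
}
```

The ratios come out at roughly 3.1x for 1 element and 1.7x for 8 elements, and at about 1.0x from 16 elements up. That suggests the loads/stores expansion only applies to copies of up to about 8 elements, with larger copies taking the same path as before.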
Good to go. I guess the code quality could be a teeny bit better (we've seen
this before with null-paths in read barriers being thrown out), but I'll take
it as is.
 0.82%   1.21%     ││    0x00007f2451477ea1: mov    0x10(%rcx),%r10d
 1.17%   1.04%     ││    0x00007f2451477ea5: test   %r10d,%r10d
                ╭  ││    0x00007f2451477ea8: je     0x00007f2451477f01
 2.23%   2.78%  │  ││    0x00007f2451477eaa: mov    -0x8(%r12,%r10,8),%r10
10.65%  13.44%  │  ││    0x00007f2451477eaf: mov    %r10,%r11
 0.14%   0.19%  │  ││    0x00007f2451477eb2: shr    $0x3,%r11
 2.68%   3.36%  │  ││↗   0x00007f2451477eb6: mov    %r11d,0x10(%rdx)
 2.46%   3.38%  │  │││   0x00007f2451477eba: mov    0x14(%rcx),%r11d
 0.05%          │  │││   0x00007f2451477ebe: test   %r11d,%r11d
                │╭ │││   0x00007f2451477ec1: je     0x00007f2451477f06
 0.12%   0.17%  ││ │││   0x00007f2451477ec3: mov    -0x8(%r12,%r11,8),%r10
 0.64%   1.01%  ││ │││   0x00007f2451477ec8: mov    %r10,%r11
 2.42%   2.35%  ││ │││   0x00007f2451477ecb: shr    $0x3,%r11
 0.34%   0.41%  ││ │││↗  0x00007f2451477ecf: mov    %r11d,0x14(%rdx)
 1.27%   1.26%  ││ ││││  0x00007f2451477ed3: mov    0x18(%rcx),%r11d
 0.10%   0.09%  ││ ││││  0x00007f2451477ed7: test   %r11d,%r11d
                ││╭││││  0x00007f2451477eda: je     0x00007f2451477f0b
 1.77%   1.35%  │││││││  0x00007f2451477edc: mov    -0x8(%r12,%r11,8),%r10
 0.36%   0.51%  │││││││  0x00007f2451477ee1: mov    %r10,%r11
 1.10%   1.33%  │││││││  0x00007f2451477ee4: shr    $0x3,%r11
 0.24%   0.19%  │││││││↗ 0x00007f2451477ee8: mov    %r11d,0x18(%rdx)
 1.77%   1.36%  ││││││││ 0x00007f2451477eec: mov    0x1c(%rcx),%r11d
 0.02%   0.03%  ││││││││ 0x00007f2451477ef0: test   %r11d,%r11d
                │││╰││││ 0x00007f2451477ef3: jne    0x00007f2451477dd1
                │││ ││││ 0x00007f2451477ef9: xor    %r10,%r10
                │││ ╰│││ 0x00007f2451477efc: jmpq   0x00007f2451477dda
                ↘││  │││ 0x00007f2451477f01: xor    %r11,%r11
                 ││  ╰││ 0x00007f2451477f04: jmp    0x00007f2451477eb6
                 ↘│   ││ 0x00007f2451477f06: xor    %r11,%r11
                  │   ╰│ 0x00007f2451477f09: jmp    0x00007f2451477ecf
                  ↘    │ 0x00007f2451477f0b: xor    %r11,%r11
                       ╰ 0x00007f2451477f0e: jmp    0x00007f2451477ee8
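As a rough reading aid for the listing above, here is a toy model of what each unrolled element appears to do: load a compressed reference from the source, take a separate null path if it is zero, otherwise load the word 8 bytes before the object (the read barrier's forwarding pointer), shift it right by 3 to re-compress it, and store the result to the destination. The model assumes a zero heap base and an 8-byte alignment shift, inferred from the plain `shr $0x3`. The heap is simulated with a `long[]`, and all names are invented for illustration. It is a sketch of the pattern, not HotSpot code.

```java
import java.util.Arrays;

public class ReadBarrierCopySketch {

    // Resolve one compressed reference through its forwarding pointer.
    // heapWords[i] models the 8-byte word at byte address i * 8 (heap base 0, assumed).
    static int resolve(long[] heapWords, int narrowRef) {
        if (narrowRef == 0) {
            return 0;                                   // test / je: null stays null
        }
        long objectAddress = (long) narrowRef << 3;     // decode: scale by 8
        // mov -0x8(%r12,%rN,8): forwarding pointer sits one word before the object
        long forwardee = heapWords[(int) ((objectAddress - 8) >>> 3)];
        return (int) (forwardee >>> 3);                 // shr $0x3: re-compress
    }

    // Element-by-element copy, one load, one barrier and one store per element.
    static void copyElements(long[] heapWords, int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = resolve(heapWords, src[i]);
        }
    }

    public static void main(String[] args) {
        long[] heap = new long[8];
        heap[1] = 16;  // object at address 16 forwards to itself
        heap[3] = 48;  // object at address 32 has been moved to address 48
        int[] src = {2, 0, 4};  // compressed refs: address 16, null, address 32
        int[] dst = new int[3];
        copyElements(heap, src, dst);
        System.out.println(Arrays.toString(dst));  // prints [2, 0, 6]
    }
}
```

In the real listing the null path for each element is a separate `xor`/`jmp` pair placed after the main sequence, which is the part that could in principle be tighter.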
-Aleksey
More information about the shenandoah-dev
mailing list