Enable optimization of arraycopy as loads/stores with Shenandoah

Aleksey Shipilev shade at redhat.com
Thu Dec 8 16:37:44 UTC 2016


On 12/08/2016 04:38 PM, Aleksey Shipilev wrote:
> On 12/08/2016 03:55 PM, Roman Kennke wrote:
>> On Thursday, 08.12.2016, at 15:15 +0100, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/shenandoah/arraycopy/webrev.00/
>>> 
>>> This re-enables an optimization that was disabled with Shenandoah.
>> 
>> Cool! I like that!
>> 
>> Do we have any idea whether it improves performance? That would be arraycopy
>> on smallish arrays only, right? Aleksey?
> 
> Let me find the arraycopy tests (which I swear I wrote in OpenJDK for Roland's
> previous non-Shenandoah patch :) and run them.

Using this test:

http://icedtea.classpath.org/people/shade/gc-bench/file/6d332199876c/src/main/java/org/openjdk/gcbench/runtime/arraycopy/RefArray.java
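
The benchmark source is only linked above, not quoted. As a rough sketch of
the kind of copy being measured, assuming each case copies a small Object[]
filled with either nulls or objects: the class and method names below are
made up for illustration and are not taken from RefArray.java. The hand-written
variant shows what "arraycopy as loads/stores" amounts to for a known small
length.

```java
import java.util.Arrays;

// Hypothetical illustration, not the linked benchmark.
class SmallRefArrayCopy {
    // Copy through System.arraycopy, which the JIT may either call as a stub
    // or, for small known lengths, expand into plain loads and stores.
    static Object[] copyWithArraycopy(Object[] src) {
        Object[] dst = new Object[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // The same copy written as individual element loads and stores,
    // for an assumed fixed length of 4.
    static Object[] copyFourByHand(Object[] src) {
        Object[] dst = new Object[4];
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        return dst;
    }

    public static void main(String[] args) {
        Object[] nulls = new Object[4];
        Object[] objs = { new Object(), new Object(), new Object(), new Object() };
        // Both variants must produce element-wise identical arrays.
        System.out.println(Arrays.equals(copyWithArraycopy(nulls), copyFourByHand(nulls)));
        System.out.println(Arrays.equals(copyWithArraycopy(objs), copyFourByHand(objs)));
    }
}
```

This only demonstrates that the two forms are equivalent; it is not a
measurement harness, and timings like the ones below need JMH.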

=== baseline

Benchmark          Mode  Cnt   Score   Error  Units

RefArray.nulls_01  avgt    5   3.987 ± 1.282  ns/op
RefArray.nulls_02  avgt    5   4.185 ± 0.145  ns/op
RefArray.nulls_04  avgt    5   5.022 ± 0.601  ns/op
RefArray.nulls_08  avgt    5   6.421 ± 0.252  ns/op
RefArray.nulls_16  avgt    5   8.344 ± 1.012  ns/op
RefArray.nulls_32  avgt    5  14.646 ± 1.486  ns/op
RefArray.nulls_64  avgt    5  28.125 ± 3.523  ns/op

RefArray.objs_01   avgt    5   3.905 ± 0.131  ns/op
RefArray.objs_02   avgt    5   4.267 ± 0.332  ns/op
RefArray.objs_04   avgt    5   4.838 ± 0.064  ns/op
RefArray.objs_08   avgt    5   6.459 ± 0.187  ns/op
RefArray.objs_16   avgt    5   8.610 ± 1.526  ns/op
RefArray.objs_32   avgt    5  14.269 ± 0.536  ns/op
RefArray.objs_64   avgt    5  27.225 ± 0.405  ns/op


=== baseline +UseShenandoahGC

Benchmark          Mode  Cnt   Score   Error  Units

RefArray.nulls_01  avgt    5  16.021 ± 0.379  ns/op
RefArray.nulls_02  avgt    5  15.997 ± 0.137  ns/op
RefArray.nulls_04  avgt    5  16.560 ± 0.342  ns/op
RefArray.nulls_08  avgt    5  16.103 ± 0.070  ns/op
RefArray.nulls_16  avgt    5  17.060 ± 0.285  ns/op
RefArray.nulls_32  avgt    5  18.654 ± 0.092  ns/op
RefArray.nulls_64  avgt    5  30.848 ± 0.948  ns/op

RefArray.objs_01   avgt    5  15.941 ± 0.015  ns/op
RefArray.objs_02   avgt    5  15.953 ± 0.041  ns/op
RefArray.objs_04   avgt    5  16.514 ± 0.059  ns/op
RefArray.objs_08   avgt    5  16.122 ± 0.032  ns/op
RefArray.objs_16   avgt    5  17.110 ± 0.146  ns/op
RefArray.objs_32   avgt    5  19.304 ± 0.622  ns/op
RefArray.objs_64   avgt    5  31.025 ± 0.806  ns/op


=== patched +UseShenandoahGC

Benchmark          Mode  Cnt   Score   Error  Units

RefArray.nulls_01  avgt    5   5.110 ± 0.033  ns/op
RefArray.nulls_02  avgt    5   5.293 ± 0.019  ns/op
RefArray.nulls_04  avgt    5   6.903 ± 0.065  ns/op
RefArray.nulls_08  avgt    5   9.627 ± 0.043  ns/op
RefArray.nulls_16  avgt    5  17.016 ± 0.134  ns/op
RefArray.nulls_32  avgt    5  19.466 ± 2.545  ns/op
RefArray.nulls_64  avgt    5  30.659 ± 0.147  ns/op

RefArray.objs_01   avgt    5   5.171 ± 0.106  ns/op
RefArray.objs_02   avgt    5   5.827 ± 0.013  ns/op
RefArray.objs_04   avgt    5   7.377 ± 0.046  ns/op
RefArray.objs_08   avgt    5   9.353 ± 0.099  ns/op
RefArray.objs_16   avgt    5  17.097 ± 0.434  ns/op
RefArray.objs_32   avgt    5  19.212 ± 0.792  ns/op
RefArray.objs_64   avgt    5  30.818 ± 0.301  ns/op


Good to go. I guess the code quality might be a teeny bit better (we've seen
this before with null-paths in read barriers being thrown out), but I'll take
that too.

  0.82%    1.21%      ││     0x00007f2451477ea1: mov    0x10(%rcx),%r10d
  1.17%    1.04%      ││     0x00007f2451477ea5: test   %r10d,%r10d
                   ╭  ││     0x00007f2451477ea8: je     0x00007f2451477f01
  2.23%    2.78%   │  ││     0x00007f2451477eaa: mov    -0x8(%r12,%r10,8),%r10
 10.65%   13.44%   │  ││     0x00007f2451477eaf: mov    %r10,%r11
  0.14%    0.19%   │  ││     0x00007f2451477eb2: shr    $0x3,%r11
  2.68%    3.36%   │  ││↗    0x00007f2451477eb6: mov    %r11d,0x10(%rdx)
  2.46%    3.38%   │  │││    0x00007f2451477eba: mov    0x14(%rcx),%r11d
  0.05%            │  │││    0x00007f2451477ebe: test   %r11d,%r11d
                   │╭ │││    0x00007f2451477ec1: je     0x00007f2451477f06
  0.12%    0.17%   ││ │││    0x00007f2451477ec3: mov    -0x8(%r12,%r11,8),%r10
  0.64%    1.01%   ││ │││    0x00007f2451477ec8: mov    %r10,%r11
  2.42%    2.35%   ││ │││    0x00007f2451477ecb: shr    $0x3,%r11
  0.34%    0.41%   ││ │││↗   0x00007f2451477ecf: mov    %r11d,0x14(%rdx)
  1.27%    1.26%   ││ ││││   0x00007f2451477ed3: mov    0x18(%rcx),%r11d
  0.10%    0.09%   ││ ││││   0x00007f2451477ed7: test   %r11d,%r11d
                   ││╭││││   0x00007f2451477eda: je     0x00007f2451477f0b
  1.77%    1.35%   │││││││   0x00007f2451477edc: mov    -0x8(%r12,%r11,8),%r10
  0.36%    0.51%   │││││││   0x00007f2451477ee1: mov    %r10,%r11
  1.10%    1.33%   │││││││   0x00007f2451477ee4: shr    $0x3,%r11
  0.24%    0.19%   │││││││↗  0x00007f2451477ee8: mov    %r11d,0x18(%rdx)
  1.77%    1.36%   ││││││││  0x00007f2451477eec: mov    0x1c(%rcx),%r11d
  0.02%    0.03%   ││││││││  0x00007f2451477ef0: test   %r11d,%r11d
                   │││╰││││  0x00007f2451477ef3: jne    0x00007f2451477dd1
                   │││ ││││  0x00007f2451477ef9: xor    %r10,%r10
                   │││ ╰│││  0x00007f2451477efc: jmpq   0x00007f2451477dda
                   ↘││  │││  0x00007f2451477f01: xor    %r11,%r11
                    ││  ╰││  0x00007f2451477f04: jmp    0x00007f2451477eb6
                    ↘│   ││  0x00007f2451477f06: xor    %r11,%r11
                     │   ╰│  0x00007f2451477f09: jmp    0x00007f2451477ecf
                     ↘    │  0x00007f2451477f0b: xor    %r11,%r11
                          ╰  0x00007f2451477f0e: jmp    0x00007f2451477ee8
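
One way to read the listing above, as a sketch under stated assumptions: each
element is loaded from the source as a compressed reference, tested for null,
resolved through a read barrier (assumed here to be the load at -0x8 from the
object, with %r12 as the compressed-oops heap base), re-compressed with the
shift, and stored to the destination. The null case zeroes the register out
of line and jumps back to the store. The Java model below mirrors that shape
for four elements; Obj, forwardee, and resolve are hypothetical names, and a
real forwarding pointer lives outside the Java-visible object.

```java
// Hypothetical model of the per-element sequence; not HotSpot code.
final class ForwardingCopyModel {
    static final class Obj {
        final String name;
        // Stand-in for the forwarding word: points to the object itself
        // until the object has been moved.
        Obj forwardee = this;
        Obj(String name) { this.name = name; }
    }

    // Read barrier model: null stays null (the out-of-line xor path),
    // anything else is replaced by its forwardee.
    static Obj resolve(Obj ref) {
        return (ref == null) ? null : ref.forwardee;
    }

    // Four loads, four barriers, four stores, matching the unrolled listing.
    static void copyFour(Obj[] src, Obj[] dst) {
        dst[0] = resolve(src[0]);
        dst[1] = resolve(src[1]);
        dst[2] = resolve(src[2]);
        dst[3] = resolve(src[3]);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj moved = new Obj("a-moved");
        a.forwardee = moved;              // pretend "a" was relocated
        Obj b = new Obj("b");
        Obj[] src = { a, null, b, null };
        Obj[] dst = new Obj[4];
        copyFour(src, dst);
        for (Obj o : dst) {
            System.out.println(o == null ? "null" : o.name);
        }
    }
}
```

In this model the destination ends up holding the moved copy of "a" rather
than the stale one, which is the point of running the barrier on each copied
element.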


-Aleksey


