[aarch64-port-dev ] RFR: 8151502: aarch64: optimize pd_disjoint_words and pd_conjoint_words

Andrew Haley aph at redhat.com
Wed Mar 9 12:57:27 UTC 2016


On 03/09/2016 12:17 PM, Edward Nevill wrote:
> http://cr.openjdk.java.net/~enevill/8151502/JMHSample_97_GCStress.java
> 
> JMH jar file: http://cr.openjdk.java.net/~enevill/8151502/benchmarks.jar
> 
> The following are the results I get

Not bad, but not quite perfect.  But I guess you knew I'd say that.
:-)

The switch on count < threshold should be done in C, with multiple
inline asm blocks.  That way, GCC can do value range propagation for
small copies.
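
To make the shape concrete, here is a rough sketch of a switch done in C. It is not the actual patch: word_t, SMALL_COPY_LIMIT, copy_words_large and copy_disjoint_words_sketch are hypothetical names, and the plain assignments stand in for what would be separate inline asm blocks in the port. Because the dispatch is visible to the compiler, a caller with a known small count can have the whole switch folded away.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;            /* hypothetical stand-in for HeapWord */

#define SMALL_COPY_LIMIT 8           /* assumed threshold, not from the patch */

/* Hypothetical large-copy path; the real port would use hand-written asm. */
static void copy_words_large(const word_t *from, word_t *to, size_t count) {
    memcpy(to, from, count * sizeof(word_t));
}

/* The dispatch is ordinary C, so GCC can propagate what it knows about
 * count into the switch and drop the cases that cannot be reached. */
static inline void copy_disjoint_words_sketch(const word_t *from, word_t *to,
                                              size_t count) {
    if (count < SMALL_COPY_LIMIT) {
        switch (count) {
        case 7: to[6] = from[6]; /* fall through */
        case 6: to[5] = from[5]; /* fall through */
        case 5: to[4] = from[4]; /* fall through */
        case 4: to[3] = from[3]; /* fall through */
        case 3: to[2] = from[2]; /* fall through */
        case 2: to[1] = from[1]; /* fall through */
        case 1: to[0] = from[0]; /* fall through */
        case 0: break;
        }
    } else {
        copy_words_large(from, to, count);
    }
}
```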

Also, GCC can do things like if (__builtin_constant_p(cnt)).  There
are some cases where cnt is a constant.  We should be careful that we
don't slow down such cases.
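
Something along these lines, as a sketch only: word_t, copy_words_general and copy_words_constp_sketch are made-up names, and the limit of 4 words is an assumption. When the call is inlined with a constant cnt, the test folds to true and the loop can be fully unrolled into plain loads and stores. Otherwise the test folds to false and only the general path remains.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;            /* hypothetical stand-in for HeapWord */

/* Hypothetical general-purpose path for counts not known at compile time. */
static void copy_words_general(const word_t *from, word_t *to, size_t cnt) {
    memcpy(to, from, cnt * sizeof(word_t));
}

static inline void copy_words_constp_sketch(const word_t *from, word_t *to,
                                            size_t cnt) {
    /* __builtin_constant_p(cnt) is nonzero only if GCC can prove cnt is a
     * compile-time constant at this point (typically after inlining). */
    if (__builtin_constant_p(cnt) && cnt <= 4) {
        for (size_t i = 0; i < cnt; i++) {
            to[i] = from[i];         /* trip count is known, so this unrolls */
        }
    } else {
        copy_words_general(from, to, cnt);
    }
}
```

Both branches copy the same words, so the result does not depend on which one the compiler picks; only the generated code differs.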

GCC does this:

0x0000007fb774850c <+0>: adrp x2, 0x7fb7db1000
0x0000007fb7748510 <+4>: add x0, x2, #0xe20
0x0000007fb7748514 <+8>: ldr x5, [x0,#56]
0x0000007fb7748518 <+12>: ldr x4, [x0,#48]
0x0000007fb774851c <+16>: ldr x3, [x0,#40]
0x0000007fb7748520 <+20>: ldr x1, [x0,#32]
0x0000007fb7748524 <+24>: str x5, [x0,#24]
0x0000007fb7748528 <+28>: str x4, [x0,#16]
0x0000007fb774852c <+32>: str x3, [x0,#8]
0x0000007fb7748530 <+36>: str x1, [x2,#3616]

for this:

HeapWord blah[4];
HeapWord blah2[4];

void bletch() {
  Copy::disjoint_words(blah, blah2, sizeof blah / sizeof blah[0]);
}

GCC also has __builtin_expect().  We should use that to emit
the large copy and the backwards copy out of line.
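
A sketch of what I mean, with made-up names (word_t, SMALL_COPY_MAX, copy_slow_path, conjoint_words_sketch) and memmove standing in for the real large and backwards copy code. The noinline and cold attributes are my addition here, to keep the unlikely path out of the caller's body.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;            /* hypothetical stand-in for HeapWord */

#define SMALL_COPY_MAX 8             /* assumed threshold, not from the patch */

/* Unlikely cases live in one out-of-line function.  memmove copes with
 * overlap in either direction, so it covers the backwards copy too. */
__attribute__((noinline, cold))
static void copy_slow_path(const word_t *from, word_t *to, size_t count) {
    memmove(to, from, count * sizeof(word_t));
}

static inline void conjoint_words_sketch(const word_t *from, word_t *to,
                                         size_t count) {
    int large = count > SMALL_COPY_MAX;
    /* A destination above the source may need a backwards copy. */
    int backward = (uintptr_t)to > (uintptr_t)from;

    /* Tell GCC that both conditions are expected to be false. */
    if (__builtin_expect(large || backward, 0)) {
        copy_slow_path(from, to, count);
        return;
    }
    /* Hot path: to <= from, so a forward copy is safe even with overlap. */
    for (size_t i = 0; i < count; i++) {
        to[i] = from[i];
    }
}
```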

Finally, GCC knows that copying from one object to the other copies the
contents.  It can do copy propagation.

I think that this change can be done with no performance regressions,
either in code size or speed, for any range of arguments.

Andrew.

