On 03/09/2016 12:17 PM, Edward Nevill wrote:
http://cr.openjdk.java.net/~enevill/8151502/JMHSample_97_GCStress.java
JMH jar file: http://cr.openjdk.java.net/~enevill/8151502/benchmarks.jar
The following are the results I get
Not bad, but not quite perfect. But I guess you knew I'd say that. :-)

The switch on count < threshold should be done in C, with multiple inline asm blocks. That way, GCC can do value range propagation for small copies.

Also, GCC can do things like if (__builtin_constant_p(cnt)). There are some cases where cnt is a constant. We should be careful that we don't slow down such cases. GCC does this:

   0x0000007fb774850c <+0>:     adrp    x2, 0x7fb7db1000
   0x0000007fb7748510 <+4>:     add     x0, x2, #0xe20
   0x0000007fb7748514 <+8>:     ldr     x5, [x0,#56]
   0x0000007fb7748518 <+12>:    ldr     x4, [x0,#48]
   0x0000007fb774851c <+16>:    ldr     x3, [x0,#40]
   0x0000007fb7748520 <+20>:    ldr     x1, [x0,#32]
   0x0000007fb7748524 <+24>:    str     x5, [x0,#24]
   0x0000007fb7748528 <+28>:    str     x4, [x0,#16]
   0x0000007fb774852c <+32>:    str     x3, [x0,#8]
   0x0000007fb7748530 <+36>:    str     x1, [x2,#3616]

for this:

  HeapWord blah[4];
  HeapWord blah2[4];

  void bletch() {
    Copy::disjoint_words(blah, blah2, sizeof blah / sizeof blah[0]);
  }

In addition, GCC has __builtin_expect(bool). We should use that to emit the large copy and the backwards copy out of line.

Finally, GCC knows that copying from one object to the other copies the contents. It can do copy propagation.

I think that this change can be done with no performance regressions, either in code size or speed, for any range of arguments.

Andrew.
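
To make the suggestion above concrete, here is a minimal sketch of the kind of dispatch I mean. It is an illustration, not the proposed patch. The names HeapWordSketch, kSmallCopyWords, copy_words_large and disjoint_words_sketch are made up for this example, the threshold of 8 words is an assumption, and plain C++ stores stand in for what would be separate inline asm blocks in the real AArch64 code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for HotSpot's HeapWord: one machine word.
typedef std::uintptr_t HeapWordSketch;

// Assumed cut-over between "small" and "large" copies (not from the patch).
// The switch below has one case per count up to this value.
static const std::size_t kSmallCopyWords = 8;

// Large copies are kept out of line so the inlined fast path stays small.
// In the real code this would be the hand-written bulk copy.
__attribute__((noinline))
static void copy_words_large(const HeapWordSketch* from, HeapWordSketch* to,
                             std::size_t count) {
  std::memcpy(to, from, count * sizeof(HeapWordSketch));
}

static inline void disjoint_words_sketch(const HeapWordSketch* from,
                                         HeapWordSketch* to,
                                         std::size_t count) {
  // If the compiler can prove count is a small constant, give it plain
  // C++ it can unroll and copy-propagate through on its own.
  if (__builtin_constant_p(count) && count <= kSmallCopyWords) {
    for (std::size_t i = 0; i < count; i++) to[i] = from[i];
    return;
  }

  // Tell GCC the large case is unlikely, so it is laid out off the hot path.
  if (__builtin_expect(count > kSmallCopyWords, 0)) {
    copy_words_large(from, to, count);
    return;
  }

  // Small, non-constant counts: the switch lives in C/C++, so value range
  // propagation can prune cases at call sites where the range is known.
  // The regions are disjoint, so the copy order does not matter.
  switch (count) {
    case 8: to[7] = from[7];  // fall through
    case 7: to[6] = from[6];  // fall through
    case 6: to[5] = from[5];  // fall through
    case 5: to[4] = from[4];  // fall through
    case 4: to[3] = from[3];  // fall through
    case 3: to[2] = from[2];  // fall through
    case 2: to[1] = from[1];  // fall through
    case 1: to[0] = from[0];  // fall through
    case 0: break;
  }
}
```

With the dispatch written this way, a call such as disjoint_words_sketch(blah, blah2, 4) gives the optimizer the chance to reduce it to straight-line loads and stores, much like the GCC output shown above, while calls with an unknown count still go through the switch or the out-of-line path.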