On 02/17/2016 01:21 PM, Hui Shi wrote:
> For the StringConcat test (http://people.linaro.org/~hui.shi/arraycopy/StringConcatTest.java), although array copy takes only 25% of the cycles in this test, the entire test still sees a 3.5% improvement with this combined load/store optimization. However, I am wondering whether this is the proper way to improve these test-bit/load/store code sequences. It would require an extra, truly "disjoint" array copy stub, since the current disjoint array copy only means that a forward copy is safe; alternatively, it would require introducing a "no overlap" test at runtime. My personal preference is to leave the array copy code unchanged and keep it simple and consistent for now.
OK, that makes sense.

My plan (such as it is) for tidying up the tail code is to convert three bit-test-and-branches into a single 8-way computed jump with an optimum sequence for all 8 cases. Sure, it will usually be mispredicted, but it's just a single jump.

But really, once we're down to 3.5% of a contrived string-concatenation-intensive test, it's questionable whether this is what we need to be spending time on.

Thanks,
Andrew.
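
For illustration only, the two shapes of tail code discussed above can be sketched in C. This is not the HotSpot stub code: the function names, the byte-granular element size, and the per-case sequences are assumptions made for the example, and the per-case sequences are not claimed to be optimal. The first function uses three bit tests, each guarding one fixed-size copy; the second dispatches once on the remaining count (0 to 7) and runs a straight-line sequence for that case. Whether a compiler turns the switch into a real jump table is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tail copy with three bit tests: up to three conditional branches.
 * n is the number of leftover bytes and must be less than 8. */
static void copy_tail_bit_tests(unsigned char *dst, const unsigned char *src,
                                size_t n)
{
    assert(n < 8);
    if (n & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
    if (n & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
    if (n & 1) { *dst = *src; }
}

/* Tail copy with a single 8-way dispatch: one multiway branch, then a
 * branch-free sequence chosen for that exact count.
 * The sequences below are illustrative, not tuned. */
static void copy_tail_switch(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    assert(n < 8);
    switch (n) {
    case 0: break;
    case 1: dst[0] = src[0]; break;
    case 2: memcpy(dst, src, 2); break;
    case 3: memcpy(dst, src, 2); dst[2] = src[2]; break;
    case 4: memcpy(dst, src, 4); break;
    case 5: memcpy(dst, src, 4); dst[4] = src[4]; break;
    case 6: memcpy(dst, src, 4); memcpy(dst + 4, src + 4, 2); break;
    case 7: memcpy(dst, src, 4); memcpy(dst + 4, src + 4, 2);
            dst[6] = src[6]; break;
    }
}
```

Both functions copy the same bytes for every count; the difference is only in how many branches are executed to select the copy sequence.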