[aarch64-port-dev] Caller registers protection inside loop hurts performance

Andrew Haley aph at redhat.com
Tue Aug 14 15:09:49 UTC 2018


On 08/14/2018 08:38 AM, Patrick Zhang wrote:
> I ran string-density-bench.LengthBench and dumped the generated assembly code. Inside the main loop (the do-while in the attached Java code snippet), several registers (x13, x14, x15, x19 and x20 here) are spilled and filled around each call to the static function do_cmp(). As the loop count grows, this extra overhead becomes very heavy. If C2 could move these saves and restores out of the loop, based on local analysis inside the caller function test_avgt_jmhStub(), my tests with count=4096 (the default parameter value) suggest the total time could be reduced by about 20-25%. Is there any opportunity to optimize this in aarch64-port?
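For readers without the attachment, a minimal sketch of the loop shape the report describes may help: a do-while loop that calls a static do_cmp() on every iteration while several values stay live across the call. The original snippet is not included here, so the class name SpillAcrossCall, the helper loop(), the warm-up loop in main() and the body of do_cmp() are all hypothetical. Only the name do_cmp(), the do-while structure and count=4096 come from the report.

```java
// Hypothetical reconstruction of the loop shape described in the report.
// The attached snippet is not available; every name except do_cmp() and
// every method body here is an illustrative assumption.
public class SpillAcrossCall {

    // Static callee invoked once per loop iteration (body is made up).
    static int do_cmp(String a, String b) {
        return a.length() - b.length();
    }

    // Caller: s1, s2, n, i and acc are all still needed after each call,
    // so they must survive every invocation of do_cmp() inside the loop.
    static int loop(String s1, String s2, int n) {
        int acc = 0;
        int i = 0;
        do {
            acc += do_cmp(s1, s2);
            i++;
        } while (i < n);
        return acc;
    }

    public static void main(String[] args) {
        int result = 0;
        // Repeat the call so loop() is likely to get hot enough for C2.
        for (int rep = 0; rep < 10_000; rep++) {
            // count=4096 mirrors the default parameter value in the report.
            result = loop("hello", "hi", 4096);
        }
        System.out.println(result);
    }
}
```

C2 would normally inline a callee this small, which removes the call (and the spills around it) entirely. To keep the call in the compiled loop you could run with `-XX:CompileCommand=dontinline,SpillAcrossCall::do_cmp`, and inspect the generated code with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which requires the hsdis disassembler plugin. The report itself drove the loop from a JMH benchmark, which is the more reliable way to measure it.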

It's a known problem in the register allocator, and affects all ports.
Probably very hard to fix.

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671