JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch

Peter Kessler OS peter.kessler at os.amperecomputing.com
Tue May 2 00:36:47 UTC 2023


I agree that for the first few elements of the key-value array, the result is not promising, because of the need to check after the loop which condition caused the loop to exit.  But for searches that go further down the array, ccmp is a win on the machine I've tried it on (an Ampere Altra).

Here's a table of times in nanoseconds for making 1B interface calls to various depths in an interface hierarchy, in a clone of JDK-21+19, and in JDK-21+19 with the loops done with ccmp:

Test          clone JDK-21+19 (ns)   ccmp JDK-21+19 (ns)
Interface 1          9,753,623,061         9,751,264,492
Interface 2         10,512,917,318        10,654,232,119
Interface 3         11,554,908,217        11,635,931,298
Interface 4         15,501,591,613        12,926,417,745
Interface 5         18,472,136,372        14,559,380,750
Interface 6         19,369,030,295        16,389,137,458
Interface 7         20,543,012,798        18,119,622,732
Interface 8         21,947,230,096        18,918,257,704
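
To make the table easier to read, the totals can be converted to nanoseconds per call and to a relative change.  The following is only a sketch of that arithmetic: the names Row, kRows, ns_per_call and percent_change are made up for illustration, and "1B" is taken to mean 10^9 calls.

```cpp
#include <cstdint>

// One row of the table above: total nanoseconds for all calls at a given depth.
struct Row {
    int depth;
    std::int64_t clone_ns;  // unmodified clone of JDK-21+19
    std::int64_t ccmp_ns;   // JDK-21+19 with the loops done with ccmp
};

// Assumption: "1B interface calls" means 10^9 calls per measurement.
constexpr std::int64_t kCalls = 1'000'000'000;

constexpr Row kRows[] = {
    {1,  9'753'623'061,  9'751'264'492},
    {2, 10'512'917'318, 10'654'232'119},
    {3, 11'554'908'217, 11'635'931'298},
    {4, 15'501'591'613, 12'926'417'745},
    {5, 18'472'136'372, 14'559'380'750},
    {6, 19'369'030'295, 16'389'137'458},
    {7, 20'543'012'798, 18'119'622'732},
    {8, 21'947'230'096, 18'918'257'704},
};

// Average cost of a single interface call, in nanoseconds.
double ns_per_call(std::int64_t total_ns) {
    return static_cast<double>(total_ns) / static_cast<double>(kCalls);
}

// Relative change of the ccmp build against the clone, in percent.
// Negative values mean the ccmp build was faster.
double percent_change(const Row& r) {
    return 100.0 * static_cast<double>(r.ccmp_ns - r.clone_ns)
                 / static_cast<double>(r.clone_ns);
}
```

Calling percent_change on each element of kRows gives a small positive value for depths 2 and 3 (ccmp slightly slower) and a negative value from depth 4 onward (ccmp faster), which matches the description above.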

The differences will be halved if your change can eliminate one of the two loops in itable_stub.  Then using ccmp has half the benefit.  I look forward to your patch.

                                                ... peter

From: Boris Ulasevich <boris.ulasevich at bell-sw.com>
Date: Saturday, April 29, 2023 at 04:40
To: Peter Kessler OS <peter.kessler at os.amperecomputing.com>
Cc: hotspot-dev at openjdk.java.net <hotspot-dev at openjdk.java.net>
Subject: Re: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch

Peter,

I tried ccmp as part of improving the itable stub on aarch64, and the results were not promising. Applying ccmp as suggested increased the geomean from 15.7 ns to 15.9 ns on N1 and from 201 ns to 205 ns on A72. I don't think micro-architecture specialization in the itable stub would bring universal benefits; it would only make the code more complicated. I would appreciate your review of the AArch64 part of JDK-8305959 once I post it.

thanks,
Boris

On 4/29/2023 1:48 PM, Boris Ulasevich wrote:
Hi Peter,

Please have a look at JDK-8305959. I'm going to rewrite the itable stub code to use a single pass over the itable! I have an aarch64 implementation which shows improvement on Ampere Altra.

Boris

On 4/29/2023 6:18 AM, Peter Kessler OS wrote:
I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp MacroAssembler::lookup_interface_method loops over the itable list with code that uses two branches: one to check for a null indicating the end of the list, and one to see if the appropriate entry has been found.  aarch64 has a "ccmp" instruction that can be used to evaluate two conditions with only one branch.  On an out-of-order implementation with more integer execution units than branch units, trading a branch for a ccmp can be beneficial.  The downside is that one has to check, after the loop has exited, which of the conditions caused the loop to exit, but if the loop executes more than once or twice, that is still a win.
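
As a rough illustration of the two loop shapes, here is a C++ sketch.  It is not the HotSpot code: Entry, find_two_branches and find_single_exit are hypothetical names, and the layout (a null key terminating the list) is only modeled on the description above.  Whether a compiler actually emits cmp;ccmp;br for the second form depends on the compiler and flags; in the stub the instructions would be emitted by hand.

```cpp
#include <cstdint>

// Hypothetical stand-in for one entry of the list being searched.
// A null key marks the end of the list.
struct Entry {
    const void* key;
    std::intptr_t value;
};

// Two-branch form (cmp;br;cmp;br): one branch per condition on every iteration.
const Entry* find_two_branches(const Entry* e, const void* wanted) {
    for (;; ++e) {
        if (e->key == nullptr) return nullptr;  // branch 1: end of list
        if (e->key == wanted) return e;         // branch 2: entry found
    }
}

// Single-exit form (the cmp;ccmp;br shape): both conditions are folded into
// one loop test.  The non-short-circuit '&' is deliberate, so that the two
// comparisons can be combined before a single branch.
const Entry* find_single_exit(const Entry* e, const void* wanted) {
    while ((e->key != nullptr) & (e->key != wanted)) {
        ++e;
    }
    // The extra cost: after the loop, work out which condition ended it.
    // A non-null key here can only mean the wanted key was found.
    return e->key != nullptr ? e : nullptr;
}
```

Both functions return the same result for any terminated list; the second pays for the post-loop check once per search instead of paying for a second branch on every iteration.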

There are other opportunities to use cmp;ccmp;br instead of cmp;br;cmp;br.  I happened to see the one in MacroAssembler::lookup_interface_method because it was in what passes for hand-written assembler in HotSpot.  For generic searches for a key in a key-value array the improvement can be ~10% on an Ampere Altra, depending on how far down the key-value array one has to look.

I am only proposing to fix the loop in MacroAssembler::lookup_interface_method, but I would be interested in talking to people about where else the ccmp style could be applied.

                                    ... peter

