[aarch64-port-dev ] RFR: C2: Canonicalize (x & 16 == 16) [Was: AARCH64 optimization: using TBZ instruction for bit check]
Boris Ulasevich
boris.ulasevich at bell-sw.com
Mon Jun 22 14:45:13 UTC 2020
Hi Vladimir,
> Would be nice to know if any Java benchmark is affected.
With the change we got a 5% performance boost on the Lucene tokenizer
method on ARM64. On x86, by contrast, there is no visible improvement
on the Lucene tokenizer.
thanks,
Boris
import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
import java.nio.file.Files;
import java.io.*;

class Test {
    public static void main(String[] args) {
        long count = 0;
        try {
            // Tokenize the same input file 1000 times and count the tokens.
            byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
            for (int i = 0; i < 1000; i++) {
                Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
                StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
                // getNextToken() returns -1 at end of input.
                while (sti.getNextToken() != -1) {
                    count++;
                }
            }
        } catch (Exception ex) {
            System.out.println(ex);
        }
        System.out.println(count);
    }
}
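
For readers following the subject line: the canonicalization under review
concerns single-bit checks. The sketch below is only an illustration at the
Java source level, not the C2 change itself. It assumes the canonical form
is a comparison of the masked value against zero, and the class and method
names (BitCheckSketch, maskEqualsMask, maskNotZero) are made up for this
example. Note that Java needs the parentheses, since == binds tighter
than &.

```java
public class BitCheckSketch {
    // Form as typically written in source: compare the masked value to the mask.
    static boolean maskEqualsMask(int x, int mask) {
        return (x & mask) == mask;
    }

    // Assumed canonical form: compare the masked value to zero.
    static boolean maskNotZero(int x, int mask) {
        return (x & mask) != 0;
    }

    public static void main(String[] args) {
        int mask = 16; // a single-bit (power-of-two) mask
        for (int x = -1000; x <= 1000; x++) {
            if (maskEqualsMask(x, mask) != maskNotZero(x, mask)) {
                throw new AssertionError("forms disagree for x=" + x);
            }
        }
        // With a multi-bit mask the two forms are not interchangeable:
        // (16 & 24) is 16, which is non-zero but not equal to 24.
        System.out.println(maskEqualsMask(16, 24) + " " + maskNotZero(16, 24));
    }
}
```

The two forms agree only when the mask has exactly one bit set, which is
why the rewrite is limited to power-of-two constants such as 16. A
test-against-zero of a single bit is the shape that an instruction like
TBZ on AArch64 can branch on directly.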
On 19.06.2020 21:36, Vladimir Kozlov wrote:
> Nice optimization.
>
> I don't think we should turn it off on any machine. In a real
> application you will not see such tight loops containing only such a
> branch. On the other hand, reducing code size should help in all cases.
>
> Would be nice to know if any Java benchmark is affected.
>
> I will try to run our set of benchmarks with these changes.
>
> Regards,
> Vladimir K
>
> On 6/19/20 10:07 AM, Andrew Haley wrote:
>> Hi,
>>
>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>> I added the expression canonicalization in the BoolNode::Ideal method:
>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>
>>> The change reduces the number of generated machine instructions on
>>> all ARM/x86/PPC architectures. The benchmark shows positive results
>>> on ARM64 and ARM32 with the given change.
>>>
>>> On x86, benchmark performance improves by +1% to +13% depending on
>>> the CPU generation, except on machines affected by the Intel Erratum
>>> (JDK-8234160) issue. The maximum decrease observed is -11%. It does
>>> not look like a problem with the proposed benchmark though, but
>>> rather like an issue with the Erratum mitigation.
>>>
>>> On PowerPC the result of the micro-benchmark is also positive. I
>>> changed the micro-benchmark to make it a little bulkier so that we
>>> don't hit the limitations of architectures with a less elaborate
>>> branch prediction mechanism. The original application performance
>>> does not change on PowerPC.
>>
>> Fantastic work, thanks! You've done a remarkably thorough job. It's
>> slightly unfortunate that one of the targets regresses. If there had
>> been no regressions, I'd approve this straight away.
>>
>> Forwarding to hotspot-compiler-dev for more comments.
>>
>> VladimirK, what do you think? I guess we could turn this off on the
>> machines affected by JDK-8234160. Should we?
>>