[aarch64-port-dev ] RFR: C2: Canonicalize (x & 16 == 16) [Was: AARCH64 optimization: using TBZ instruction for bit check]
Boris Ulasevich
boris.ulasevich at bell-sw.com
Thu Jun 25 17:34:33 UTC 2020
Vladimir, thank you!
I think I need one more review. Can I ask someone else to have a look?
Thanks,
Boris
On 22.06.2020 18:48, Vladimir Kozlov wrote:
> On 6/22/20 7:45 AM, Boris Ulasevich wrote:
>> Hi Vladimir,
>>
>> > Would be nice to know if any Java benchmark is affected.
>>
>> With the change we got a 5% performance boost on the Lucene tokenizer
>> method on ARM64. At the same time, on x86 there is no visible
>> improvement on the Lucene tokenizer.
>
> Good.
>
> I ran our benchmarks (mostly jvm2008) on x86 and don't see any effect
> either.
>
> Thanks,
> Vladimir
>
>>
>> thanks,
>> Boris
>>
>> import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
>> import java.nio.file.Files;
>> import java.io.*;
>>
>> class Test {
>>     public static void main(String[] args) {
>>         long count = 0;
>>         try {
>>             // Read the input file once; every iteration re-tokenizes it from memory.
>>             byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
>>             for (int i = 0; i < 1000; i++) {
>>                 Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
>>                 StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
>>                 // getNextToken() returns -1 (YYEOF) at the end of the input.
>>                 while (sti.getNextToken() != -1) {
>>                     count++;
>>                 }
>>             }
>>         } catch (Exception ex) {
>>             System.out.println(ex);
>>         }
>>         // Print the total token count over all iterations.
>>         System.out.println(count);
>>     }
>> }
>>
>>
>> On 19.06.2020 21:36, Vladimir Kozlov wrote:
>>> Nice optimization.
>>>
>>> I don't think we should turn it off on any machine. In a real
>>> application you will not see such tight loops containing only such a
>>> branch. On the other hand, reducing code size should help in all cases.
>>>
>>> Would be nice to know if any Java benchmark is affected.
>>>
>>> I will try to run our set of benchmarks with these changes.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>> On 6/19/20 10:07 AM, Andrew Haley wrote:
>>>> Hi,
>>>>
>>>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>>>> I added the expression canonicalization to the BoolNode::Ideal
>>>>> method:
>>>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>>>
>>>>> The change reduces the number of generated machine instructions on
>>>>> all ARM/x86/PPC architectures. The benchmark shows positive results
>>>>> on ARM64 and ARM32 with the given change.
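The rewrite depends on an identity that holds only when the mask has exactly one bit set: for a power-of-two m, (x & m) == m is equivalent to (x & m) != 0, and (x & m) != m is equivalent to (x & m) == 0. The compare-against-zero form is the shape a single bit-test-and-branch (TBZ/TBNZ, named in the original subject) can match on AArch64. The Java sketch that follows only checks the identity on sample values. It is a model of the rule under that single-bit assumption, not C2's BoolNode::Ideal code, and the class and method names are hypothetical.

```java
// Hypothetical model of the bit-check canonicalization; NOT HotSpot's C2 code.
public class BitCheckCanon {

    // Sample inputs, including sign-bit and boundary values.
    static final int[] SAMPLES = {0, 1, 15, 16, 17, 31, 32, -1,
                                  Integer.MIN_VALUE, Integer.MAX_VALUE};

    // The rewrite is only valid when the mask has exactly one bit set.
    static boolean isSingleBit(int m) {
        return m != 0 && (m & (m - 1)) == 0;
    }

    // Original shape: (x & m) == m
    static boolean eqMask(int x, int m) {
        return (x & m) == m;
    }

    // Canonical shape: (x & m) != 0
    static boolean neZero(int x, int m) {
        return (x & m) != 0;
    }

    // True if both identities hold for mask m on every sample x:
    //   (x & m) == m  <=>  (x & m) != 0
    //   (x & m) != m  <=>  (x & m) == 0
    static boolean rewriteHolds(int m, int[] samples) {
        for (int x : samples) {
            if (eqMask(x, m) != neZero(x, m)) {
                return false;
            }
            if (((x & m) != m) != ((x & m) == 0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Every single-bit mask, including 1 << 31, satisfies both identities.
        boolean allSingleBitMasksOk = true;
        for (int bit = 0; bit < 32; bit++) {
            int m = 1 << bit;
            allSingleBitMasksOk &= isSingleBit(m) && rewriteHolds(m, SAMPLES);
        }
        System.out.println(allSingleBitMasksOk);        // true
        // A two-bit mask does not: for x = 1, m = 3 the two forms disagree.
        System.out.println(rewriteHolds(3, SAMPLES));  // false
    }
}
```

The two-bit counterexample is why the rule has to check that the constant is a power of two before rewriting.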
>>>>>
>>>>> On x86, benchmark performance improves by +1% to +13% depending on
>>>>> the CPU generation, except on machines affected by the Intel Erratum
>>>>> issue (JDK-8234160). The maximum decrease observed is -11%. It does
>>>>> not look like a problem with the proposed benchmark, though, but
>>>>> rather like an issue with the Erratum mitigation.
>>>>>
>>>>> On PowerPC, the result of the micro-benchmark is also positive. I
>>>>> changed the micro-benchmark to make it a little bulkier so that we
>>>>> don't hit the limitations of architectures with a less elaborate
>>>>> branch prediction mechanism. The original application's performance
>>>>> does not change on PowerPC.
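The micro-benchmark itself is not reproduced in this message. As a hypothetical illustration only (not the benchmark from the thread), a loop whose hot branch is a single-bit test written in the original (v & 16) == 16 form might look like the sketch that follows. After canonicalization C2 sees a compare against zero, which on AArch64 can be emitted as a single TBZ/TBNZ. One way to inspect the generated code is to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler plugin.

```java
// Hypothetical single-bit-test kernel; not the micro-benchmark from the thread.
public class BitTestKernel {

    // Counts elements whose bit 4 (value 16) is set, written in the
    // (v & 16) == 16 form that the canonicalization turns into (v & 16) != 0.
    static int countBit4(int[] data) {
        int count = 0;
        for (int v : data) {
            if ((v & 16) == 16) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input: a simple arithmetic pattern, no external files.
        int[] data = new int[1 << 14];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 7;
        }
        // Call the kernel repeatedly so that it is likely to be JIT-compiled.
        long total = 0;
        for (int iter = 0; iter < 1000; iter++) {
            total += countBit4(data);
        }
        System.out.println(total);
    }
}
```

C2 may also compile a loop like this without any branch at all (for example using a conditional move), so it is worth checking the machine code before attributing a timing difference to TBZ/TBNZ.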
>>>>
>>>> Fantastic work, thanks! You've done a remarkably thorough job. It's
>>>> slightly unfortunate that one of the targets regresses. If there had
>>>> been no regressions, I'd approve this straight away.
>>>>
>>>> Forwarding to hotspot-compiler-dev for more comments.
>>>>
>>>> VladimirK, what do you think? I guess we could turn this off on the
>>>> machines affected by JDK-8234160. Should we?
>>>>
>>