[aarch64-port-dev] RFR: C2: Canonicalize (x & 16 == 16) [Was: AARCH64 optimization: using TBZ instruction for bit check]

Thu Jun 25 17:34:33 UTC 2020

Vladimir, thank you!

I think I need one more review. Can I ask someone else to have a look?

Thanks,
Boris

On 22.06.2020 18:48, Vladimir Kozlov wrote:
> On 6/22/20 7:45 AM, Boris Ulasevich wrote:
>> Hi Vladimir,
>>
>>  > Would be nice to know if any Java benchmark is affected.
>>
>> With the change we got a 5% performance boost on the Lucene tokenizer
>> method on ARM64. At the same time, on x86 there is no visible
>> improvement on the Lucene tokenizer.
>
> Good.
>
> I ran our benchmarks (mostly jvm2008) on x86 and don't see any effects
> either.
>
> Thanks,
> Vladimir
>
>>
>> thanks,
>> Boris
>>
>> import org.apache.lucene.analysis.standard.StandardTokenizerImpl;
>> import java.nio.file.Files;
>> import java.io.*;
>>
>> class Test {
>>    public static void main(String[] args) {
>>      long count = 0;  // total number of tokens seen over all iterations
>>      try {
>>        // Use the AArch64 AD file as sample input text
>>        byte[] content = Files.readAllBytes(new File("aarch64.ad").toPath());
>>        for (int i = 0; i < 1000; i++) {
>>          Reader reader = new InputStreamReader(new ByteArrayInputStream(content));
>>          StandardTokenizerImpl sti = new StandardTokenizerImpl(reader);
>>          // getNextToken() returns -1 (YYEOF) at the end of input
>>          while (sti.getNextToken() != -1) {
>>            count++;
>>          }
>>        }
>>      } catch (Exception ex) { System.out.println(ex); }
>>      System.out.println(count);
>>    }
>> }
>>
>>
>> On 19.06.2020 21:36, Vladimir Kozlov wrote:
>>> Nice optimization.
>>>
>>> I don't think we should turn it off on any machine. In a real
>>> application you will not see such tight loops containing only such a
>>> branch. On the other hand, reducing code size should help in all cases.
>>>
>>> Would be nice to know if any Java benchmark is affected.
>>>
>>> I will try to run our set of benchmarks with these changes.
>>>
>>> Regards,
>>> Vladimir K
>>>
>>> On 6/19/20 10:07 AM, Andrew Haley wrote:
>>>> Hi,
>>>>
>>>> On 19/06/2020 17:49, Boris Ulasevich wrote:
>>>>> I added the expression canonicalization in the BoolNode::Ideal 
>>>>> method:
>>>>> http://cr.openjdk.java.net/~bulasevich/8247408/webrev.02b
>>>>>
>>>>> The change reduces the number of generated machine instructions on
>>>>> all ARM/x86/PPC architectures. The benchmark shows positive results on
>>>>> ARM64 and ARM32 with the given change.
>>>>>
>>>>> On x86 the benchmark performance improves by +1% to +13% depending on
>>>>> the CPU generation, except on machines affected by the Intel erratum
>>>>> issue (JDK-8234160). The maximum decrease observed is -11%. This does
>>>>> not look like a problem with the proposed benchmark, though, but rather
>>>>> like an issue with the erratum mitigation.
>>>>>
>>>>> On PowerPC the result of the micro-benchmark is also positive. I
>>>>> changed the micro-benchmark to make it a little bulkier so that we don't
>>>>> hit the limitations of architectures with a less elaborate branch
>>>>> prediction mechanism. The original application's performance does not
>>>>> change on PowerPC.
>>>>
>>>> Fantastic work, thanks! You've done a remarkably thorough job. It's
>>>> slightly unfortunate that one of the targets regresses. If there had
>>>> been no regressions, I'd approve this straight away.
>>>>
>>>> Forwarding to hotspot-compiler-dev for more comments.
>>>>
>>>> VladimirK, what do you think? I guess we could turn this off on the
>>>> machines affected by JDK-8234160. Should we?
>>>>
>>
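
The subject's "(x & 16 == 16)" is shorthand for ((x & 16) == 16); in Java, == binds tighter than &, so the literal text would not compile for an int x. The canonicalization discussed in the thread rewrites this into ((x & 16) != 0). With a single-bit mask, (x & m) can only be 0 or m, so the two forms are equivalent. A test against zero is the shape that AArch64's TBZ/TBNZ (test bit and branch if zero / nonzero) handle directly. The sketch below is a hypothetical, source-level illustration, not code from the webrev. It samples a range of x values and shows that the forms agree for single-bit masks but not for a multi-bit mask such as 3. The class and method names are made up for this example, and the sampled check illustrates the identity rather than proving it.

```java
// Hypothetical demo class (not part of the webrev): illustrates the identity
// behind canonicalizing ((x & m) == m) into ((x & m) != 0).
public class BitTestCanonicalization {

    // Form written in the source: compare the masked value with the mask itself.
    static boolean eqMask(int x, int mask) {
        return (x & mask) == mask;
    }

    // Canonical form: compare the masked value with zero. For a single-bit mask
    // this is a plain "is bit k set" test.
    static boolean neZero(int x, int mask) {
        return (x & mask) != 0;
    }

    // Returns true iff the two forms agree for every x in [from, to].
    static boolean formsAgree(int mask, int from, int to) {
        for (int x = from; x <= to; x++) {
            if (eqMask(x, mask) != neZero(x, mask)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Single-bit masks: (x & mask) is either 0 or mask, so the forms agree.
        System.out.println(formsAgree(16, -1024, 1024));                // true
        System.out.println(formsAgree(Integer.MIN_VALUE, -1024, 1024)); // true
        // Multi-bit mask: for x = 1, (1 & 3) == 3 is false but (1 & 3) != 0 is true.
        System.out.println(formsAgree(3, -1024, 1024));                 // false
    }
}
```

This is why a rewrite of this kind has to be restricted to constant masks with exactly one bit set.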