RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v5]

Nick Gasson ngasson at openjdk.org
Tue Nov 29 09:45:15 UTC 2022


On Thu, 24 Nov 2022 15:56:08 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> Arm ISA v8.2A and v9.0A include SHA3 feature extensions, and one of the SHA3 instructions, "eor3", performs an exclusive OR of three vectors. This helps applications that have multiple consecutive "eor" operations, which can be combined into fewer operations using the "eor3" instruction. For example:
>> 
>> eor a, a, b
>> eor a, a, c
>> 
>> can be optimized to a single instruction: `eor3 a, a, b, c`
>> 
>> This patch adds backend rules for the Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains. The following are the results of the included micro benchmark on an aarch64 machine with 128-bit vectors that supports the Neon, SVE2 and SHA3 features:
>> 
>> 
>> Benchmark               gain
>> TestEor3.test1Int       10.87%
>> TestEor3.test1Long      8.84%
>> TestEor3.test2Int       21.68%
>> TestEor3.test2Long      21.04%
>> 
>> 
>> The numbers shown are the performance gains from using the Neon "eor3" instruction over the master branch, which uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version, since the "eor3" instruction is unpredicated and the machine under test has a maximum vector width of 128 bits, which makes the SVE2 code generation very similar to the Neon one.
>
> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Resolve merge conflicts with master
>  - Merge branch 'master' into JDK-8293488
>  - Removed svesha3 feature check for eor3
>  - Changed the modifier order preference in JTREG test
>  - Modified JTREG test to include feature constraints
>  - 8293488: Add EOR3 backend rule for aarch64 SHA3 extension
>    
>    Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those
>    SHA3 instructions - "eor3" performs an exclusive OR of three vectors.
>    This is helpful in applications that have multiple, consecutive "eor"
>    operations which can be reduced by clubbing them into fewer operations
>    using the "eor3" instruction. For example -
>    eor a, a, b
>    eor a, a, c
>    can be optimized to a single instruction - eor3 a, a, b, c
>    
>    This patch adds backend rules for Neon and SVE2 "eor3" instructions and
>    a micro benchmark to assess the performance gains with this patch.
>    Following are the results of the included micro benchmark on a 128-bit
>    aarch64 machine that supports Neon, SVE2 and SHA3 features -
>    
>    Benchmark               gain
>    TestEor3.test1Int       10.87%
>    TestEor3.test1Long      8.84%
>    TestEor3.test2Int       21.68%
>    TestEor3.test2Long      21.04%
>    
>    The numbers shown are performance gains with using Neon eor3 instruction
>    over the master branch that uses multiple "eor" instructions instead.
>    Similar gains can be observed with the SVE2 "eor3" version as well since
>    the "eor3" instruction is unpredicated and the machine under test uses a
>    maximum vector width of 128 bits which makes the SVE2 code generation very
>    similar to the one with Neon.

test/hotspot/gtest/aarch64/aarch64-asmtest.py line 1043:

> 1041:                         [str(self.reg[i]) for i in range(1, self.numRegs)]))
> 1042:     def astr(self):
> 1043:         if self._name == "eor3":

Suggestion:

        firstArg = 0 if self._name == "eor3" else 1
        formatStr = "%s%s" + ''.join([", %s" for i in range(firstArg, self.numRegs)])


And similarly below.
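
To illustrate the idea, here is a minimal standalone sketch of how the format string changes. The function name `build_format` and the register strings are hypothetical and only for illustration; the real code lives in a class in aarch64-asmtest.py and uses `self._name` and `self.numRegs`.

```python
def build_format(name, num_regs):
    # Hypothetical standalone version of the suggested logic.
    # "eor3" takes one more register operand than the other ops,
    # so the range starts at 0 instead of 1 to emit one extra ", %s".
    first_arg = 0 if name == "eor3" else 1
    return "%s%s" + "".join(", %s" for _ in range(first_arg, num_regs))


# Example with num_regs = 3 (operand strings are made up for the demo):
eor_fmt = build_format("eor", 3)    # mnemonic + 3 register slots
eor3_fmt = build_format("eor3", 3)  # mnemonic + 4 register slots
eor3_line = eor3_fmt % ("eor3\t", "v0.16b", "v1.16b", "v2.16b", "v3.16b")
```

This avoids a separate `if`/`else` branch with two nearly identical format strings; only the start of the range differs.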

-------------

PR: https://git.openjdk.org/jdk/pull/10407


More information about the hotspot-compiler-dev mailing list