RFR: 8282431: AArch64: Add optimized rules for masked vector multiply-add/sub for SVE [v2]
Xiaohong Gong
xgong at openjdk.java.net
Tue Mar 15 01:24:36 UTC 2022
On Thu, 10 Mar 2022 07:55:15 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> We already have optimized match rules for the vector operations `fmls`, `fnmla`, `fnmls`, `mla` and `mls` on Arm SVE. Similarly, we can add rules for the corresponding masked operations to generate optimized predicated instructions.
>>
>> With this patch, the following code generated for a masked vector multiply-add on a byte vector:
>>
>> mul z18.b, p7/m, z18.b, z17.b
>> add z17.b, p0/m, z17.b, z18.b
>>
>> could be optimized to:
>>
>> mla z19.b, p0/m, z18.b, z17.b
>>
>> The same applies to the multiply-sub operations. Also, the following code generated for a masked fused multiply-subtract on a float vector:
>>
>> fneg z18.s, p7/m, z18.s
>> fmad z17.s, p0/m, z18.s, z16.s
>>
>> could be optimized to:
>>
>> fmsb z17.s, p0/m, z18.s, z16.s
>>
>> The same applies to the corresponding negated fused operations.
>>
>> This patch also fixes potential issues with the use of `NegVF/D` in match rules. If a rule assumes that `NegVF/D` is non-masked, its predicate must explicitly check that the vector is non-predicated. Otherwise, the JVM might crash if a masked `NegVF/D` with two operands is matched by a rule that assumes the `NegVF/D` in its subtree has one operand.
>>
>> This patch also adds jtreg tests for all the touched vector operations.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Use the ir framework for the test
Hi, could anyone please take a look at this PR? Thanks so much!
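
For reviewers who want the lane-wise semantics of the masked multiply-add at hand, here is a minimal scalar sketch in Java. It is only an illustrative model, not code from the patch or its tests: the class name `MaskedMlaSketch` and both method names are hypothetical, and it assumes that the multiply is effectively unmasked, that the add is masked, and that inactive lanes keep the accumulator value. Under those assumptions the two-step form and the single predicated multiply-accumulate should agree in every lane, including when the byte product wraps.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two code shapes; not taken from the patch.
public class MaskedMlaSketch {

    // Two-step form: multiply every lane, then add only where the mask is set.
    // Lanes with mask[i] == false keep the original accumulator value a[i].
    static byte[] mulThenMaskedAdd(byte[] a, byte[] b, byte[] c, boolean[] mask) {
        byte[] r = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            byte prod = (byte) (b[i] * c[i]);          // wraps to 8 bits
            r[i] = mask[i] ? (byte) (a[i] + prod) : a[i];
        }
        return r;
    }

    // Fused form, modelled on a predicated multiply-accumulate:
    // active lanes become a[i] + b[i] * c[i], inactive lanes are unchanged.
    static byte[] maskedMla(byte[] a, byte[] b, byte[] c, boolean[] mask) {
        byte[] r = a.clone();
        for (int i = 0; i < r.length; i++) {
            if (mask[i]) {
                r[i] = (byte) (r[i] + b[i] * c[i]);    // wraps to 8 bits
            }
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 100};
        byte[] b = {10, 20, 30, 127};
        byte[] c = {2, 3, 4, 2};
        boolean[] mask = {true, false, true, true};

        byte[] twoStep = mulThenMaskedAdd(a, b, c, mask);
        byte[] fused = maskedMla(a, b, c, mask);

        System.out.println(Arrays.toString(twoStep));
        System.out.println(Arrays.toString(fused));
        System.out.println("same result: " + Arrays.equals(twoStep, fused));
    }
}
```

The last lane is chosen so that the product overflows a byte, which exercises the wrap-around case in both forms.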
-------------
PR: https://git.openjdk.java.net/jdk/pull/7737