RFR: 8289422: Fix and re-enable vector conditional move [v4]
Fei Gao
fgao at openjdk.org
Tue Sep 13 02:34:59 UTC 2022
On Tue, 6 Sep 2022 02:47:38 GMT, Fei Gao <fgao at openjdk.org> wrote:
>> // float[] a, float[] b, float[] c;
>> for (int i = 0; i < a.length; i++) {
>> c[i] = (a[i] > b[i]) ? a[i] : b[i];
>> }
>>
>>
>> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case
>> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov.
>> But the transformation here[1] is going to optimize the BoolNode
>> with constant input to a constant and break the design logic of
>> cmove vector node[2]. We can't prevent all GVN transformation to
>> the BoolNode before matcher, so the patch keeps the condition input
>> as a constant while creating a cmove vector node, and then
>> restructures it into a binary tree before matching.
>>
>> When the input order of original cmp node is different from the
>> input order of original cmove node, like:
>>
>> // float[] a, float[] b, float[] c;
>> for (int i = 0; i < a.length; i++) {
>> c[i] = (a[i] < b[i]) ? a[i] : b[i];
>> }
>>
>> the patch negates the mask of the BoolNode before creating the
>> cmove vector node in SuperWord::output().
>>
>> We can also use VectorNode::implemented() to consult if vector
>> conditional move is supported in the backend. So, the patch cleans
>> the related code in SuperWord::implemented().
>>
>> With the patch, the performance uplift is:
>> (The micro-benchmark functions are included in the file
>> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>>
>> AArch64:
>> Benchmark (length) Mode Cnt uplift(ns/op)
>> cmoveD 523 avgt 15 68.89%
>> cmoveF 523 avgt 15 72.40%
>>
>> X86:
>> Benchmark (length) Mode Cnt uplift(ns/op)
>> cmoveD 523 avgt 15 73.12%
>> cmoveF 523 avgt 15 85.45%
>>
>> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
>> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
> - Rebase the patch to the latest JDK and add some testcase for NE and EQ
>
> Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448
> - Merge branch 'master' into fg8289422
>
> Change-Id: I09677cb07f6b2717aa768a830663ca455806b900
> - Merge branch 'master' into fg8289422
>
> Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412
> - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs
>
> Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8
> - Merge branch 'master' into fg8289422
>
> Change-Id: I7936552df6ac12949ed8b550576f4e3520596423
> - 8289422: Fix and re-enable vector conditional move
>
> ```
> // float[] a, float[] b, float[] c;
> for (int i = 0; i < a.length; i++) {
> c[i] = (a[i] > b[i]) ? a[i] : b[i];
> }
> ```
>
> After JDK-8139340 and JDK-8192846, we hope to vectorize the case
> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov.
> But the transformation here[1] is going to optimize the BoolNode
> with constant input to a constant and break the design logic of
> cmove vector node[2]. We can't prevent all GVN transformation to
> the BoolNode before matcher, so the patch keeps the condition input
> as a constant while creating a cmove vector node, and then
> restructures it into a binary tree before matching.
>
> When the input order of original cmp node is different from the
> input order of original cmove node, like:
> ```
> // float[] a, float[] b, float[] c;
> for (int i = 0; i < a.length; i++) {
> c[i] = (a[i] < b[i]) ? a[i] : b[i];
> }
> ```
> the patch negates the mask of the BoolNode before creating the
> cmove vector node in SuperWord::output().
>
> We can also use VectorNode::implemented() to consult if vector
> conditional move is supported in the backend. So, the patch cleans
> the related code in SuperWord::implemented().
>
> With the patch, the performance uplift is:
> (The micro-benchmark functions are included in the file
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>
> AArch64:
> Benchmark (length) Mode Cnt uplift(ns/op)
> cmoveD 523 avgt 15 68.89%
> cmoveF 523 avgt 15 72.40%
>
> X86:
> Benchmark (length) Mode Cnt uplift(ns/op)
> cmoveD 523 avgt 15 73.12%
> cmoveF 523 avgt 15 85.45%
>
> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
>
> Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690
> Thanks, I can see failures with the following tests when running with `-XX:+UseCMoveUnconditionally -XX:+UseVectorCmov`:
>
> * `compiler/c2/TestCondAddDeadBranch.java`
> * `compiler/loopopts/TestCastFFAtPhi.java`
>
> They also happen without this patch. Should we file a separate bug or are these supposed to be fixed by this change?
Thanks for your effort, @TobiHartmann . The backtrace of the failure is different from the problem that the patch tries to fix. It may be caused by another problem in mid-end. So I prefer to fix it in a separate patch and try to make each patch much easier. WDYT?
-------------
PR: https://git.openjdk.org/jdk/pull/9652
More information about the hotspot-compiler-dev
mailing list