RFR: 8289422: Fix and re-enable vector conditional move [v4]

Tobias Hartmann thartmann at openjdk.org
Fri Sep 9 10:50:42 UTC 2022


On Tue, 6 Sep 2022 02:47:38 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> // float[] a, float[] b, float[] c;
>> for (int i = 0; i < a.length; i++) {
>>     c[i] = (a[i] > b[i]) ? a[i] : b[i];
>> }
>> 
>> 
>> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case
>> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov.
>> But the transformation here[1] is going to optimize the BoolNode
>> with constant input to a constant and break the design logic of
>> cmove vector node[2]. We can't prevent all GVN transformation to
>> the BoolNode before matcher, so the patch keeps the condition input
>> as a constant while creating a cmove vector node, and then
>> restructures it into a binary tree before matching.
>> 
>> When the input order of original cmp node is different from the
>> input order of original cmove node, like:
>> 
>> // float[] a, float[] b, float[] c;
>> for (int i = 0; i < a.length; i++) {
>>     c[i] = (a[i] < b[i]) ? a[i] : b[i];
>> }
>> 
>> the patch negates the mask of the BoolNode before creating the
>> cmove vector node in SuperWord::output().
>> 
>> We can also use VectorNode::implemented() to consult if vector
>> conditional move is supported in the backend. So, the patch cleans
>> the related code in SuperWord::implemented().
>> 
>> With the patch, the performance uplift is:
>> (The micro-benchmark functions are included in the file
>> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>> 
>> AArch64:
>> Benchmark (length)  Mode  Cnt   uplift(ns/op)
>> cmoveD     523      avgt  15    68.89%
>> cmoveF     523      avgt  15    72.40%
>> 
>> X86:
>> Benchmark (length)  Mode  Cnt   uplift(ns/op)
>> cmoveD     523      avgt  15    73.12%
>> cmoveF     523      avgt  15    85.45%
>> 
>> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
>> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
> 
>  - Rebase the patch to the latest JDK and add some testcase for NE and EQ
>    
>    Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448
>  - Merge branch 'master' into fg8289422
>    
>    Change-Id: I09677cb07f6b2717aa768a830663ca455806b900
>  - Merge branch 'master' into fg8289422
>    
>    Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412
>  - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs
>    
>    Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8
>  - Merge branch 'master' into fg8289422
>    
>    Change-Id: I7936552df6ac12949ed8b550576f4e3520596423
>  - 8289422: Fix and re-enable vector conditional move
>    
>    ```
>    // float[] a, float[] b, float[] c;
>    for (int i = 0; i < a.length; i++) {
>        c[i] = (a[i] > b[i]) ? a[i] : b[i];
>    }
>    ```
>    
>    After JDK-8139340 and JDK-8192846, we hope to vectorize the case
>    above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov.
>    But the transformation here[1] is going to optimize the BoolNode
>    with constant input to a constant and break the design logic of
>    cmove vector node[2]. We can't prevent all GVN transformation to
>    the BoolNode before matcher, so the patch keeps the condition input
>    as a constant while creating a cmove vector node, and then
>    restructures it into a binary tree before matching.
>    
>    When the input order of original cmp node is different from the
>    input order of original cmove node, like:
>    ```
>    // float[] a, float[] b, float[] c;
>    for (int i = 0; i < a.length; i++) {
>        c[i] = (a[i] < b[i]) ? a[i] : b[i];
>    }
>    ```
>    the patch negates the mask of the BoolNode before creating the
>    cmove vector node in SuperWord::output().
>    
>    We can also use VectorNode::implemented() to consult if vector
>    conditional move is supported in the backend. So, the patch cleans
>    the related code in SuperWord::implemented().
>    
>    With the patch, the performance uplift is:
>    (The micro-benchmark functions are included in the file
>    test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>    
>    AArch64:
>    Benchmark (length)  Mode  Cnt   uplift(ns/op)
>    cmoveD     523      avgt  15    68.89%
>    cmoveF     523      avgt  15    72.40%
>    
>    X86:
>    Benchmark (length)  Mode  Cnt   uplift(ns/op)
>    cmoveD     523      avgt  15    73.12%
>    cmoveF     523      avgt  15    85.45%
>    
>    [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
>    [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
>    
>    Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690

Thanks, I can see failures with the following tests when running with `-XX:+UseCMoveUnconditionally -XX:+UseVectorCmov`:
- `compiler/c2/TestCondAddDeadBranch.java`
- `compiler/loopopts/TestCastFFAtPhi.java`


Error mixing types: vectory[4]:{double_top} and double_top



# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (workspace/open/src/hotspot/share/opto/type.cpp:1179), pid=3589333, tid=3589359
#  Error: ShouldNotReachHere()
#
# JRE version: Java(TM) SE Runtime Environment (20.0) (fastdebug build 20-internal-2022-09-09-0957028.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-internal-2022-09-09-0957028.tobias.hartmann.jdk2, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x1a95869]  Type::typerr(Type const*) const+0x79

Current CompileTask:
C2:    130   10    b        TestCastFFAtPhi::init (35 bytes)

Stack: [0x00007ff917726000,0x00007ff917827000],  sp=0x00007ff917821540,  free space=1005k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1a95869]  Type::typerr(Type const*) const+0x79  (type.cpp:1179)
V  [libjvm.so+0x1a97f2b]  TypeVect::xmeet(Type const*) const+0x1eb  (type.cpp:2451)
V  [libjvm.so+0x1a9d203]  Type::meet_helper(Type const*, bool) const+0x73  (type.cpp:879)
V  [libjvm.so+0x1a9d41a]  Type::filter_helper(Type const*, bool) const+0x1a  (type.hpp:188)
V  [libjvm.so+0x1793690]  PhaseIterGVN::transform_old(Node*)+0x230  (phaseX.cpp:1294)
V  [libjvm.so+0x178b30e]  PhaseIterGVN::optimize()+0x6e  (phaseX.cpp:1203)
V  [libjvm.so+0xafeefa]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x6da  (loopnode.hpp:1169)
V  [libjvm.so+0xafb253]  Compile::Optimize()+0xe53  (compile.cpp:2171)
V  [libjvm.so+0xafd50d]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x15ad  (compile.cpp:823)
V  [libjvm.so+0x90e2e5]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x675  (c2compiler.cpp:113)
V  [libjvm.so+0xb0ba5c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb1c  (compileBroker.cpp:2243)
V  [libjvm.so+0xb0c828]  CompileBroker::compiler_thread_loop()+0x5a8  (compileBroker.cpp:1917)
V  [libjvm.so+0x106c1dc]  JavaThread::thread_main_inner()+0x22c  (javaThread.cpp:700)
V  [libjvm.so+0x1a6dd10]  Thread::call_run()+0x100  (thread.cpp:224)
V  [libjvm.so+0x1708f13]  thread_native_entry(Thread*)+0x103  (os_linux.cpp:710)



They also happen without this patch. Should we file a separate bug or are these supposed to be fixed by this change?

-------------

PR: https://git.openjdk.org/jdk/pull/9652


More information about the hotspot-compiler-dev mailing list