RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v8]

Quan Anh Mai duke at openjdk.org
Wed Sep 21 07:40:46 UTC 2022


> This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and do a `CmpI` to get back the flags. As a result, `VectorMask<T>::alltrue` is compiled into machine codes:
> 
>         vptest xmm0, xmm1
>         jb if_true
>     if_false:
> 
> instead of:
> 
>         vptest xmm0, xmm1
>         setb r10
>         movzbl r10
>         testl r10
>         jne if_true
>     if_false:
> 
> The results of `jdk.incubator.vector.ArrayMismatchBenchmark` shows noticeable improvements:
> 
>                                                                                        Before                  After
>     Benchmark                                      Prefix    Size   Mode  Cnt       Score      Error       Score     Error    Units  Change
>     ArrayMismatchBenchmark.mismatchVectorByte         0.5       9  thrpt   10  217345.383 ± 8316.444  222279.381 ± 2660.983  ops/ms   +2.3%
>     ArrayMismatchBenchmark.mismatchVectorByte         0.5     257  thrpt   10  113918.406 ± 1618.836  116268.691 ± 1291.899  ops/ms   +2.1%
>     ArrayMismatchBenchmark.mismatchVectorByte         0.5  100000  thrpt   10     702.066 ±   72.862     797.806 ±   16.429  ops/ms  +13.6%
>     ArrayMismatchBenchmark.mismatchVectorByte         1.0       9  thrpt   10  146096.564 ± 2401.258  145338.910 ±  687.453  ops/ms   -0.5%
>     ArrayMismatchBenchmark.mismatchVectorByte         1.0     257  thrpt   10   60598.181 ± 1259.397   69041.519 ± 1073.156  ops/ms  +13.9%
>     ArrayMismatchBenchmark.mismatchVectorByte         1.0  100000  thrpt   10     316.814 ±   10.975     408.770 ±    5.281  ops/ms  +29.0%
>     ArrayMismatchBenchmark.mismatchVectorDouble       0.5       9  thrpt   10  195674.549 ± 1200.166  188482.433 ± 1872.076  ops/ms   -3.7%
>     ArrayMismatchBenchmark.mismatchVectorDouble       0.5     257  thrpt   10   44357.169 ±  473.013   42293.411 ± 2838.255  ops/ms   -4.7%
>     ArrayMismatchBenchmark.mismatchVectorDouble       0.5  100000  thrpt   10      68.199 ±    5.410      67.628 ±    3.241  ops/ms   -0.8%
>     ArrayMismatchBenchmark.mismatchVectorDouble       1.0       9  thrpt   10  107722.450 ± 1677.607  111060.400 ±  982.230  ops/ms   +3.1%
>     ArrayMismatchBenchmark.mismatchVectorDouble       1.0     257  thrpt   10   16692.645 ± 1002.599   21440.506 ± 1618.266  ops/ms  +28.4%
>     ArrayMismatchBenchmark.mismatchVectorDouble       1.0  100000  thrpt   10      32.984 ±    0.548      33.202 ±    2.365  ops/ms   +0.7%
>     ArrayMismatchBenchmark.mismatchVectorInt          0.5       9  thrpt   10  335458.217 ± 3154.842  379944.254 ± 5703.134  ops/ms  +13.3%
>     ArrayMismatchBenchmark.mismatchVectorInt          0.5     257  thrpt   10   58505.302 ±  786.312   56721.368 ± 2497.052  ops/ms   -3.0%
>     ArrayMismatchBenchmark.mismatchVectorInt          0.5  100000  thrpt   10     133.037 ±   11.415     139.537 ±    4.667  ops/ms   +4.9%
>     ArrayMismatchBenchmark.mismatchVectorInt          1.0       9  thrpt   10  117943.802 ± 2281.349  112409.365 ± 2110.055  ops/ms   -4.7%
>     ArrayMismatchBenchmark.mismatchVectorInt          1.0     257  thrpt   10   27060.015 ±  795.619   33756.613 ±  826.533  ops/ms  +24.7%
>     ArrayMismatchBenchmark.mismatchVectorInt          1.0  100000  thrpt   10      57.558 ±    8.927      66.951 ±    4.381  ops/ms  +16.3%
>     ArrayMismatchBenchmark.mismatchVectorLong         0.5       9  thrpt   10  182963.715 ± 1042.497  182438.405 ± 2120.832  ops/ms   -0.3%
>     ArrayMismatchBenchmark.mismatchVectorLong         0.5     257  thrpt   10   36672.215 ±  614.821   35397.398 ± 1609.235  ops/ms   -3.5%
>     ArrayMismatchBenchmark.mismatchVectorLong         0.5  100000  thrpt   10      66.438 ±    2.142      65.427 ±    2.270  ops/ms   -1.5%
>     ArrayMismatchBenchmark.mismatchVectorLong         1.0       9  thrpt   10  110393.047 ±  497.853  115165.845 ± 5381.674  ops/ms   +4.3%
>     ArrayMismatchBenchmark.mismatchVectorLong         1.0     257  thrpt   10   14720.765 ±  661.350   19871.096 ±  201.464  ops/ms  +35.0%
>     ArrayMismatchBenchmark.mismatchVectorLong         1.0  100000  thrpt   10      30.760 ±    0.821      31.933 ±    1.352  ops/ms   +3.8%
> 
> I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits:

 - Merge branch 'master' into improveVTest
 - fix merge problems
 - Merge branch 'master' into improveVTest
 - refactor x86
 - revert renaming temp
 - style + use rscratch instead of tmp
 - fix
 - redo aarch
 - Merge branch 'master' into improveVTest
 - delete aarch64 vector files
 - ... and 14 more: https://git.openjdk.org/jdk/compare/1b496064...0894474a

-------------

Changes: https://git.openjdk.org/jdk/pull/9855/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=07
  Stats: 483 lines in 23 files changed: 214 ins; 145 del; 124 mod
  Patch: https://git.openjdk.org/jdk/pull/9855.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855

PR: https://git.openjdk.org/jdk/pull/9855


More information about the hotspot-dev mailing list