RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v5]
Quan Anh Mai
duke at openjdk.org
Thu Aug 18 08:13:15 UTC 2022
> This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and do a `CmpI` to get back the flags. As a result, `VectorMask<T>::alltrue` is compiled into machine codes:
>
> vptest xmm0, xmm1
> jb if_true
> if_false:
>
> instead of:
>
> vptest xmm0, xmm1
> setb r10
> movzbl r10
> testl r10
> jne if_true
> if_false:
>
> The results of `jdk.incubator.vector.ArrayMismatchBenchmark` shows noticeable improvements:
>
> Before After
> Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change
> ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ± 8316.444 222279.381 ± 2660.983 ops/ms +2.3%
> ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ± 1618.836 116268.691 ± 1291.899 ops/ms +2.1%
> ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ± 72.862 797.806 ± 16.429 ops/ms +13.6%
> ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ± 2401.258 145338.910 ± 687.453 ops/ms -0.5%
> ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ± 1259.397 69041.519 ± 1073.156 ops/ms +13.9%
> ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ± 10.975 408.770 ± 5.281 ops/ms +29.0%
> ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ± 1200.166 188482.433 ± 1872.076 ops/ms -3.7%
> ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ± 473.013 42293.411 ± 2838.255 ops/ms -4.7%
> ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ± 5.410 67.628 ± 3.241 ops/ms -0.8%
> ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ± 1677.607 111060.400 ± 982.230 ops/ms +3.1%
> ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ± 1002.599 21440.506 ± 1618.266 ops/ms +28.4%
> ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ± 0.548 33.202 ± 2.365 ops/ms +0.7%
> ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ± 3154.842 379944.254 ± 5703.134 ops/ms +13.3%
> ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ± 786.312 56721.368 ± 2497.052 ops/ms -3.0%
> ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ± 11.415 139.537 ± 4.667 ops/ms +4.9%
> ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ± 2281.349 112409.365 ± 2110.055 ops/ms -4.7%
> ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ± 795.619 33756.613 ± 826.533 ops/ms +24.7%
> ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ± 8.927 66.951 ± 4.381 ops/ms +16.3%
> ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ± 1042.497 182438.405 ± 2120.832 ops/ms -0.3%
> ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ± 614.821 35397.398 ± 1609.235 ops/ms -3.5%
> ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ± 2.142 65.427 ± 2.270 ops/ms -1.5%
> ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ± 497.853 115165.845 ± 5381.674 ops/ms +4.3%
> ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ± 661.350 19871.096 ± 201.464 ops/ms +35.0%
> ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ± 0.821 31.933 ± 1.352 ops/ms +3.8%
>
> I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
revert renaming temp
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/9855/files
- new: https://git.openjdk.org/jdk/pull/9855/files/14b71c00..5444c51d
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=04
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=03-04
Stats: 20 lines in 2 files changed: 0 ins; 0 del; 20 mod
Patch: https://git.openjdk.org/jdk/pull/9855.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855
PR: https://git.openjdk.org/jdk/pull/9855
More information about the hotspot-compiler-dev
mailing list