Withdrawn: 8256820: AArch64: Optimize vector rotate (immediate) with shift and insert instructions

Dong Bo dongbo at openjdk.java.net
Wed Dec 16 02:02:56 UTC 2020


On Mon, 14 Dec 2020 05:57:36 GMT, Dong Bo <dongbo at openjdk.org> wrote:

> This patch optimizes vector rotate (immediate) on aarch64 using the shift-and-insert instructions SLI and SRI.
> 
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
> The tests under `test/hotspot/jtreg/compiler/c2/cr6340864/` were also run specifically to check correctness, and they passed.
> 
> The JMH micro-benchmark `test/micro/org/openjdk/bench/java/lang/RotateBenchmark.java` was used for performance testing.
> It showed a ~15.4% performance improvement on Kunpeng920 (CPU tsv110), but a ~15.8% regression on Kunpeng916 (CPU A72).
> So a switch, `UseSIMDShiftInsertForRotation`, is introduced on aarch64 with a default value of `false`, and it is set to `true` for Kunpeng920.
> 
> The `RotateBenchmark.java` JMH micro-benchmark results on Kunpeng920:
> Benchmark                            (SHIFT)  (TESTSIZE)   Mode  Cnt     Score    Error   Units
> 
> # kunpeng 920, -XX:-UseSIMDShiftInsertForRotation
> RotateBenchmark.testRotateLeftI           20        1024  thrpt   10  3524.840 ±  2.365  ops/ms
> RotateBenchmark.testRotateLeftIImm        20        1024  thrpt   10  3961.288 ±  0.897  ops/ms
> RotateBenchmark.testRotateLeftL           20        1024  thrpt   10  1704.321 ± 11.309  ops/ms
> RotateBenchmark.testRotateLeftLImm        20        1024  thrpt   10  2137.924 ±  2.215  ops/ms
> RotateBenchmark.testRotateRightI          20        1024  thrpt   10  3536.960 ±  7.945  ops/ms
> RotateBenchmark.testRotateRightIImm       20        1024  thrpt   10  3961.552 ±  0.673  ops/ms
> RotateBenchmark.testRotateRightL          20        1024  thrpt   10  1729.868 ±  0.468  ops/ms
> RotateBenchmark.testRotateRightLImm       20        1024  thrpt   10  2132.458 ±  3.385  ops/ms
> 
> # kunpeng 920, default, -XX:+UseSIMDShiftInsertForRotation
> RotateBenchmark.testRotateLeftI           20        1024  thrpt   10  3504.602 ± 21.609  ops/ms
> RotateBenchmark.testRotateLeftIImm        20        1024  thrpt   10  4569.820 ±  7.455  ops/ms
> RotateBenchmark.testRotateLeftL           20        1024  thrpt   10  1730.735 ±  0.701  ops/ms
> RotateBenchmark.testRotateLeftLImm        20        1024  thrpt   10  2469.796 ±  0.981  ops/ms
> RotateBenchmark.testRotateRightI          20        1024  thrpt   10  3540.899 ±  7.679  ops/ms
> RotateBenchmark.testRotateRightIImm       20        1024  thrpt   10  4571.876 ±  0.879  ops/ms
> RotateBenchmark.testRotateRightL          20        1024  thrpt   10  1731.499 ±  0.877  ops/ms
> RotateBenchmark.testRotateRightLImm       20        1024  thrpt   10  2469.454 ±  0.705  ops/ms
> 
> This also moves all logical and shift NEON instructions from `aarch64.ad` to `aarch64_neon.ad`,
> and makes two minor improvements: support for vector length 4 in `vsraa8B_imm` and `vsrla8B_imm`, and for vector length 2 in `vsraa4S_imm` and `vsrla4S_imm`.
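
For readers unfamiliar with the technique, the following is a minimal scalar sketch of how a rotate by an immediate can be built from one plain shift followed by one shift-and-insert. It models a single 32-bit lane in Java; the helper names (`sli`, `sri`, `rotateLeftSli`, `rotateRightSri`) are hypothetical, and the exact instruction sequence emitted by the patch is an assumption here, not taken from the PR.

```java
// Scalar model of one 32-bit lane; illustrative only, not code from the patch.
class RotateViaShiftInsert {
    // Models SLI: shift src left and insert it into dst,
    // preserving the low `shift` bits of dst. Valid for shift in 0..31.
    static int sli(int dst, int src, int shift) {
        int keepMask = (1 << shift) - 1;
        return (dst & keepMask) | (src << shift);
    }

    // Models SRI: shift src right (logically) and insert it into dst,
    // preserving the high `shift` bits of dst. Restricted to 1..31 here.
    static int sri(int dst, int src, int shift) {
        int insertMask = -1 >>> shift;
        return (dst & ~insertMask) | (src >>> shift);
    }

    // Rotate left by s (1..31): a logical right shift, then SLI.
    static int rotateLeftSli(int x, int s) {
        int tmp = x >>> (32 - s); // bits that wrap around to the bottom
        return sli(tmp, x, s);    // insert x << s above them
    }

    // Rotate right by s (1..31): a left shift, then SRI.
    static int rotateRightSri(int x, int s) {
        int tmp = x << (32 - s);  // bits that wrap around to the top
        return sri(tmp, x, s);    // insert x >>> s below them
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0x80000001};
        for (int x : samples) {
            for (int s = 1; s < 32; s++) {
                if (rotateLeftSli(x, s) != Integer.rotateLeft(x, s)
                        || rotateRightSri(x, s) != Integer.rotateRight(x, s)) {
                    throw new AssertionError("mismatch for x=" + x + ", s=" + s);
                }
            }
        }
        System.out.println("all rotations match");
    }
}
```

The usual rotate idiom needs two shifts and an OR; in this sketch the insert step combines the second shift with the merge, which is presumably where the instruction saving comes from.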

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1761

