RFR: 8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

Wed Dec 2 06:39:55 UTC 2020

On Tue, 1 Dec 2020 23:56:11 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Why do you introduce new ideal nodes (`RotateLeftImmV`/`RotateRightImmV`)? You already extended `is_vector_rotate_supported` to take the shift argument into account.
> 

I think extended `is_vector_rotate_supported` can't offer what we are going after.

Aarch64 supports `OrV/LShiftV/URShiftV`, so `is_vector_rotate_supported`  returns true anyway with or without `Rotate*ImmV`.
If we do not introduce `RotateLeftImmV/RotateRightImmV` nodes, `RotateLeftV/RotateRightV`nodes will be created for both immediate and variable rotation.
As mentioned before, variable rotation matching rules should no exsit in `aarch64.ad`, due to the performance regression.
It leads to a crash at `src/hotspot/share/opto/matcher.cpp:1657` with message `assert(false) failed: bad AD file`.

If extended `is_vector_rotate_supported` return false anyhow for variable rotations like:
 bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
   assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");

+  AARCH64_ONLY(if(!is_imm) return false);
+
   // Get the opcode for rotate immediate
   if (is_imm) {
     vopc = vopc == Op_RotateLeftV ? Op_RotateLeftImmV : Op_RotateRightImmV;
The variable roations would not be vectorized by `OrV/LShiftV/URShiftV` combinations.

With `RotateLeftImmV/RotateRightImmV` nodes, we can match them separately and optimized in a CPU and keep the variable rotation the same as before.
I'am sorry if I miss something.

> Also, you mention aarch64, but I see only x86 changes.
>

I tried to optimize immediate vector rotations with shift&insert in JDK-8256820 for aarch64.
WIth `RotateLeftImmV/RotateRightImmV`,  an initial optimization for aarch64 is shown in attachment [optimize_rotateImm_aarch64.txt](https://github.com/openjdk/jdk/files/5627342/optimize_rotateImm_aarch64.txt)
, tests are good.
But I also want to discuss some other modifications (for aarch64 only) in that issue, e.g. trivial fix for absolute and accumulate with small vector length, move shifting operations to aarch64_neon.ad, etc.
To make this progress clean and clear, and I think this splitting immediate roatation can be discussed separately, so I raised up this new PR. Hope this would not be a bother.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1532