RFR: 8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

Vladimir Ivanov vlivanov at openjdk.java.net
Thu Dec 3 11:18:56 UTC 2020


On Thu, 3 Dec 2020 03:15:09 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> It looks like you implicitly depend on the check that there are rules present which match that particular opcode:
>> const bool Matcher::match_rule_supported(int opcode) {
>>   if (!has_match_rule(opcode)) {
>>     return false; // no match rule present
>>   }
>> 
>> But you don't have to and I don't see why additional checks in `VectorNode::is_vector_rotate_supported()` can't be extended to cover that for you.
>
> As far as I considered, `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` will degradate performance of rotating with variable shift.
> 
> Taking the following test for example.
>   // Rotate with variable SHIFT, TESTSIZE = 1024
>   public void testRotateRightI() {
>     for (int i = 0; i < TESTSIZE; i++)
>        ires[i] = Integer.rotateRight(iarr[i], SHIFT);
>   }
> 
> On aarch64, we have scalar and vector instructions to implement the loop core of this test:
> A. vector with NEON
> sshl v17.4s, v16.4s, v19.4s              B. scalar
> neg v18.16b, v20.16b                    ror    w13, w13, w10
> ushl v16.4s, v16.4s, v18.4s
> orr v16.16b, v17.16b, v16.16b
> As of now, the default code generated by C2 is `A` under SuperWord framework.
> 
> Without `RotateLeftImmV`/`RotateRightImmV`, when we match `RotateLeftV`/`RotateRightV` only with constant shifts,
> to avoid the `BAD AD file` crash, we have to add a check to return `false` in `VectorNode::is_vector_rotate_supported()`, like:
> bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
>   assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");
> 
>   /* --- Matcher::match_rule_supported() independent checks  added --- */
>   AARCH64_ONLY(if (!is_imm) return false); 
>   AARCH64_ONLY(if (is_imm) return true); 
>   ...
>   // Validate existence of nodes created in case of rotate degeneration.
>   switch (bt) {
>     case T_INT:
>       return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
>              Matcher::match_rule_supported_vector(Op_LShiftVI,  vlen, bt) &&
>              Matcher::match_rule_supported_vector(Op_URShiftVI, vlen, bt);
>     case T_LONG:
>       return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
>              Matcher::match_rule_supported_vector(Op_LShiftVL,  vlen, bt) &&
>              Matcher::match_rule_supported_vector(Op_URShiftVL, vlen, bt);
>     default:
>       assert(false, "not supported: %s", type2name(bt));
>       return false;
>   }
> }
> But the check also tells C2 that `RotateVRight` with vairable shifts can't be vectorized.
> The code is rolled back to scalar version. I check  the asm code generated with modifications above (see [1]), it is `B`.
> So we have a performance regression here, because `RotateVRight` can be vectorized via OrV+LShiftVI+URShiftVI (`A`).
> 
> [1] An implementation with `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` . Link: https://github.com/dgbo/jdk/commit/2b3b01dc41eb65dd59d8340d6ce5701421484a88

Thanks for the clarifications. I think I understand now the issue you are facing. 

As of now, it works as follows:
RotateLeft ==vectorizes==> RotateLeftV ==degenerates(?)==> OrV+LShiftV+URShiftV

`VectorNode::is_vector_rotate_supported()` checks that either `RotateLeftV` or `OrV`/`LShiftV`/`URShiftV` are supported.
Then `RotateLeftV::Ideal` checks for `RotateLeftV` support and replaces the node with  `OrV`/`LShiftV`/`URShiftV` if it is absent.

Your fix proposes to introduce `RotateLeftImmV` and keep the checks in `VectorNode::is_vector_rotate_supported()` intact:
RotateLeft ==vectorizes==> RotateLeftV    ==degenerates(?)==> OrV+LShiftV+URShiftV
           ==vectorizes==> RotateLeftImmV ==degenerates(?)==> OrV+LShiftV+URShiftV

What I'm after is to keep existing nodes and adjust the checks in `RotateLeftV::Ideal` to take shift value into account and degrade to  `OrV+LShiftV+URShiftV` depending on whether the platform finds constant/variable shifts profitable.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1532


More information about the hotspot-compiler-dev mailing list