RFR: 8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes

Dong Bo dongbo at openjdk.java.net
Thu Dec 3 14:05:55 UTC 2020


On Thu, 3 Dec 2020 11:16:03 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> As far as I considered, `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` will degradate performance of rotating with variable shift.
>> 
>> Taking the following test for example.
>>   // Rotate with variable SHIFT, TESTSIZE = 1024
>>   public void testRotateRightI() {
>>     for (int i = 0; i < TESTSIZE; i++)
>>        ires[i] = Integer.rotateRight(iarr[i], SHIFT);
>>   }
>> 
>> On aarch64, we have scalar and vector instructions to implement the loop core of this test:
>> A. vector with NEON
>> sshl v17.4s, v16.4s, v19.4s              B. scalar
>> neg v18.16b, v20.16b                    ror    w13, w13, w10
>> ushl v16.4s, v16.4s, v18.4s
>> orr v16.16b, v17.16b, v16.16b
>> As of now, the default code generated by C2 is `A` under SuperWord framework.
>> 
>> Without `RotateLeftImmV`/`RotateRightImmV`, when we match `RotateLeftV`/`RotateRightV` only with constant shifts,
>> to avoid the `BAD AD file` crash, we have to add a check to return `false` in `VectorNode::is_vector_rotate_supported()`, like:
>> bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
>>   assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");
>> 
>>   /* --- Matcher::match_rule_supported() independent checks  added --- */
>>   AARCH64_ONLY(if (!is_imm) return false); 
>>   AARCH64_ONLY(if (is_imm) return true); 
>>   ...
>>   // Validate existence of nodes created in case of rotate degeneration.
>>   switch (bt) {
>>     case T_INT:
>>       return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
>>              Matcher::match_rule_supported_vector(Op_LShiftVI,  vlen, bt) &&
>>              Matcher::match_rule_supported_vector(Op_URShiftVI, vlen, bt);
>>     case T_LONG:
>>       return Matcher::match_rule_supported_vector(Op_OrV,       vlen, bt) &&
>>              Matcher::match_rule_supported_vector(Op_LShiftVL,  vlen, bt) &&
>>              Matcher::match_rule_supported_vector(Op_URShiftVL, vlen, bt);
>>     default:
>>       assert(false, "not supported: %s", type2name(bt));
>>       return false;
>>   }
>> }
>> But the check also tells C2 that `RotateVRight` with vairable shifts can't be vectorized.
>> The code is rolled back to scalar version. I check  the asm code generated with modifications above (see [1]), it is `B`.
>> So we have a performance regression here, because `RotateVRight` can be vectorized via OrV+LShiftVI+URShiftVI (`A`).
>> 
>> [1] An implementation with `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` . Link: https://github.com/dgbo/jdk/commit/2b3b01dc41eb65dd59d8340d6ce5701421484a88
>
> Thanks for the clarifications. I think I understand now the issue you are facing. 
> 
> As of now, it works as follows:
> RotateLeft ==vectorizes==> RotateLeftV ==degenerates(?)==> OrV+LShiftV+URShiftV
> 
> `VectorNode::is_vector_rotate_supported()` checks that either `RotateLeftV` or `OrV`/`LShiftV`/`URShiftV` are supported.
> Then `RotateLeftV::Ideal` checks for `RotateLeftV` support and replaces the node with  `OrV`/`LShiftV`/`URShiftV` if it is absent.
> 
> Your fix proposes to introduce `RotateLeftImmV` and keep the checks in `VectorNode::is_vector_rotate_supported()` intact:
> RotateLeft ==vectorizes==> RotateLeftV    ==degenerates(?)==> OrV+LShiftV+URShiftV
>            ==vectorizes==> RotateLeftImmV ==degenerates(?)==> OrV+LShiftV+URShiftV
> 
> What I'm after is to keep existing nodes and adjust the checks in `RotateLeftV::Ideal` to take shift value into account and degrade to  `OrV+LShiftV+URShiftV` depending on whether the platform finds constant/variable shifts profitable.

Yes, it works just the same as you mentioned. Thanks a lot to make this so clear.
The degeneration to `OrV+LShiftV+URShiftV` is at `RotateLeftVNode::Ideal()`, then `VectorNode::degenerate_vector_rotate()`:
`RotateLeft ==vectorizes==> RotateLeftV    ==degenerates(RotateLeftVNode::Ideal, VectorNode::degenerate_vector_rotate())==> OrV+LShiftV+URShiftV`

However, the workflow relys on the creation of `RotateLeftV` node, and the creation of `RotateLeftV` with variables should be forbidden if one CPU match RotateLeftV/RotateRightV only with constant shifts.
Otherwise, `Matcher::Label_Root()` will check wether the AD files have implemented the matching rules for RotateLeftV with variables.
If there are no rules for `RotateLeftV` with variables, while we have rules for `RotateLeftV` with constants, we get a crash at https://github.com/openjdk/jdk/blob/fa58671f9f8a8230d907c0a6289685f8455bce1f/src/hotspot/share/opto/matcher.cpp#L1669.

IMHO, seems ajusting checks in `RotateLeftV::Ideal` does not deliver the solution, since there are no `RotateRightV` for variable shifts from the start.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1532


More information about the hotspot-compiler-dev mailing list