RFR: 8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes
Dong Bo
dongbo at openjdk.java.net
Thu Dec 3 14:05:55 UTC 2020
On Thu, 3 Dec 2020 11:16:03 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> As far as I considered, `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` will degradate performance of rotating with variable shift.
>>
>> Taking the following test for example.
>> // Rotate with variable SHIFT, TESTSIZE = 1024
>> public void testRotateRightI() {
>> for (int i = 0; i < TESTSIZE; i++)
>> ires[i] = Integer.rotateRight(iarr[i], SHIFT);
>> }
>>
>> On aarch64, we have scalar and vector instructions to implement the loop core of this test:
>> A. vector with NEON
>> sshl v17.4s, v16.4s, v19.4s B. scalar
>> neg v18.16b, v20.16b ror w13, w13, w10
>> ushl v16.4s, v16.4s, v18.4s
>> orr v16.16b, v17.16b, v16.16b
>> As of now, the default code generated by C2 is `A` under SuperWord framework.
>>
>> Without `RotateLeftImmV`/`RotateRightImmV`, when we match `RotateLeftV`/`RotateRightV` only with constant shifts,
>> to avoid the `BAD AD file` crash, we have to add a check to return `false` in `VectorNode::is_vector_rotate_supported()`, like:
>> bool VectorNode::is_vector_rotate_supported(int vopc, uint vlen, BasicType bt, bool is_imm) {
>> assert(vopc == Op_RotateLeftV || vopc == Op_RotateRightV, "wrong opcode");
>>
>> /* --- Matcher::match_rule_supported() independent checks added --- */
>> AARCH64_ONLY(if (!is_imm) return false);
>> AARCH64_ONLY(if (is_imm) return true);
>> ...
>> // Validate existence of nodes created in case of rotate degeneration.
>> switch (bt) {
>> case T_INT:
>> return Matcher::match_rule_supported_vector(Op_OrV, vlen, bt) &&
>> Matcher::match_rule_supported_vector(Op_LShiftVI, vlen, bt) &&
>> Matcher::match_rule_supported_vector(Op_URShiftVI, vlen, bt);
>> case T_LONG:
>> return Matcher::match_rule_supported_vector(Op_OrV, vlen, bt) &&
>> Matcher::match_rule_supported_vector(Op_LShiftVL, vlen, bt) &&
>> Matcher::match_rule_supported_vector(Op_URShiftVL, vlen, bt);
>> default:
>> assert(false, "not supported: %s", type2name(bt));
>> return false;
>> }
>> }
>> But the check also tells C2 that `RotateVRight` with vairable shifts can't be vectorized.
>> The code is rolled back to scalar version. I check the asm code generated with modifications above (see [1]), it is `B`.
>> So we have a performance regression here, because `RotateVRight` can be vectorized via OrV+LShiftVI+URShiftVI (`A`).
>>
>> [1] An implementation with `RotateLeftV` + a check in `VectorNode::is_vector_rotate_supported()` . Link: https://github.com/dgbo/jdk/commit/2b3b01dc41eb65dd59d8340d6ce5701421484a88
>
> Thanks for the clarifications. I think I understand now the issue you are facing.
>
> As of now, it works as follows:
> RotateLeft ==vectorizes==> RotateLeftV ==degenerates(?)==> OrV+LShiftV+URShiftV
>
> `VectorNode::is_vector_rotate_supported()` checks that either `RotateLeftV` or `OrV`/`LShiftV`/`URShiftV` are supported.
> Then `RotateLeftV::Ideal` checks for `RotateLeftV` support and replaces the node with `OrV`/`LShiftV`/`URShiftV` if it is absent.
>
> Your fix proposes to introduce `RotateLeftImmV` and keep the checks in `VectorNode::is_vector_rotate_supported()` intact:
> RotateLeft ==vectorizes==> RotateLeftV ==degenerates(?)==> OrV+LShiftV+URShiftV
> ==vectorizes==> RotateLeftImmV ==degenerates(?)==> OrV+LShiftV+URShiftV
>
> What I'm after is to keep existing nodes and adjust the checks in `RotateLeftV::Ideal` to take shift value into account and degrade to `OrV+LShiftV+URShiftV` depending on whether the platform finds constant/variable shifts profitable.
Yes, it works just the same as you mentioned. Thanks a lot to make this so clear.
The degeneration to `OrV+LShiftV+URShiftV` is at `RotateLeftVNode::Ideal()`, then `VectorNode::degenerate_vector_rotate()`:
`RotateLeft ==vectorizes==> RotateLeftV ==degenerates(RotateLeftVNode::Ideal, VectorNode::degenerate_vector_rotate())==> OrV+LShiftV+URShiftV`
However, the workflow relys on the creation of `RotateLeftV` node, and the creation of `RotateLeftV` with variables should be forbidden if one CPU match RotateLeftV/RotateRightV only with constant shifts.
Otherwise, `Matcher::Label_Root()` will check wether the AD files have implemented the matching rules for RotateLeftV with variables.
If there are no rules for `RotateLeftV` with variables, while we have rules for `RotateLeftV` with constants, we get a crash at https://github.com/openjdk/jdk/blob/fa58671f9f8a8230d907c0a6289685f8455bce1f/src/hotspot/share/opto/matcher.cpp#L1669.
IMHO, seems ajusting checks in `RotateLeftV::Ideal` does not deliver the solution, since there are no `RotateRightV` for variable shifts from the start.
-------------
PR: https://git.openjdk.java.net/jdk/pull/1532
More information about the hotspot-compiler-dev
mailing list