RFR: 8257483: C2: Split immediate vector rotate from RotateLeftV and RotateRightV nodes [v4]
Vladimir Ivanov
vlivanov at openjdk.java.net
Tue Dec 8 22:02:37 UTC 2020
On Mon, 7 Dec 2020 07:18:24 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> Currently, on all CPUs, optimizing RotateLeftV and RotateRightV with match rules in AD files requires implementing both the immediate and the variable versions.
>>
>> On aarch64, with match rules for vector rotation, immediate vector rotation can be optimized with shift+insert instructions (i.e. SLI/SRI, ~20% improvement with an initial implementation).
>> However, there would be a performance regression for the variable version, because SLI/SRI have no register form in the NEON instruction set, and there is no register form of right shift either.
>> With match rules, the instructions for a variable vector rotate would be:
>> mov w9, 32
>> dup v13.4s, w9
>> sub v20.4s, v13.4S, v19.4s
>> # In a loop, the default version has only the code below;
>> # the code above is loop-invariant and is placed outside the loop.
>> sshl v17.4s, v16.4s, v19.4s
>> neg v18.16b, v20.16b # on aarch64, vector right shift is implemented as left shift by negative shift count
>> ushl v16.4s, v16.4s, v18.4s
>> orr v16.16b, v17.16b, v16.16b
>>
>> With this patch, immediate vector rotation can be matched alone and optimized on CPUs like aarch64.
>>
>> Verified with linux-x86_64-server-fastdebug tier1-3 and passed.
>> Also added immediate vector rotation tests to micro `test/micro/org/openjdk/bench/java/lang/RotateBenchmark.java`.
>> Tested the micro on an x86_64/aarch64 server and witnessed no regressions.
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>
> fix trailing whitespace
src/hotspot/share/opto/vectornode.cpp line 1186:
> 1184: BasicType bt = vect_type()->element_basic_type();
> 1185: int extinfo = encode_rotate_vector_shift_type(in(2));
> 1186: if (!Matcher::match_rule_supported_vector(Op_RotateLeftV, vlen, bt, extinfo)) {
I don't see much value in encoding and packing node-specific info into an int. Why not simply introduce a new platform-specific entry point and pass a `Node*` instead, letting platform-specific code extract any useful information from it?
As an alternative, introduce a new capability (`Matcher::supports_vector_variable_rotates`) and check it in shared code (`RotateLeftVNode::Ideal`).
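To make the second option concrete, here is a minimal standalone sketch of how a capability query could drive the decision in shared code. It is not HotSpot code: `ShiftCount`, `Lowering`, `choose_lowering` and `rotl_via_shifts` are hypothetical stand-ins, and this `Matcher` is a mock whose hard-coded `false` models a NEON-like target without variable vector rotates. Only the name `supports_vector_variable_rotates` comes from the suggestion above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the shift-count input in(2): either a
// compile-time constant or a value only known at run time.
struct ShiftCount {
  bool is_constant;
  uint32_t value;  // meaningful only when is_constant is true
};

// Mock of the platform query named in the review. The 'false' result is an
// assumption modelling a target with no variable vector rotate instruction.
struct Matcher {
  static bool supports_vector_variable_rotates() { return false; }
};

enum class Lowering { MatchRotate, ExpandToShifts };

// Shared-code decision (the role RotateLeftVNode::Ideal would play):
// immediate rotates stay as rotate nodes; variable rotates are expanded
// unless the platform reports that it can match them.
Lowering choose_lowering(const ShiftCount& cnt) {
  if (cnt.is_constant || Matcher::supports_vector_variable_rotates()) {
    return Lowering::MatchRotate;
  }
  return Lowering::ExpandToShifts;
}

// Scalar model of the shift+or expansion for one 32-bit lane.
// Both counts are masked so that a rotate by 0 never shifts by 32.
uint32_t rotl_via_shifts(uint32_t x, uint32_t n) {
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}
```

With this shape, the shared code does not need to pack anything about `in(2)` into an int; it only asks whether the shift count is constant and what the platform reports.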
-------------
PR: https://git.openjdk.java.net/jdk/pull/1532