RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2]
Pengfei Li
pli at openjdk.org
Tue Jul 4 02:40:15 UTC 2023
On Wed, 28 Jun 2023 10:41:12 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address part of comments from Emanuel
>
> src/hotspot/share/opto/vmaskloop.cpp line 89:
>
>> 87: cl->mark_loop_vectorized();
>> 88: cl->mark_vector_masked();
>> 89: _phase->C->set_max_vector_size(MaxVectorSize);
>
> What is this for?
On AArch64 with SVE, we pre-initialize an all-true register (p7) only in compiled methods where `MaxVectorSize` is set. So if a compiled method implicitly uses ptrue (because it's vectorized), we need to set the `MaxVectorSize` to guarantee p7 is initialized. This usage starts since the initial SVE patch (in 2020). We know this is not a perfect solution and @fg1417 is currently investigating if we have better solutions.
> src/hotspot/share/opto/vmaskloop.cpp line 531:
>
>> 529: if (!addp->is_AddP() || !operates_on_array_of_type(addp, mem_type)) {
>> 530: return nullptr;
>> 531: }
>
> I guess this prevents you from having `Unsafe` use type mismatched loads/stores. But it also prevents vectorization in cases where one might just store shorts into an int array using `Unsafe`. This saves you a lot of headaches. You probably don't lose too much for not vectorizing those cases.
Exactly. Is there any case in real applications that may store shorts into an int array?
> src/hotspot/share/opto/vmaskloop.cpp line 642:
>
>> 640:
>> 641: // Helper method for finding or creating a vector input at specified index
>> 642: Node* VectorMaskedLoop::get_vector_input(Node* node, uint idx) {
>
> This is analogous to `SuperWord::vector_opd`. Can we not refactor things so that we can share the code?
I like refactoring, but it may require big effort. Shall we discuss it later in our future conversations?
> src/hotspot/share/opto/vmaskloop.cpp line 939:
>
>> 937: Node* root_vmask = vmask_tree->at(1);
>> 938:
>> 939: // Replace vectorization candidate nodes to vector nodes
>
> Expand explanation. Say that you are for now only generating a single vector node per scalar node. And that the duplication afterwards makes sure that all scalar nodes are "widened" to the same number of elements. The smalles type using a single vector, larger types using multiple (duplicated) vectors per scalar node.
Thanks for suggestion. Done in commit 2.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251410392
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251410844
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251411540
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251411689
More information about the hotspot-compiler-dev
mailing list