RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2]

Emanuel Peter epeter at openjdk.org
Tue Jul 4 12:01:20 UTC 2023


On Mon, 3 Jul 2023 09:33:18 GMT, Pengfei Li <pli at openjdk.org> wrote:

>> src/hotspot/share/opto/vmaskloop.cpp line 363:
>> 
>>> 361:           // Otherwise, use signed subword type or the statement's bottom type
>>> 362:           if (subword_stmt) {
>>> 363:             set_elem_bt(node, get_signed_subword_bt(stmt_bottom_type));
>> 
>> Why do you take only the signed subword type and not the unsigned one (e.g. for char you take short)?
>
> The current SuperWord does it the same way (see `SuperWord::container_type()`). The main reason is that some match rules on some backends (like x86) only match signed subword types. AFAICR, it would be fine to remove this restriction for AArch64.

Ok. This sounds like we should probably refactor the backends accordingly. That would simplify things for the loop vectorizer / SuperWord.
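To make the "signed subword" choice concrete, here is a minimal C++ sketch of the mapping being discussed: each unsigned subword element type is replaced by the signed type of the same width (char becomes short). The enum `ElemType` and the function `to_signed_subword` are hypothetical stand-ins, not HotSpot's `BasicType` or the patch's `get_signed_subword_bt()`; the sketch only assumes the mapping behaves as described in the thread.

```cpp
#include <cassert>

// Hypothetical, simplified element-type enum (NOT HotSpot's BasicType).
enum class ElemType { Boolean, Char, Byte, Short, Int, Long };

// Map an unsigned subword type to the signed type of the same bit width.
// Plain loads and stores move the same bits either way, so a backend that
// only has match rules for signed subword types can still handle them.
// Operations whose result depends on signedness (e.g. right shifts after
// widening) would need separate care; this sketch does not model that.
ElemType to_signed_subword(ElemType t) {
  switch (t) {
    case ElemType::Boolean: return ElemType::Byte;   // 8-bit unsigned -> 8-bit signed
    case ElemType::Char:    return ElemType::Short;  // 16-bit unsigned -> 16-bit signed
    default:                return t;                // already signed, or not a subword type
  }
}
```

For example, `to_signed_subword(ElemType::Char)` yields `ElemType::Short`, matching the "for char you take short" observation above.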

>> src/hotspot/share/opto/vmaskloop.cpp line 548:
>> 
>>> 546:   // Check supported memory access via SWPointer. It's not supported if
>>> 547:   //  1) The constructed SWPointer is invalid
>>> 548:   //  2) Address is growing down (index scale * loop stride < 0)
>> 
>> Is that a limitation that could be removed in the future?
>
> Yes, at least on SVE2. For memory accesses whose addresses grow upward, we generate vector masks whose active lanes are in the lower part of the vector. It is the opposite for accesses whose addresses grow downward, where the active lanes are in the upper part of the vector. On AArch64, only SVE2 can generate vector masks this way; the current SVE (SVE1) cannot. I'm not sure whether x86 AVX-512 has a similar ability.

There must surely be some way. The only question is what the cheapest way to do it is, i.e. the one with the fewest instructions.
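The two mask shapes described above can be modeled with plain bitmasks. The sketch below assumes lane 0 holds the lowest address and bit `i` of the result means "lane `i` is active"; the function names are hypothetical and do not correspond to any C2 node or hardware instruction. For a loop that counts down, the vector covering the last few elements ends at the highest address, so its active lanes sit at the top of the vector.

```cpp
#include <cassert>
#include <cstdint>

// Active lanes at the LOW end: lane i is active iff i < remaining.
// This is the mask shape needed when addresses grow upward.
uint64_t low_lanes_mask(int vlen, int remaining) {
  assert(vlen > 0 && vlen <= 64);
  int n = remaining < vlen ? remaining : vlen;
  if (n <= 0) return 0;
  return n == 64 ? ~uint64_t{0} : ((uint64_t{1} << n) - 1);
}

// Active lanes at the HIGH end: lane i is active iff i >= vlen - remaining.
// This is the mask shape needed when addresses grow downward.
uint64_t high_lanes_mask(int vlen, int remaining) {
  assert(vlen > 0 && vlen <= 64);
  int n = remaining < vlen ? remaining : vlen;
  if (n <= 0) return 0;
  // Build the low-end mask, then move it up so it ends at the top lane.
  return low_lanes_mask(vlen, n) << (vlen - n);
}
```

With 8 lanes and 3 remaining elements, the upward case gives `0x07` and the downward case gives `0xE0`. The shift in `high_lanes_mask` hints at one conceivable fallback on hardware that can only produce the low-end shape: derive the high-end mask from it at the cost of extra instructions. Whether that is the cheapest option on a given backend is exactly the open question in this thread.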

>> src/hotspot/share/opto/vmaskloop.cpp line 549:
>> 
>>> 547:   //  1) The constructed SWPointer is invalid
>>> 548:   //  2) Address is growing down (index scale * loop stride < 0)
>>> 549:   //  3) Memory access scale is different from data size
>> 
>> I guess this could also be relaxed for strided accesses in the future?
>
> Exactly! I have tried supporting some basic strided accesses. That code is not included in this patch because it is not very beneficial on some CPUs and requires more C2 refactoring.

Great, you should probably leave that to a future RFE anyway.
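The three rejection conditions quoted from `vmaskloop.cpp` can be summarized in a small predicate. This is a sketch, not the patch's code: `AccessInfo` is a hypothetical summary of what an address analysis such as `SWPointer` might report, and it assumes that "memory access scale" means the number of bytes the address advances per loop iteration (`index_scale * loop_stride`).

```cpp
#include <cassert>

// Hypothetical summary of one memory access (NOT the real SWPointer API).
struct AccessInfo {
  bool valid;        // could the address be decomposed at all?
  int  index_scale;  // bytes the address moves per unit of the loop index
  int  elem_size;    // size in bytes of the accessed element
};

// Returns true if the access fits the three conditions from the review:
//  1) the pointer analysis succeeded,
//  2) the address grows upward (index scale * loop stride >= 0),
//  3) the per-iteration step equals the element size (no strided access).
bool is_supported_access(const AccessInfo& a, int loop_stride) {
  assert(a.elem_size > 0);
  if (!a.valid) return false;                                   // condition 1
  long long step = static_cast<long long>(a.index_scale) * loop_stride;
  if (step < 0) return false;                                   // condition 2: grows down
  if (step != a.elem_size) return false;                        // condition 3: strided or gapped
  return true;
}
```

For an `int[]` accessed as `a[i]` (scale 4, element size 4), a stride of 1 is accepted, while a stride of -1 is rejected by condition 2 and a stride of 2 by condition 3. Relaxing condition 3 is the strided-access support deferred to a future RFE above.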

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251931841
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251929419
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251930025


More information about the hotspot-compiler-dev mailing list