RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2]

Tue Jul 4 02:29:09 UTC 2023

On Tue, 27 Jun 2023 17:47:33 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vmaskloop.cpp line 785:
>> 
>>> 783: }
>>> 784: 
>>> 785: // Duplicate vectorized operations with given vector element size
>> 
>> Got to here today. There should probably be some comment higher up that you first replace scalars with one vector each, and then duplicate them for the larger types that need multiple vectors.
>> 
>> I'm also concerned that there may be some platforms where the max vector width in bytes is not the same for all types. But maybe all platforms that support masked register ops also all have the same vector width in bytes for all types?
>
> Assume we only allow `32` byte registers for `int`, but `64` bytes for doubles. Now you'd be assuming that there need to be twice as many `double` vectors as `int` vectors. But actually, they need the same number of vectors, because vectors of both sizes fit exactly `8` elements.

I have added more comments. I can only confirm that this approach works on current AArch64. Since we don't know enough about other architectures, we may need some help if this has to change.
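
To make the concern above concrete, here is a minimal sketch of the arithmetic, not the code in `vmaskloop.cpp`. The helper `vectors_needed` and its parameters are hypothetical names used only for illustration; the per-type maximum vector widths are assumed inputs, not values queried from any real platform.

```cpp
// Hypothetical helper (not a HotSpot API): number of vector registers needed
// to hold `lanes` elements of `elem_bytes` each, when the widest vector
// allowed for that element type is `max_vector_bytes`.
constexpr int vectors_needed(int lanes, int elem_bytes, int max_vector_bytes) {
  // Round up so that a partially filled vector still counts as one vector.
  return (lanes * elem_bytes + max_vector_bytes - 1) / max_vector_bytes;
}

// Uniform width (assumed 32 bytes for every type): 8 ints fit in one vector,
// but 8 doubles need two, so the larger type is duplicated.
static_assert(vectors_needed(8, 4, 32) == 1, "8 ints in one 32-byte vector");
static_assert(vectors_needed(8, 8, 32) == 2, "8 doubles in two 32-byte vectors");

// Non-uniform width (assumed 32 bytes for int, 64 bytes for double):
// both types hold 8 elements per vector, so no duplication is needed.
static_assert(vectors_needed(8, 8, 64) == 1, "8 doubles in one 64-byte vector");
```

If the duplication factor were derived only from the ratio of element sizes, the last case would wrongly produce two `double` vectors, which is the situation the review comment describes.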

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405212