RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]

Quan Anh Mai qamai at openjdk.org
Mon Jan 8 10:23:22 UTC 2024


On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Thanks for the updates!
>> 
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant?
> 
> CompressV / ExpandV accept only two inputs: the vector to be operated on and the mask under which the operation is performed. The permute-table-based implementation is specific to the x86 backend.

@jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately.
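
As a rough illustration of the split under discussion, here is a scalar Java sketch. It is not HotSpot code: the class and the helpers `maskToPermutation` and `permute` are hypothetical names standing in for the two separately scheduled steps. It only shows that the mask-to-permutation step depends on the mask alone, so it can be computed once when the mask is loop-invariant.

```java
import java.util.Arrays;

class CompressSketch {
    // Step 1 (depends on the mask only): turn a lane mask into a permutation.
    // perm[k] is the source lane of the k-th set mask bit; unused slots are -1.
    static int[] maskToPermutation(boolean[] mask) {
        int[] perm = new int[mask.length];
        Arrays.fill(perm, -1);
        int k = 0;
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                perm[k++] = lane;
            }
        }
        return perm;
    }

    // Step 2 (depends on the data): apply the permutation, zeroing unused lanes,
    // which matches the compress semantics of packing selected lanes to the front.
    static int[] permute(int[] src, int[] perm) {
        int[] dst = new int[src.length];
        for (int k = 0; k < perm.length; k++) {
            dst[k] = perm[k] >= 0 ? src[perm[k]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true, false, false, true, false};
        // The mask is loop-invariant here, so the permutation is computed once.
        int[] perm = maskToPermutation(mask);
        int[][] inputs = {
            {10, 11, 12, 13, 14, 15, 16, 17},
            {20, 21, 22, 23, 24, 25, 26, 27},
        };
        for (int[] v : inputs) {
            System.out.println(Arrays.toString(permute(v, perm)));
        }
    }
}
```

In the compiler, the analogue would be for the node that builds the permutation to have the mask as its only input, which is what would let it be scheduled outside the loop.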

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880724248


More information about the hotspot-compiler-dev mailing list