RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9]

Tue Dec 10 08:26:51 UTC 2024

On Tue, 10 Dec 2024 07:44:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Change wording on VectorLoadShuffleNode
>>   
>>   Co-authored-by: Jatin Bhateja <jatin.bhateja at intel.com>
>
> src/hotspot/share/opto/library_call.hpp line 358:
> 
>> 356:   bool inline_vector_shuffle_to_vector();
>> 357:   bool inline_vector_wrap_shuffle_indexes();
>> 358:   bool inline_vector_shuffle_iota();
> 
> FTR, the x86 ISA does not support a direct byte multiply instruction, so we first unpack to a short vector, multiply at short granularity, and then pack the result back into a byte vector. This was somewhat costly. Now that the shuffle backing storage matches the lane size of the corresponding vector, the performance of iota computation with a non-unit scalar should improve.

I believe with the type information of vector elements this optimization should be trivial.
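
For illustration only, here is a scalar Java sketch of the two paths discussed above. It is not the HotSpot code: the class and method names (ByteIotaSketch, iotaViaShort, iotaDirect) are hypothetical, and plain arrays stand in for vector registers. It assumes the iota value for lane i is start + i * step.

```java
import java.util.Arrays;

public class ByteIotaSketch {
    // Models the byte-backed path: there is no direct byte multiply, so each
    // lane is widened to short, multiplied, then narrowed back to byte.
    static byte[] iotaViaShort(int lanes, int start, int step) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            short wide = (short) i;               // unpack the lane index to short
            short prod = (short) (wide * step);   // multiply at short granularity
            byte narrowed = (byte) prod;          // pack back to byte (truncating)
            out[i] = (byte) (narrowed + start);   // add the start offset
        }
        return out;
    }

    // Models a shuffle whose backing storage matches the lane size (int here):
    // the multiply happens directly at lane width, with no unpack/pack step.
    static int[] iotaDirect(int lanes, int start, int step) {
        int[] out = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = start + i * step;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(iotaViaShort(8, 1, 3)));
        System.out.println(Arrays.toString(iotaDirect(8, 1, 3)));
    }
}
```

Both calls in main compute the same index sequence. The difference is only in how many widening and narrowing steps each lane goes through, which is the cost the quoted comment refers to.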

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877566967