Improve the efficiency of VectorShuffle usage
Paul Sandoz
paul.sandoz at oracle.com
Thu Dec 19 18:27:08 UTC 2024
If we delay the wrapping (full or partial), I think we would still need to assume a default implicit wrapping state (partial) if the current VectorShuffle API is to remain mostly the same, e.g.,
jshell> int[] values = new int[] {1, 13, 19, 29}
values ==> int[4] { 1, 13, 19, 29 }
jshell> VectorShuffle<Integer> s = IntVector.SPECIES_128.shuffleFromValues(values)
s ==> Shuffle[1, -3, -1, -3]
jshell> int[] shuffleValues = s.toArray()
shuffleValues ==> int[4] { 1, -3, -1, -3 }
jshell> VectorShuffle<Integer> s = IntVector.SPECIES_128.shuffleFromValues(shuffleValues)
s ==> Shuffle[1, -3, -1, -3]
jshell> s.toArray()
$9 ==> int[4] { 1, -3, -1, -3 }
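To make the partial wrapping in the session above concrete, here is a minimal sketch in plain Java of the arithmetic it appears to follow. The class and method names are hypothetical and not part of the Vector API, and the sketch assumes VLENGTH is a power of two: in-range indexes are kept, and out-of-range indexes are wrapped and then shifted into [-VLENGTH, -1].

```java
import java.util.Arrays;

public class PartialWrapSketch {
    // Hypothetical helper, not a Vector API method.
    // Assumes vlength is a power of two.
    static int partialWrap(int index, int vlength) {
        if (index >= 0 && index < vlength) {
            return index;                        // valid index, kept as is
        }
        // Out of bounds: wrap into [0, vlength - 1], then shift down
        // into the exceptional range [-vlength, -1].
        return (index & (vlength - 1)) - vlength;
    }

    static int[] partialWrapAll(int[] indexes, int vlength) {
        int[] out = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = partialWrap(indexes[i], vlength);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] once = partialWrapAll(new int[] {1, 13, 19, 29}, 4);
        int[] twice = partialWrapAll(once, 4);
        System.out.println(Arrays.toString(once));   // [1, -3, -1, -3]
        System.out.println(Arrays.toString(twice));  // [1, -3, -1, -3]
    }
}
```

Applying the helper to its own output changes nothing, which matches the round trip through toArray() and shuffleFromValues shown in the session.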
Ideally this is mostly about implementation changes. We should discuss further with John.
On the 2048-bit sizes, I think it is OK to disallow them for now. Ideally we do this for the partially wrapped cases and, where we can, focus the rejection on shuffle creation rather than on use, but this conflicts with the laziness of on-demand wrapping.
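One possible reading of why 2048-bit byte shuffles are the awkward case can be sketched with simple arithmetic. The sketch below assumes that each shuffle index is stored in a single element of the lane width and interpreted as raw unsigned bits; that storage model is an assumption made for illustration, not something stated in this thread, and the class and method names are hypothetical.

```java
public class ShuffleIndexRangeSketch {
    // Hypothetical check, not a Vector API method.
    // Assumption: one index per lane-width element, read as unsigned bits.
    static boolean twoOperandRangeFits(int vectorBits, int laneBits) {
        if (laneBits >= 63) {
            return true;                          // avoid overflowing the shift below
        }
        int vlength = vectorBits / laneBits;      // number of lanes
        long distinctIndexes = 2L * vlength;      // size of [0, 2 * VLENGTH - 1]
        long representable = 1L << laneBits;      // values one lane element can hold
        return distinctIndexes <= representable;
    }

    public static void main(String[] args) {
        System.out.println(twoOperandRangeFits(512, 8));    // true:  128 <= 256
        System.out.println(twoOperandRangeFits(2048, 8));   // false: 512 >  256
        System.out.println(twoOperandRangeFits(2048, 16));  // true:  256 <= 65536
    }
}
```

Under that assumption only the byte lane type runs out of room at 2048 bits, which is consistent with the question being raised for byte shuffles specifically.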
Paul.
> On Dec 19, 2024, at 9:02 AM, Quân Anh Mai <anhmdq at gmail.com> wrote:
>
> Thanks a lot for your response,
>
> Actually, we do not need to wrap at all on construction. A VectorShuffle is a black box: somewhere between its creation and its use we do the wrap, and that is entirely sufficient.
>
> From your response, I think a more reasonable proposal would be: On VectorShuffle construction, we wrap all indices to [0, 2 * VLENGTH - 1] (instead of the current model of wrapping oob indices to [-VLENGTH, -1]) and when using that VectorShuffle for a 1-operand rearrange, we wrap the indices to [0, VLENGTH - 1].
>
> Implementation-wise, we do not do any wrapping on construction, and for operations that observe the VectorShuffle instance, we wrap at those places. This allows us to reduce the number of wrapping operations to the minimum for the most frequent operations. For other operations, the wrapping operation is itself cheap and should be GVN-ed.
>
> An important question is what we should do with the hypothetical 2048-bit byte VectorShuffles. We cannot wrap those to [0, 2 * VLENGTH - 1] because of implementation limitations. Should we then disallow all operations that would observe that (toVector, laneSource, 2-operand rearrange)?
>
> Cheers,
> Quan Anh
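The two wrapping ranges proposed in the quoted message can be sketched in plain Java as bit masks, assuming VLENGTH is a power of two. The class and method names are hypothetical and this is not the actual implementation; the point is only that wrapping to [0, 2 * VLENGTH - 1] first and to [0, VLENGTH - 1] later gives the same result as a single wrap at the point of use, which is what makes deferring the wrap possible.

```java
public class LazyWrapSketch {
    // Hypothetical helpers, not Vector API methods.
    // Both assume vlength is a power of two.

    // Range an operation observing both operands would need: [0, 2 * vlength - 1].
    static int wrapForTwoOperands(int index, int vlength) {
        return index & (2 * vlength - 1);
    }

    // Range a 1-operand rearrange would need: [0, vlength - 1].
    static int wrapForOneOperand(int index, int vlength) {
        return index & (vlength - 1);
    }

    public static void main(String[] args) {
        int vlength = 4;
        for (int index : new int[] {1, 13, 19, 29, -3}) {
            int eager = wrapForOneOperand(wrapForTwoOperands(index, vlength), vlength);
            int lazy = wrapForOneOperand(index, vlength);
            // The two columns agree, so the first wrap can be skipped
            // when only the 1-operand range is observed.
            System.out.println(index + " -> " + eager + " / " + lazy);
        }
    }
}
```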