Trying out Vector API with ShiftOr algorithm

Paul Sandoz paul.sandoz at oracle.com
Thu Mar 16 22:23:15 UTC 2023



> On Mar 14, 2023, at 7:37 AM, Quân Anh Mai <anhmdq at gmail.com> wrote:
> 
> The slowdown relative to the scalar implementation is expected, since the CPU can already execute those scalar instructions very efficiently. Vector operations shine only when they can operate on multiple inputs concurrently or use powerful cross-lane operations, and both are absent from the benchmark.
> 

Yes, there is no data parallelism in the loop.
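To make the dependency concrete, here is a minimal scalar sketch of a Shift-Or search. It is not the benchmark code from this thread; the class name, the `needle` parameter and the 64-byte limit are assumptions made for illustration. Each iteration needs the `state` produced by the previous one, so the only independent work per element is the lookup into the filter table.

```java
import java.util.Arrays;

// Hypothetical scalar Shift-Or sketch; names are illustrative, not from the thread.
public final class ShiftOrScalar {

    // Bit j of filter[c] is 0 if and only if needle[j] == c; all other bits are 1.
    static long[] buildFilter(byte[] needle) {
        if (needle.length == 0 || needle.length > 64) {
            throw new IllegalArgumentException("needle length must be in 1..64");
        }
        long[] filter = new long[256];
        Arrays.fill(filter, ~0L);
        for (int j = 0; j < needle.length; j++) {
            filter[needle[j] & 0xFF] &= ~(1L << j);
        }
        return filter;
    }

    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        long[] filter = buildFilter(needle);
        long matchBit = 1L << (needle.length - 1);
        long state = ~0L;
        for (int i = 0; i < haystack.length; i++) {
            // Loop-carried dependency: the new state is derived from the old state.
            state = (state << 1) | filter[haystack[i] & 0xFF];
            if ((state & matchBit) == 0) {
                return i - needle.length + 1;
            }
        }
        return -1;
    }
}
```

The shift-and-or chain is inherently serial; only the `filter[haystack[i] & 0xFF]` lookups could be issued for several elements at once.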

Ideally we should be able to stride over the elements of haystack by the long vector length and use a mapped load (a gather) from the filter array, with the haystack as the index map array.

@ForceInline
public static
LongVector fromArray(VectorSpecies<Long> species,
                     long[] a, int offset,
                     int[] indexMap, int mapOffset) {

But haystack is a byte[] array. If that array is promoted to an int[] array it might be possible. There has been some performance work done on mapped loads/stores but there is likely more work needed, so no guarantees this will be more performant right now.
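As a rough illustration of the idea, the sketch below models the two steps in plain scalar Java: promoting the byte[] haystack to an int[] index map, and the index arithmetic of a mapped load, where lane n reads a[offset + indexMap[mapOffset + n]]. The class and method names are hypothetical. The real gather would be LongVector.fromArray(species, a, offset, indexMap, mapOffset) from jdk.incubator.vector, which needs --add-modules jdk.incubator.vector; this model only mirrors its indexing and says nothing about its performance.

```java
// Hypothetical scalar model of a mapped load; not the Vector API itself.
public final class GatherModel {

    // Promote each byte to an int index, treating bytes as unsigned (0..255).
    static int[] promote(byte[] haystack) {
        int[] indexMap = new int[haystack.length];
        for (int i = 0; i < haystack.length; i++) {
            indexMap[i] = haystack[i] & 0xFF;
        }
        return indexMap;
    }

    // Lane n of the result is a[offset + indexMap[mapOffset + n]].
    static long[] gather(long[] a, int offset,
                         int[] indexMap, int mapOffset, int lanes) {
        long[] result = new long[lanes];
        for (int n = 0; n < lanes; n++) {
            result[n] = a[offset + indexMap[mapOffset + n]];
        }
        return result;
    }
}
```

With a 256-entry filter table, gather(filter, 0, promote(haystack), i, lanes) would yield the filter words for haystack[i] through haystack[i + lanes - 1], which is the data-parallel part of the loop. The shift-and-or over those words would still have to be combined serially or restructured.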

Paul.


More information about the panama-dev mailing list