Why is the Vector API slower than scalar-style code?

Gary Gao garygaowork at gmail.com
Sun Apr 4 08:24:50 UTC 2021


Hi everyone, I tried the Panama Vector API, which is included in OpenJDK 16,
on my Mac.

The code below adds a long array named a to another long array named b.
I found that when their length is small (such as 200), doAdd() is much
faster than doAddWithSIMD(). When their length is big (such as 200 million),
doAdd() is slower than doAddWithSIMD(), but not by much: the difference is
well under one order of magnitude.
This result does not match what I have seen in many slides and videos
about the Vector API.
They all show the Vector API being at least 2x faster than scalar-style code.

Can anyone help me figure it out?

Mac:


and the code is below:


package org.example;

import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorSpecies;
import java.util.Random;

public class HelloVector
{
    public static void main( String[] args )
    {
        VectorSpecies<Long> species = LongVector.SPECIES_MAX;
        // when len = 200, doAdd() is done in about 6,000 nanoseconds, but
        // doAddWithSIMD() needs 26,808,696 nanoseconds
        // when len = 200 million, doAdd() is done in about 280,000,000
        // nanoseconds, and doAddWithSIMD() needs 230,000,000 nanoseconds
        int len = 200;
        long[] a = initArray(len);
        long[] b = initArray(len);
        long[] c = new long[len];
        long p1 = System.nanoTime();
        doAdd(a, b, c);
        long p2 = System.nanoTime();
        doAddWithSIMD(a, b, c, species);
        long p3 = System.nanoTime();
        System.out.println("RAW: " + (p2 - p1) + ", SIMD: " + (p3 - p2));
    }

    // Fills a new array of the given length with random values.
    public static long[] initArray(int len) {
        Random random = new Random();
        long[] lArr = new long[len];
        for (int i = 0; i < len; i++) {
            lArr[i] = random.nextLong();
        }
        return lArr;
    }

    // Scalar version: c[i] = a[i] + b[i]
    public static void doAdd(long[] a, long[] b, long[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Vector version: adds species.length() elements per iteration.
    public static void doAddWithSIMD(long[] a, long[] b, long[] c,
                                     VectorSpecies<Long> species) {
        int i = 0;
        // <= so that the last full vector is also handled here
        for (; i + species.length() <= a.length; i += species.length()) {
            LongVector op1 = LongVector.fromArray(species, a, i);
            LongVector op2 = LongVector.fromArray(species, b, i);
            LongVector res = op1.add(op2);
            res.intoArray(c, i);
        }
        // scalar tail for the remaining elements
        for (; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
    }
}
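
One thing that may be worth checking in the measurement above: each method is timed on its very first call, so the numbers probably include class loading and interpreted execution rather than JIT-compiled code. The sketch below is not part of the original program. WarmupTiming, timeAfterWarmup() and the warm-up count of 20,000 are hypothetical names and values chosen only for illustration. It uses just the scalar doAdd() so that it runs without the incubator module; the same pattern could be wrapped around doAddWithSIMD().

```java
import java.util.Random;

public class WarmupTiming {
    // Same scalar loop as doAdd() in the program above.
    public static void doAdd(long[] a, long[] b, long[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Hypothetical helper: calls doAdd() warmupRuns times without timing,
    // then returns the elapsed nanoseconds of one more call.
    public static long timeAfterWarmup(long[] a, long[] b, long[] c, int warmupRuns) {
        for (int r = 0; r < warmupRuns; r++) {
            doAdd(a, b, c);
        }
        long start = System.nanoTime();
        doAdd(a, b, c);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int len = 200;
        Random random = new Random();
        long[] a = new long[len];
        long[] b = new long[len];
        long[] c = new long[len];
        for (int i = 0; i < len; i++) {
            a[i] = random.nextLong();
            b[i] = random.nextLong();
        }

        // Cold measurement: the very first call, as in the original main().
        long t0 = System.nanoTime();
        doAdd(a, b, c);
        long cold = System.nanoTime() - t0;

        // Warm measurement: one call after many untimed calls.
        long warm = timeAfterWarmup(a, b, c, 20_000);
        System.out.println("cold: " + cold + " ns, warm: " + warm + " ns");
    }
}
```

A single timed call is still noisy even after warm-up, so a benchmark harness such as JMH would likely give more trustworthy numbers than this hand-rolled loop.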

