Why Vector API is slower than Scalar-Style Code ?

Gary Gao garygaowork at gmail.com
Mon Apr 5 13:58:54 UTC 2021


Hi Remi,
     I ran the code you modified, but I still get the same result: the
Vector API is significantly slower than the scalar-style code when the
array length is smaller than 2 million. I still don't know why. By the
way, my Mac's CPU is an Intel Core i7.

Thanks and regards,
Gary

On Sun, Apr 4, 2021 at 6:48 PM Remi Forax <forax at univ-mlv.fr> wrote:

> ----- Original Message -----
> > From: "Gary Gao" <garygaowork at gmail.com>
> > To: "panama-dev at openjdk.java.net" <panama-dev at openjdk.java.net>
> > Sent: Sunday, April 4, 2021 10:24:50
> > Subject: Why Vector API is slower than Scalar-Style Code ?
>
> > Hi everyone, I tried the Panama Vector API, which is included in
> > OpenJDK 16, on my Mac.
> >
> > The code below adds a long array named a to another long array named b.
> > I found that when their length is small (such as 200), doAdd() is much
> > faster than doAddWithSIMD(). When their length is large (such as 200
> > million), doAdd() is slower than doAddWithSIMD(), but not by much: less
> > than an order of magnitude.
> > This result does not match what I have seen in many slides and videos
> > about the Vector API.
> > They all show the Vector API being at least 2x faster than scalar-style
> > code.
> >
> > Can anyone help me figure it out?
>
>
> Hi,
> there are several issues in your code. First, SPECIES should be a constant,
> not something you pass as a parameter.
> Then, when initializing op2 in doAddWithSIMD, you are using 'a' instead of
> 'b'.
>
> You have to remember that the Vector API relies on the JIT to replace the
> method calls (fromArray, add, etc.) with the corresponding vector
> instructions, so the code has to be JIT-compiled (and SPECIES has to be a
> constant for the JIT).
> But in your code, you call doAddWithSIMD once with a small length (200), so
> the method doAddWithSIMD is not JIT-compiled.
> If you add warmup calls like below, it will work.
> (There is a cool tool called JMH which does all the warmup and more, if
> you want to do serious testing.)
>
> On my laptop, I get roughly the same time for doAdd and doAddWithSIMD.
> That's because HotSpot also auto-vectorizes simple loops, so
> doAdd also uses SIMD/AVX instructions.
> If you test with a min instead of an add (HotSpot does not
> auto-vectorize min, AFAIK), you will see a difference between the SIMD
> version and the non-SIMD version.
>
> regards,
> Rémi
>
> ---
>
> import jdk.incubator.vector.LongVector;
> import jdk.incubator.vector.VectorSpecies;
>
> import java.util.Random;
>
> public class HelloVector {
>   private static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_PREFERRED;
>
>   public static void main( String[] args ) {
>     // when len = 200, doAdd() is done in about 6000 nanoseconds, but
>     // doAddWithSIMD needs 26808696 nanoseconds
>     // when len = 200 million, doAdd() is done in about 280,000,000
>     // nanoseconds, and doAddWithSIMD needs 230,000,000 nanoseconds
>     int len = 2_000_000;
>     long[] a = initArray(len);
>     long[] b = initArray(len);
>     long[] c = new long[len];
>
>     // warmup
>     for(int i = 0; i < 5; i++) {
>       doAdd(a, b, c);
>       doAddWithSIMD(a, b, c);
>     }
>
>     long p1 = System.nanoTime();
>     doAdd(a, b, c);
>     long p2 = System.nanoTime();
>     doAddWithSIMD(a, b, c);
>     long p3 = System.nanoTime();
>     System.out.println("RAW: " + (p2 - p1) + ", SIMD: " + (p3 - p2));
>   }
>
>   public static long[] initArray(int len) {
>     /*Random random = new Random();
>     long[] lArr = new long[len];
>     for (int i = 0; i < len; i++) {
>       long l = random.nextLong();
>       lArr[i] = l;
>     }
>     return lArr;*/
>     // fix the value of Random so the results are repeatable
>     return new Random(0).longs(len).toArray();
>   }
>
>   public static void doAdd(long[] a, long[] b, long[] c) {
>     for (int i = 0; i < a.length; i++) {
>       c[i] = a[i] + b[i];
>     }
>   }
>
>   public static void doAddWithSIMD(long[] a, long[] b, long[] c) {
>     int i = 0;
>     int loopBound = a.length - SPECIES.length();
>     for (; i < loopBound; i += SPECIES.length()) {
>       LongVector op1 = LongVector.fromArray(SPECIES, a, i);
>       LongVector op2 = LongVector.fromArray(SPECIES, b, i);
>       LongVector res = op1.add(op2);
>       res.intoArray(c, i);
>     }
>     for (; i < a.length; i++) {
>       c[i] = a[i] + b[i];
>     }
>   }
> }
>
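
On the warmup point above: here is a minimal sketch of timing a method only
after untimed warmup calls, so the JIT has had a chance to compile it. The
class name WarmupTiming and the helper bestTimeNanos are hypothetical names
for illustration, and the warmup count of 20,000 is an arbitrary assumption,
not a tuned value. It uses only the scalar doAdd, so it runs without the
incubator module. For serious measurements, JMH is still the better choice.

```java
import java.util.Random;

public class WarmupTiming {
  // Scalar addition, same shape as doAdd in the quoted code.
  public static void doAdd(long[] a, long[] b, long[] c) {
    for (int i = 0; i < a.length; i++) {
      c[i] = a[i] + b[i];
    }
  }

  // Hypothetical helper: runs the task `warmups` times untimed, then
  // returns the best of `runs` timed runs, in nanoseconds.
  public static long bestTimeNanos(Runnable task, int warmups, int runs) {
    for (int i = 0; i < warmups; i++) {
      task.run();
    }
    long best = Long.MAX_VALUE;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      task.run();
      best = Math.min(best, System.nanoTime() - start);
    }
    return best;
  }

  public static void main(String[] args) {
    int len = 200;
    long[] a = new Random(0).longs(len).toArray();
    long[] b = new Random(1).longs(len).toArray();
    long[] c = new long[len];

    // A single cold call, roughly what the original test measured.
    long cold = bestTimeNanos(() -> doAdd(a, b, c), 0, 1);
    // The same call after many untimed warmup iterations (assumed count).
    long warm = bestTimeNanos(() -> doAdd(a, b, c), 20_000, 10);

    System.out.println("cold: " + cold + " ns, warm: " + warm + " ns");
  }
}
```

With a small array such as len = 200, the cold number is typically much
larger than the warm one, but the exact values depend on the machine and the
JVM.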

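On the structure of doAddWithSIMD: the main loop handles full vectors and the
second loop handles the leftover elements. Below is a plain-Java sketch of
that split, with no Vector API involved. LANES is a hypothetical stand-in for
SPECIES.length() (4 is only an assumed value), and ChunkedAdd, loopBound and
chunkedAdd are illustrative names. Note that the quoted code uses
a.length - SPECIES.length() as its bound, which is still correct but leaves
one full vector to the scalar tail when the length is an exact multiple of
the vector length. The sketch rounds the length down to a multiple of LANES
instead, which is what VectorSpecies.loopBound(int) does in the Vector API.

```java
import java.util.Arrays;

public class ChunkedAdd {
  // Hypothetical stand-in for SPECIES.length(); the real value depends on the CPU.
  static final int LANES = 4;

  // Largest multiple of LANES that is less than or equal to length.
  public static int loopBound(int length) {
    return length - (length % LANES);
  }

  public static void chunkedAdd(long[] a, long[] b, long[] c) {
    int i = 0;
    int bound = loopBound(a.length);
    // Main loop: one full chunk of LANES elements per iteration.
    // In the vector version, the inner loop is a single vector add.
    for (; i < bound; i += LANES) {
      for (int lane = 0; lane < LANES; lane++) {
        c[i + lane] = a[i + lane] + b[i + lane];
      }
    }
    // Tail loop: the remaining (fewer than LANES) elements.
    for (; i < a.length; i++) {
      c[i] = a[i] + b[i];
    }
  }

  public static void main(String[] args) {
    long[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    long[] b = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    long[] c = new long[a.length];
    chunkedAdd(a, b, c);
    System.out.println(loopBound(a.length)); // prints 8
    System.out.println(Arrays.toString(c));
  }
}
```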
