[aarch64-port-dev ] Scalar reduction

Thu Jun 25 10:20:49 UTC 2015

On Thu, 2015-06-25 at 10:24 +0100, Andrew Haley wrote:
> Do you know if int scalar reduction is supposed to work yet?

Yes, it is. The following test case shows an example:

--- cut here ---
public class Sum
{
  public static void main(String[] args) {
    int[] a = new int[256*1024];
    int[] b = new int[256*1024];
    init(a,b);
    int total = 0;
    for(int j = 0; j < 2000; j++) {
      total = sum(a,b);
    }
    System.out.println("total = " + total);
  }

  public static void init(
    int[] a,
    int[] b)
  {
    for(int j = 0; j < 1; j++)
    {
      for(int i = 0; i < a.length; i++)
      {
        a[i] = i * 1 + j;
        b[i] = i * 1 - j;
      }
    }
  }

  public static int sum(
    int[] a,
    int[] b)
  {
    int total = 0;
    for(int i = 0; i < a.length; i++)
    {
      total += a[i] + b[i];
    }
    return total;
  }

}
--- cut here ---

This generates the following code for the loop body in Sum::sum:

  0x000003ff850eaa00: sbfiz     x11, x16, #2, #32  ;*iaload
                                                ; - Sum::sum at 13 (line 35)

  0x000003ff850eaa04: add       x12, x2, x11
  0x000003ff850eaa08: add       x11, x18, x11
  0x000003ff850eaa0c: ldr       q17, [x11,#16]
  0x000003ff850eaa10: ldr       q16, [x12,#16]
  0x000003ff850eaa14: sbfiz     x11, x16, #2, #32
  0x000003ff850eaa18: add       x12, x2, x11
  0x000003ff850eaa1c: add       x11, x18, x11
  0x000003ff850eaa20: ldr       q19, [x11,#32]
  0x000003ff850eaa24: ldr       q18, [x12,#32]
  0x000003ff850eaa28: add       v16.4s, v16.4s, v17.4s
  0x000003ff850eaa2c: add       v17.4s, v18.4s, v19.4s
  0x000003ff850eaa30: addv      s18, v16.4s        <<<<< SCALAR REDUCTION
  0x000003ff850eaa34: mov       w12, v18.s[0]
  0x000003ff850eaa38: add       w11, w12, w0
  0x000003ff850eaa3c: add       w16, w16, #0x8  ;*iinc
                                                ; - Sum::sum at 20 (line 33)

  0x000003ff850eaa40: addv      s16, v17.4s
  0x000003ff850eaa44: mov       w13, v16.s[0]
  0x000003ff850eaa48: add       w0, w13, w11    ;*iadd
                                                ; - Sum::sum at 18 (line 35)

  0x000003ff850eaa4c: cmp       w16, w10
  0x000003ff850eaa50: b.lt      0x000003ff850eaa00  ;*if_icmpge
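
To make the listing easier to follow, here is a rough Java model of what that
loop body computes. It is only a sketch: the class and method names are made
up, and the scalar tail loop is an assumption standing in for the handling of
leftover elements, which the listing does not show. Each pass covers 8
elements (w16 is bumped by 8): two 4-lane vector adds, each followed by an
addv that folds the four lanes into one scalar, which is then added to the
running total.

```java
// Hypothetical model of the generated loop; not code from the JIT or the JDK.
final class ReductionSketch {

  // Models "add v.4s" on four elements of a and b, then "addv" across lanes.
  static int addAndReduce4(int[] a, int[] b, int base) {
    int lane0 = a[base]     + b[base];
    int lane1 = a[base + 1] + b[base + 1];
    int lane2 = a[base + 2] + b[base + 2];
    int lane3 = a[base + 3] + b[base + 3];
    return lane0 + lane1 + lane2 + lane3;   // horizontal add (addv)
  }

  // Models the loop body: 8 elements per iteration, two reductions.
  static int sumUnrolled(int[] a, int[] b) {
    int total = 0;
    int i = 0;
    for (; i + 8 <= a.length; i += 8) {
      total += addAndReduce4(a, b, i);      // first addv + scalar add
      total += addAndReduce4(a, b, i + 4);  // second addv + scalar add
    }
    // Assumed scalar tail for lengths that are not a multiple of 8.
    for (; i < a.length; i++) {
      total += a[i] + b[i];
    }
    return total;
  }

  // The plain loop from Sum.sum(), for comparison.
  static int sumScalar(int[] a, int[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] + b[i];
    }
    return total;
  }

  public static void main(String[] args) {
    int[] a = new int[1003];
    int[] b = new int[1003];
    for (int i = 0; i < a.length; i++) {
      a[i] = i;
      b[i] = i;
    }
    System.out.println(sumUnrolled(a, b) == sumScalar(a, b));
  }
}
```

Since int addition wraps around and is associative, regrouping the additions
this way gives the same result as the plain loop, which is what makes the
reduction legal for int.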

> 
> This doesn't seem to be vectorized:
> 
>     int sum(int[] a) {
>         int val = 0;
>         for(int elem: a)
>             val += elem;
>         return val;
>     }

But yes, it seems rather bad that the simpler for-each loop doesn't get vectorized.
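
For anyone who wants to compare the two loop shapes side by side, a minimal
sketch follows. The class and method names are made up for illustration, and
it uses a single array rather than two. Assuming the hsdis disassembler
plugin is installed, running it with
-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly should print the generated
code for both methods; whether either loop ends up vectorized depends on the
build.

```java
// Hypothetical harness; names are illustrative only.
final class LoopShapes {

  // Indexed form, the same shape as the loop in Sum.sum().
  static int sumIndexed(int[] a) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i];
    }
    return total;
  }

  // For-each form, as in the quoted example.
  static int sumForEach(int[] a) {
    int val = 0;
    for (int elem : a) {
      val += elem;
    }
    return val;
  }

  public static void main(String[] args) {
    int[] a = new int[256 * 1024];
    for (int i = 0; i < a.length; i++) {
      a[i] = i;
    }
    int x = 0;
    int y = 0;
    // Call both methods often enough for the JIT to compile them.
    for (int j = 0; j < 2000; j++) {
      x = sumIndexed(a);
      y = sumForEach(a);
    }
    System.out.println("indexed = " + x + ", for-each = " + y);
  }
}
```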

I'll take a closer look,
Ed.