Numerical Stream code

Howard Lovatt howard.lovatt at gmail.com
Wed Feb 13 22:34:23 PST 2013


Hi,

I have been trying out lambdas on:

openjdk version "1.8.0-ea"
OpenJDK Runtime Environment (build 1.8.0-ea-lambda-nightly-h3307-20130211-b77-b00)
OpenJDK 64-Bit Server VM (build 25.0-b15, mixed mode)

I wanted to see whether scientific-style numerical code can use Streams, so
I wrote a synthetic benchmark that applies a kernel repeatedly over time and
space to solve a diffusion equation in 1D, e.g. heat diffusing into a metal
rod from either end. The core of the code is:

  private enum Styles implements Style {
    CLike {
      @Override public double run() {
        uM1[0] = uT0; // t = 0
        for (int xi = 1; xi < numXs - 1; xi++) { uM1[xi] = u0X; }
        uM1[numXs - 1] = uT1;
        for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
          u0[0] = uT0; // x = 0
          for (int xi = 1; xi < numXs - 1; xi++) { // 0 < x < 1
            u0[xi] = explicitFDM.u00(uM1[xi - 1], uM1[xi], uM1[xi + 1]);
          }
          u0[numXs - 1] = uT1; // x = 1
        }
        double sum = 0; // Calculate average of last us
        for (final double u : uM1) { sum += u; }
        return sum / numXs;
      }
    },

    SerialStream {
      @Override public double run() {
        Arrays.indices(uM1).forEach(this::t0);
        for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
          Arrays.indices(uM1).forEach(this::tg0);
        }
        return Arrays.stream(uM1).average().getAsDouble(); // Really slow!
      }
    },

    ParallelStream {
      @Override public double run() {
        Arrays.indices(uM1).parallel().forEach(this::t0);
        for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
          Arrays.indices(uM1).parallel().forEach(this::tg0);
        }
        return Arrays.stream(uM1).parallel().average().getAsDouble(); // Really really slow!!
      }
    };

    double[] u0 = new double[numXs];
    double[] uM1 = new double[numXs];
    double[] uTemp = null;

    void t0(final int xi) {
      if (xi == 0) { uM1[0] = uT0; }
      else if (xi == numXs - 1) { uM1[numXs - 1] = uT1; }
      else { uM1[xi] = u0X; }
    }

    void tg0(final int xi) {
      if (xi == 0) { u0[0] = uT0; }
      else if (xi == numXs - 1) { u0[numXs - 1] = uT1; }
      else { u0[xi] = explicitFDM.u00(uM1[xi - 1], uM1[xi], uM1[xi + 1]); }
    }
  }

And when run it produces:

CLike: time = 2351 ms, result = 99.99581170383331
SerialStream: time = 20532 ms, result = 99.99581170383331
ParallelStream: time = 131317 ms, result = 99.99581170383331

The slowness is a pity because the coding comes out quite well!
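
For anyone who wants to try the comparison on a released JDK, here is a
minimal self-contained sketch of the same two styles. It is not the benchmark
above: it assumes Java 8 or later and uses IntStream.range(0, n) in the role
that Arrays.indices plays in the early-access build. The class name
DiffusionSketch, the boundary and initial values, the coefficient R, and the
body of u00 (a standard explicit finite-difference update standing in for
explicitFDM.u00, which is not shown in this post) are all assumptions made
for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class DiffusionSketch {
    // Assumed values; the original post does not state them.
    static final double U_T0 = 100.0; // boundary value at x = 0
    static final double U_T1 = 100.0; // boundary value at x = 1
    static final double U_0X = 0.0;   // initial interior value at t = 0
    static final double R = 0.25;     // alpha * dt / dx^2, kept <= 0.5 for stability

    // Hypothetical stand-in for explicitFDM.u00: one explicit update of a cell.
    static double u00(double left, double centre, double right) {
        return centre + R * (left - 2.0 * centre + right);
    }

    // Fills the t = 0 row: fixed ends, constant interior. Requires numXs >= 2.
    static double[] initial(int numXs) {
        double[] u = new double[numXs];
        Arrays.fill(u, U_0X);
        u[0] = U_T0;
        u[numXs - 1] = U_T1;
        return u;
    }

    // Plain nested loops, in the spirit of the CLike style.
    static double cLike(int numXs, int numTs) {
        double[] prev = initial(numXs);
        double[] next = new double[numXs];
        for (int ti = 1; ti < numTs; ti++) {
            next[0] = U_T0;
            for (int xi = 1; xi < numXs - 1; xi++) {
                next[xi] = u00(prev[xi - 1], prev[xi], prev[xi + 1]);
            }
            next[numXs - 1] = U_T1;
            double[] tmp = prev; prev = next; next = tmp; // swap buffers
        }
        double sum = 0;
        for (double u : prev) { sum += u; }
        return sum / numXs;
    }

    // One stream over the cell indices per time step, optionally parallel.
    static double streamed(int numXs, int numTs, boolean parallel) {
        double[] prev = initial(numXs);
        double[] next = new double[numXs];
        for (int ti = 1; ti < numTs; ti++) {
            final double[] p = prev, n = next; // lambdas need effectively final locals
            IntStream cells = IntStream.range(0, numXs);
            if (parallel) { cells = cells.parallel(); }
            cells.forEach(xi -> {
                if (xi == 0) { n[0] = U_T0; }
                else if (xi == numXs - 1) { n[numXs - 1] = U_T1; }
                else { n[xi] = u00(p[xi - 1], p[xi], p[xi + 1]); }
            });
            prev = n; next = p; // swap buffers
        }
        return Arrays.stream(prev).average().getAsDouble();
    }

    public static void main(String[] args) {
        int numXs = 1001, numTs = 10000; // arbitrary sizes for a quick run
        System.out.println("cLike:    " + cLike(numXs, numTs));
        System.out.println("serial:   " + streamed(numXs, numTs, false));
        System.out.println("parallel: " + streamed(numXs, numTs, true));
    }
}
```

Each cell writes only its own slot in the new buffer and reads only from the
old one, so the parallel variant needs no locking. The two averages may differ
in the last few bits because the stream average and the hand-written sum need
not add in the same order, so compare them with a tolerance.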

I wasn't particularly expecting the Stream implementation to be fast,
because it is a work in progress after all. However, a factor of almost 10
for the serial case and over 50 for the parallel case seems excessive. I
therefore suspect that I am doing something wrong.

Can anyone enlighten me?

Thanks,

  -- Howard.


More information about the lambda-dev mailing list