Numerical Stream code

Thu Feb 14 03:57:01 PST 2013

Hi Howard,

The price of convenience is performance :) The specialized versions of
stream have, I guess, eased much of the strain of numerical computing.
After quickly scanning the sources, DoubleStatistics appears to do more
than is needed for an average. So one suggestion is to compute the average
directly (e.g. sum/length) rather than calling the default average(), to
see whether that improves things. Benchmarking is also always tricky: at
the very least, you should warm up your benchmark code sufficiently.
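
To make the two suggestions concrete, here is a minimal sketch, not a proper
benchmark. The class name AverageSketch, the array size, and the warm-up count
of 200 are all made up for illustration. It compares a hand-written
sum/length against the library average() only after both paths have been run
repeatedly.

```java
import java.util.Arrays;

public class AverageSketch {
    // Direct average: a single pass, plain sum divided by the length.
    public static double directAverage(double[] values) {
        double sum = 0;
        for (double v : values) { sum += v; }
        return sum / values.length;
    }

    // Library average via the primitive (double) stream specialization.
    public static double streamAverage(double[] values) {
        return Arrays.stream(values).average().getAsDouble();
    }

    public static void main(String[] args) {
        double[] u = new double[1_000_000]; // arbitrary size, for illustration only
        Arrays.fill(u, 100.0);

        // Warm-up: exercise both paths before timing so the JIT has had a
        // chance to compile them. The count of 200 is an arbitrary assumption.
        double sink = 0;
        for (int i = 0; i < 200; i++) {
            sink += directAverage(u) + streamAverage(u);
        }

        long t0 = System.nanoTime();
        double direct = directAverage(u);
        long t1 = System.nanoTime();
        double viaStream = streamAverage(u);
        long t2 = System.nanoTime();

        System.out.println("direct: " + direct + " in " + (t1 - t0) / 1000 + " us");
        System.out.println("stream: " + viaStream + " in " + (t2 - t1) / 1000 + " us");
        // Print the sink so the warm-up work cannot be optimized away entirely.
        System.out.println("warm-up sink: " + sink);
    }
}
```

A single timed call like this is still noisy; a dedicated benchmark harness
would give more trustworthy numbers.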

best regards,
Jin Mingjian

On Thu, Feb 14, 2013 at 2:34 PM, Howard Lovatt <howard.lovatt at gmail.com> wrote:

> Hi,
>
> I have been trying out lambdas on:
>
> openjdk version "1.8.0-ea"
> OpenJDK Runtime Environment (build
> 1.8.0-ea-lambda-nightly-h3307-20130211-b77-b00)
> OpenJDK 64-Bit Server VM (build 25.0-b15, mixed mode)
>
> To see if scientific type numerical code can use Streams. I wrote a
> synthetic benchmark that applies a kernel repeatedly over time and space to
> solve a diffusion equation in 1 D, e.g. heat diffusing into a metal rod
> from either end. The core of the code is:
>
>   private enum Styles implements Style {
>     CLike {
>       @Override public double run() {
>         uM1[0] = uT0; // t = 0
>         for (int xi = 1; xi < numXs - 1; xi++) { uM1[xi] = u0X; }
>         uM1[numXs - 1] = uT1;
>         for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
>           u0[0] = uT0; // x = 0
>           for (int xi = 1; xi < numXs - 1; xi++) { // 0 < x < 1
>             u0[xi] = explicitFDM.u00(uM1[xi - 1], uM1[xi], uM1[xi + 1]);
>           }
>           u0[numXs - 1] = uT1; // x = 1
>         }
>         double sum = 0; // Calculate average of last us
>         for (final double u : uM1) { sum += u; }
>         return sum / numXs;
>       }
>     },
>
>     SerialStream {
>       @Override public double run() {
>         Arrays.indices(uM1).forEach(this::t0);
>         for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
>           Arrays.indices(uM1).forEach(this::tg0);
>         }
>         return Arrays.stream(uM1).average().getAsDouble(); // Really slow!
>       }
>     },
>
>     ParallelStream {
>       @Override public double run() {
>         Arrays.indices(uM1).parallel().forEach(this::t0);
>         for (int ti = 1; ti < numTs; ti++, uTemp = uM1, uM1 = u0, u0 = uTemp) { // t > 0
>           Arrays.indices(uM1).parallel().forEach(this::tg0);
>         }
>         return Arrays.stream(uM1).parallel().average().getAsDouble(); // Really really slow!!
>       }
>     };
>
>     double[] u0 = new double[numXs];
>     double[] uM1 = new double[numXs];
>     double[] uTemp = null;
>
>     void t0(final int xi) {
>       if (xi == 0) { uM1[0] = uT0; }
>       else if (xi == numXs - 1) { uM1[numXs - 1] = uT1; }
>       else { uM1[xi] = u0X; }
>     }
>
>     void tg0(final int xi) {
>       if (xi == 0) { u0[0] = uT0; }
>       else if (xi == numXs - 1) { u0[numXs - 1] = uT1; }
>       else { u0[xi] = explicitFDM.u00(uM1[xi - 1], uM1[xi], uM1[xi + 1]); }
>     }
>   }
>
> And when run it produces:
>
> CLike: time = 2351 ms, result = 99.99581170383331
> SerialStream: time = 20532 ms, result = 99.99581170383331
> ParallelStream: time = 131317 ms, result = 99.99581170383331
>
> The slowness is a pity because the coding comes out quite well!
>
> I wasn't particularly expecting the Stream implementation to be fast,
> because it is a work in progress after all. However, a factor of almost
> 10 for the serial case and over 50 for the parallel case seems excessive. I
> therefore suspect that I am doing something wrong.
>
> Can anyone enlighten me?
>
> Thanks,
>
>   -- Howard.
>
>
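
For anyone who wants to run the quoted stream-style kernel on its own, here is
a self-contained sketch under several assumptions. IntStream.range stands in
for the Arrays.indices call used above, which, as far as I know, is specific
to that nightly build and is not in the released java.util.Arrays. The
step() formula and the constant R are hypothetical placeholders for
explicitFDM.u00, which is not shown in the message. The class and parameter
names are also made up.

```java
import java.util.stream.IntStream;

public class DiffusionSketch {
    // Hypothetical diffusion number r = alpha * dt / dx^2 (assumed; the
    // explicit scheme is stable for r <= 0.5).
    static final double R = 0.25;

    // Hypothetical stand-in for explicitFDM.u00: explicit finite-difference update.
    static double step(double left, double mid, double right) {
        return mid + R * (left - 2 * mid + right);
    }

    public static double run(int numXs, int numTs, double uT0, double uT1,
                             double u0X, boolean parallel) {
        double[] prev = new double[numXs];
        double[] next = new double[numXs];

        // t = 0: interior at u0X, fixed temperatures at both ends.
        for (int xi = 0; xi < numXs; xi++) { prev[xi] = u0X; }
        prev[0] = uT0;
        prev[numXs - 1] = uT1;

        for (int ti = 1; ti < numTs; ti++) { // t > 0
            final double[] in = prev;
            final double[] out = next;
            IntStream indices = IntStream.range(0, numXs);
            if (parallel) { indices = indices.parallel(); }
            // Each index reads only from 'in' and writes only its own slot
            // of 'out', so the parallel form has no write conflicts.
            indices.forEach(xi -> {
                if (xi == 0) { out[0] = uT0; }
                else if (xi == numXs - 1) { out[numXs - 1] = uT1; }
                else { out[xi] = step(in[xi - 1], in[xi], in[xi + 1]); }
            });
            prev = out; // swap buffers for the next time step
            next = in;
        }

        double sum = 0; // direct average of the final time level
        for (double u : prev) { sum += u; }
        return sum / numXs;
    }

    public static void main(String[] args) {
        System.out.println("serial:   " + run(1001, 1000, 100, 100, 0, false));
        System.out.println("parallel: " + run(1001, 1000, 100, 100, 0, true));
    }
}
```

The serial and parallel variants perform the same per-element arithmetic, so
they should agree on the result; only the timing should differ.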