Dot Product Thoughts

Richard Warburton richard.warburton at gmail.com
Thu Apr 18 16:09:38 PDT 2013


Hi,

Implementing a dot product between a pair of vectors brought up a few
observations about the library:

1. There's a Double::sum, but no Double::multiply, etc.  I appreciate you
have to stop somewhere, but is sum the place to stop?  Might be worth
adding other basic arithmetic operations.
2. There appears to be a zip function now, but with no overloads for
primitive streams.  I managed to guess that Streams.zip was the
location, which is one data point in favour of discoverability - good
news on that front.
3. If I have a double[], is there a cleaner way of making a DoubleStream
than adding every element to the builder (see the sketch after this
list)?  Perhaps I'm missing something?  Is the expectation that people
who care about unboxed primitives are using specialised collections
libraries, e.g. GNU Trove, and that these libraries will provide
specialised .stream() methods returning DoubleStream, et al.?
4. A performance comparison with a trivial imperative example resulted in a
~16x slowdown moving to lambdas.  I'm willing to take some performance hit
for the nicer code, but 16x is a lot higher than I would have expected or
hoped for.  I'll try to have a look at some more 'real world' examples in
future, but even then being this much slower on mathematical problems will
cause some people trouble.
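
For concreteness, here's roughly what I mean in point 3 by adding every
element to the builder - a minimal sketch, assuming the builder API ends
up looking like DoubleStream.builder()/add()/build() in the
java.util.stream package:

import java.util.stream.DoubleStream;

// Sketch only: build a DoubleStream from a double[] one element at a time.
static DoubleStream toStream(double[] values) {
    DoubleStream.Builder builder = DoubleStream.builder();
    for (double value : values) {
        builder.add(value);
    }
    return builder.build();
}

It works, but it's a lot of ceremony for something I'd hope to be a
one-liner.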

Imperative code for the comparison in point 4 (x and y are arrays):

final int N = x.length;
double sum = 0;
for (int i = 0; i < N; i++) {
    sum += x[i] * y[i];
}
return sum;


Java 8 Lambdas (x and y are streams):

Streams.zip(x, y, (left, right) -> left * right)
       .reduce(0.0, Double::sum);

Notes:
 * When I profiled, the hotspot wasn't in boxing/unboxing but in
Iterator.forEachRemaining, which accounted for ~95% of execution
time.  I suspect the conclusion is that the computational work of
actually performing the arithmetic is entirely dominated by the cost
of iteration in the framework (see the sketch after these notes).
 * The 16x figure comes from an average over 90 calls on 10
million-entry vectors.  I've left out the timings from my warm-up
calls, so the figures are steady-state.
 * I didn't do a thorough GC analysis, but time spent in GC was
entirely dominated by execution time in Java code.
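
For comparison, an index-based formulation over a primitive stream would
sidestep both Streams.zip and boxing - a minimal sketch, assuming the
current build exposes IntStream.range and mapToDouble in this form (I
haven't benchmarked this variant, so I can't say whether it closes the
gap):

import java.util.stream.IntStream;

// Sketch only: dot product via an index stream, avoiding zip and boxed
// Doubles.
static double dotProduct(double[] x, double[] y) {
    return IntStream.range(0, x.length)
                    .mapToDouble(i -> x[i] * y[i])
                    .sum();
}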

regards,

  Richard Warburton

  http://insightfullogic.com
  @RichardWarburto <http://twitter.com/richardwarburto>

