Dot Product Thoughts

John Rose john.r.rose at oracle.com
Thu Apr 18 17:05:26 PDT 2013


On Apr 18, 2013, at 4:30 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

>> 1. There's a Double::sum, but no Double::multiply, etc.  I appreciate you
>> have to stop somewhere, but is sum the place to stop?  Might be worth
>> adding other basic arithmetic operations.
> 
> Yes, we definitely stopped in the wrong place :)
> 
> Sensible would be the ability to express all built-in operators as 
> functions so they can be used as reducers.

Yes.  I suggest mining both the JLS and the JVMS for candidate "operators".
For example, the bytecode i2d corresponds to an implicit conversion, which isn't an operator at the JLS level, so the two specs won't yield exactly the same list.
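
For concreteness, here is the kind of use in question (Double::sum is real; the multiply variant is only a sketch of what a fuller operator set would add):

    import java.util.stream.DoubleStream;

    double[] v = {1.0, 2.0, 3.0};
    // available today: sum used as a reducer
    double total = DoubleStream.of(v).reduce(0.0, Double::sum);
    // hypothetical: a product reducer, if Double::multiply (or opTimes) were added
    // double product = DoubleStream.of(v).reduce(1.0, Double::multiply);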

> I suspect that the naming convention we've started down the road with 
> (sum) may not scale as well as we'd like.  Perhaps that should be 
> reconsidered now that we have methods like IntStream.sum() and no 
> longer have as much need for Integer::sum as a reducer; 
> perhaps it should be renamed opPlus or opSum or some such.
> 
>> 2. There appears to be a zip function now, but with no overloads for
>> primitive streams.  I managed to guess at Streams.zip as the location, so
>> one data point, but good news on that front.
> 
> Yeah, this one is on the fence.  We should go one way or the other.

Ouch.  Quadratic overloading hurts.  Want tuples.  Need value type support in JVM first.
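
To spell out the "quadratic" part: without tuples, a primitive-friendly zip needs one overload per pair of stream shapes.  Roughly (illustrative signatures only, not a proposed API):

    import java.util.function.*;
    import java.util.stream.*;

    interface ZipShapes {
        // reference x reference (the Streams.zip mentioned above)
        <A, B, C> Stream<C> zip(Stream<A> a, Stream<B> b,
                                BiFunction<? super A, ? super B, ? extends C> f);
        // primitive x primitive, one per matching pair
        IntStream    zip(IntStream a, IntStream b, IntBinaryOperator f);
        LongStream   zip(LongStream a, LongStream b, LongBinaryOperator f);
        DoubleStream zip(DoubleStream a, DoubleStream b, DoubleBinaryOperator f);
        // ...plus all the mixed shapes (IntStream x LongStream, IntStream x
        // Stream<T>, ...), which is where the count grows quadratically.
    }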

> ...
>> 4. A performance comparison with a trivial imperative example resulted in a
>> ~16x slowdown moving to lambdas.  I'm willing to take some performance hit
>> for the nicer code, but 16x is a lot higher than I would have expected or
>> hoped for.  I'll try to have a look at some more 'real world' examples in
>> future, but even then being this much slower on mathematical problems will
>> cause some people trouble.
> 
> The real cost is in replacing the primitive operators with method 
> invocations; the cost of the streams framework is a smaller component. 

Vector reduce is *almost* a microbenchmark, in the sense that the leaf calls do almost nothing (one CPU instruction).

So, like a microbenchmark, it really stresses the JIT and the overheads in the framework, notably primitive boxing and (probably) array storage.  Unlike a microbenchmark, it's something you might really want to do.  Or at least, it is similar; I would say a segmented multi-reduce (during a sparse matrix multiply) is more plausible, and that starts to get away from what Lambda streams are about.  Still, plain mega-reduce is a good test case.
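
For concreteness, the shape of the comparison (a sketch, not the poster's actual benchmark):

    import java.util.stream.IntStream;

    // imperative dot product: the leaf work is one multiply-add per element
    static double dotLoop(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++)
            s += a[i] * b[i];
        return s;
    }

    // stream version: unless the JIT inlines the whole pipeline, every
    // element pays for a lambda call plus the framework's plumbing
    static double dotStream(double[] a, double[] b) {
        return IntStream.range(0, a.length)
                        .mapToDouble(i -> a[i] * b[i])
                        .sum();
    }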

To optimize your loop, the JIT must inline everything on the hot path, and avoid a bunch of hazards, including: putting values in temporary Object[] arrays, dropping through megamorphic interface calls (in common code), keeping values in boxes, spending compiler resources on cold paths.  The JIT team is gearing up for this, and it's going to be an exciting project.
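
As a small illustration of the boxing hazard (again a sketch): the same reduction over the boxed stream shape pays for a Double allocation and an interface dispatch per element unless the JIT can eliminate them:

    import java.util.stream.DoubleStream;

    double[] v = new double[1_000_000];

    // primitive pipeline: values can stay unboxed on the hot path
    double s1 = DoubleStream.of(v).reduce(0.0, Double::sum);

    // boxed pipeline: each element becomes a Double, and the reducer is
    // called through BinaryOperator<Double> -- exactly the box/dispatch
    // hazards listed above
    double s2 = DoubleStream.of(v).boxed().reduce(0.0, Double::sum);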

> Now, where'd I put that jar of Inline Sauce...


Those JSR 292 guys stole it from the fridge.

— John

