[15] RFR (L): 8235824: C2: Merge AD instructions for AddReductionV and MulReductionV nodes

Fri Dec 13 08:38:25 UTC 2019

Thanks for the reviews, Vladimir & John.

> As I mentioned in the 8235756 thread, a good way to factor the
> implementation of (associative) reductions would be to reformulate
> them as the repeated composition of 2N-to-N-lane reductions.
> 
> For non-associative reductions (floating point), the 2N-to-N
> pattern is acceptable, *if* the reduction is specified to happen in
> that order.  To get that permission into the contract will require
> a distinction between reduceSequential and reduceParallel
> operations in the Vector API.
> 
> That sequence of 16 vaddss operations is certainly an eyesore,
> but it’s not clear how to improve on it, algorithmically.  Perhaps
> it could be factored into a sequential accumulation operation,
> to be repeated N times instead of lg N times.

Yes, I agree that reduction nodes look too high-level for matching 
purposes: having a node per reduction step is much more suitable (at 
least, on x86). I think if the IR is shaped that way (nested reduction 
steps which reduce a vector to a scalar), there's a way to introduce a 
single shared IR node which represents a reduction step (2N => N) across 
all vector shapes.
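
To make the shape concrete, here is a minimal scalar sketch in Java. It
is not the C2 IR or the AD code; the class and method names
(ReductionSketch, halve, reducePairwise, reduceSequential) are
hypothetical, and the vector is modelled as a float[] whose length is
assumed to be a power of two. It shows a single 2N => N step applied
lg N times, next to the sequential accumulation repeated N times.

```java
// Illustrative only: a scalar model of the two reduction orders
// discussed above. All names here are hypothetical, not HotSpot code.
final class ReductionSketch {

    // One 2N => N reduction step: add the upper half of the lanes
    // onto the lower half, lane by lane.
    static float[] halve(float[] v) {
        int n = v.length / 2;
        float[] r = new float[n];
        for (int i = 0; i < n; i++) {
            r[i] = v[i] + v[i + n];
        }
        return r;
    }

    // Repeated composition of 2N => N steps (lg N steps in total).
    // Assumes v.length is a power of two and at least 1.
    static float reducePairwise(float[] v) {
        while (v.length > 1) {
            v = halve(v);
        }
        return v[0];
    }

    // Sequential accumulation in lane order (N - 1 dependent adds).
    static float reduceSequential(float[] v) {
        float acc = v[0];
        for (int i = 1; i < v.length; i++) {
            acc += v[i];
        }
        return acc;
    }
}
```

For {1e8f, 1f, -1e8f, 1f} the sequential order returns 1.0f (the first
1f is absorbed by 1e8f), while the pairwise order returns 2.0f. That is
the reason the 2N-to-N pattern needs explicit permission in the
contract for floating point, whereas for associative operations both
orders agree.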

Best regards,
Vladimir Ivanov