IntVector.fromValues is not optimized away ?

Tue May 12 19:45:31 UTC 2020

Issue logged:

  https://bugs.openjdk.java.net/browse/JDK-8244856

Paul.

> On May 11, 2020, at 6:39 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On May 11, 2020, at 5:14 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> I wonder if it's possible to teach the shared reduction code about operations using the identity value?
> 
> 
> In general, I’d encourage us to put as much into shared code as
> possible.  We have more vector hardware in our future; I’m thinking
> of GPUs of course, and who knows what other CPUs or VPUs will be
> important in 10 years.
> 
> BTW, this reminds me that in some cases reduction operations are
> most naturally formulated as type (scalar, vector) -> scalar, not just
> (vector) -> scalar.  The two-argument form reduces to the one-argument
> form when the input scalar is the identity value.  The two-argument
> form is useful when several vectors are being rolled up together,
> perhaps in a loop.  I think we may want (not now but later) to make
> the building block be the two-argument reduction, not the simpler
> one.
> 
> Also BTW, and independently, we might wish to make a shared
> convention (in C2 and the Java code) that reductions are always
> done in some particular order, when it matters.  If we do make
> such a choice, we should choose a particular binary spanning tree,
> since that, generally speaking, is how it’s done in hardware.
> Disagreements between spanning tree orders can be removed
> (if needed) by one-time permutations of the input.
> 
> It seems to me that the two observations work against each other,
> since you can’t build such a good spanning tree on 1+2^lgN nodes
> as you can on 2^lgN nodes.  This is one reason we need some time
> (after the current release) to consider the proper order specification
> for reductions in our portable API.
> 
> (BTW, the difference in order only matters with floating operations
> that have NaNs and/or rounding errors.  So the problems with order
> are limited only to those, and whatever other non-associative
> operations we might define in the future.)
> 
> Two arguments in favor of reducing in N-1 sequential steps instead
> of lgN steps of parallel operations:  It’s the simplest to specify, and
> works best with the binary version of reduction.  One argument
> against:  It will make rounding and NaNs slow FP operations down.
> Maybe there’s a “strictfp” move we can use to allow the JVM more
> latitude for reordering reductions into lgN trees, except in strict code.
> 
> — John
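
To make the two reduction shapes discussed above concrete, here is a minimal
sketch in plain Java. It is not the Vector API and not what C2 emits: lanes are
modeled as a float[] whose length is assumed to be a power of two, and the
names ReductionSketch, reduceSequential and reduceTree are hypothetical, chosen
only for illustration. It shows the two-argument (scalar, vector) -> scalar
form folding lanes in N sequential steps, the one-argument form as a balanced
binary tree in lgN steps, and an input on which the two orders disagree
because of float rounding.

```java
final class ReductionSketch {

    // Two-argument form: (scalar, vector) -> scalar.
    // Folds the lanes left to right, one add per lane.
    public static float reduceSequential(float acc, float[] lanes) {
        for (float lane : lanes) {
            acc += lane;
        }
        return acc;
    }

    // One-argument form: (vector) -> scalar, as a balanced binary tree.
    // Assumes lanes.length is a power of two; each pass halves the width,
    // adding the upper half onto the lower half (lgN passes in total).
    public static float reduceTree(float[] lanes) {
        float[] work = lanes.clone();
        for (int width = work.length / 2; width >= 1; width /= 2) {
            for (int i = 0; i < width; i++) {
                work[i] += work[i + width];
            }
        }
        return work[0];
    }

    public static void main(String[] args) {
        // Rolling several "vectors" up in a loop with the two-argument form.
        float[][] chunks = { {1f, 2f, 3f, 4f}, {5f, 6f, 7f, 8f} };
        float acc = 0f; // 0f stands in for the identity of ADD here
        for (float[] chunk : chunks) {
            acc = reduceSequential(acc, chunk);
        }
        System.out.println(acc); // 36.0

        // Order matters for floats: 1e8f + 1f rounds back to 1e8f.
        float[] v = {1e8f, 1f, -1e8f, 1f};
        System.out.println(reduceSequential(0f, v)); // 1.0
        System.out.println(reduceTree(v));           // 2.0
    }
}
```

With integer lanes the two orders would agree, which matches the remark that
only non-associative operations are affected. Passing the identity as the
scalar turns the two-argument form into the one-argument form, but note that
the sequential fold then has 1+N inputs, which is the tension with a balanced
tree over 2^lgN nodes described above.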