RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3]

Thu Mar 21 10:29:23 UTC 2024

On Tue, 19 Mar 2024 20:48:27 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> I'm not super familiar with the Vector API, but I could not see that MUL is not associative.
>
> Yes, MUL is non-associative in the Vector API, just like the ADD operation (according to the description here - https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc).
> 
> We found a significant performance difference between the SVE "fadda" instruction, which is strictly ordered, and the Neon instructions on a 128-bit SVE machine, especially after this optimization (https://bugs.openjdk.org/browse/JDK-8298244), but there is no such performance difference for the MUL operation. MulReductionVF/VD have no direct instructions for multiply reduction, nor does the ISA provide separate strictly ordered and non-strictly ordered variants. So we currently have no data showing any benefit from adding similar code for MUL, and it is therefore still treated as a non-associative (strictly ordered) operation. I am not sure about other platforms.

Right. Ok, since your benchmarks are restricted to NEON/SVE, I can understand these results. But I would expect this to look different on x86 machines; it is just that we currently have no unordered float/double add/mul reductions.

I think it would be nice if you made both Add and Mul capable of being unordered already, that would make future work in this area simpler. Or do you see a regression for unordered mul reductions on your benchmark machines?
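
For context on why the ordering matters for both operations, here is a minimal scalar sketch. It is not C2 or Vector API code. The class `ReductionOrderDemo`, the `FloatOp` interface and both helper methods are hypothetical names made up for illustration, and the "halving" order is only one possible unordered schedule. It compares a strictly ordered reduction with a lane-halving one for add and for mul:

```java
// Scalar model only: illustrates reduction order, not how C2 emits code.
class ReductionOrderDemo {
    // Hypothetical helper; java.util.function has no float variant.
    interface FloatOp { float apply(float a, float b); }

    // Strictly ordered: ((v[0] op v[1]) op v[2]) op v[3] ...
    static float orderedReduce(float[] v, FloatOp op) {
        float acc = v[0];
        for (int i = 1; i < v.length; i++) {
            acc = op.apply(acc, v[i]);
        }
        return acc;
    }

    // Unordered (halving): combine lane i with lane i + n/2 until one
    // lane is left. Assumes the length is a power of two.
    static float halvingReduce(float[] v, FloatOp op) {
        float[] lanes = v.clone();
        for (int n = lanes.length / 2; n >= 1; n /= 2) {
            for (int i = 0; i < n; i++) {
                lanes[i] = op.apply(lanes[i], lanes[i + n]);
            }
        }
        return lanes[0];
    }

    public static void main(String[] args) {
        // 1e8f + 1f rounds back to 1e8f, so the grouping changes the sum.
        float[] addInput = {1e8f, 1f, -1e8f, 1f};
        System.out.println("add ordered: " + orderedReduce(addInput, (a, b) -> a + b));
        System.out.println("add halving: " + halvingReduce(addInput, (a, b) -> a + b));

        // 1e30f * 1e30f overflows and 1e-30f * 1e-30f underflows, so the
        // halving order sees Infinity * 0 while the ordered one stays finite.
        float[] mulInput = {1e30f, 1e-30f, 1e30f, 1e-30f};
        System.out.println("mul ordered: " + orderedReduce(mulInput, (a, b) -> a * b));
        System.out.println("mul halving: " + halvingReduce(mulInput, (a, b) -> a * b));
    }
}
```

With these inputs the two orders disagree for add and for mul alike, which is why I would treat both the same way with respect to an "unordered" flag.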

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1533604068