RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3]

Tue Mar 19 20:51:22 UTC 2024

On Mon, 18 Mar 2024 12:52:37 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 1332:
>> 
>>> 1330:   case Op_AddReductionVL: return new AddReductionVLNode(ctrl, n1, n2);
>>> 1331:   case Op_AddReductionVF: return new AddReductionVFNode(ctrl, n1, n2, is_associative);
>>> 1332:   case Op_AddReductionVD: return new AddReductionVDNode(ctrl, n1, n2, is_associative);
>> 
>> Why do you only do it for the `F/D` `Add` instructions, but not the `Mul` instructions? Would those not equally profit from associativity?
>
> I'm not super familiar with the Vector API, but I could not see that MUL is not associative.

Yes, MUL is non-associative in the Vector API, just like the ADD operation (according to the description here - https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc).

We found a significant performance difference between the SVE "fadda" instruction, which is strictly ordered, and the Neon instructions on a 128-bit SVE machine, especially after this optimization (https://bugs.openjdk.org/browse/JDK-8298244). There is no such performance difference for the MUL operation: MulReductionVF/VD have no direct multiply-reduction instruction, and the ISA has no separate strictly ordered and non-strictly ordered variants for them. So we currently have no data showing any benefit from adding similar code for MUL, and it is therefore still treated as a non-associative (strictly ordered) operation. I am not sure about other platforms.
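
For readers less familiar with why the ordering matters at all, here is a minimal plain-Java sketch. It is not the Vector API and not the C2 code; the class, interface, and method names are hypothetical and only for illustration. It compares a strictly ordered reduction (the order "fadda" preserves) with a pairwise, reassociated one (roughly what a tree-shaped reduction does), for both ADD and MUL on floats.

```java
// Hypothetical illustration only: names below are not from HotSpot or the Vector API.
class ReductionOrder {
    // Minimal binary float operation, since java.util.function has no float variant.
    interface FloatOp {
        float apply(float x, float y);
    }

    // Strictly ordered: ((a[0] op a[1]) op a[2]) op ...  Assumes a.length >= 1.
    static float strictReduce(float[] a, FloatOp op) {
        float acc = a[0];
        for (int i = 1; i < a.length; i++) {
            acc = op.apply(acc, a[i]);
        }
        return acc;
    }

    // Reassociated: combine neighbours pairwise, level by level. Assumes a.length >= 1.
    static float pairwiseReduce(float[] a, FloatOp op) {
        float[] buf = a.clone();
        for (int n = buf.length; n > 1; n = (n + 1) / 2) {
            for (int i = 0; i < n / 2; i++) {
                buf[i] = op.apply(buf[2 * i], buf[2 * i + 1]);
            }
            if ((n & 1) == 1) {
                buf[n / 2] = buf[n - 1]; // carry the unpaired last element up a level
            }
        }
        return buf[0];
    }

    public static void main(String[] args) {
        float[] sums = {1e8f, 1f, -1e8f, 1f};
        System.out.println(strictReduce(sums, (x, y) -> x + y));   // 1.0
        System.out.println(pairwiseReduce(sums, (x, y) -> x + y)); // 0.0

        float[] prods = {1e-30f, 1f, 1e30f, 1e30f};
        System.out.println(strictReduce(prods, (x, y) -> x * y));   // finite, about 1e30
        System.out.println(pairwiseReduce(prods, (x, y) -> x * y)); // Infinity
    }
}
```

In the ADD case the small terms are absorbed differently depending on grouping; in the MUL case the pairwise grouping overflows in an intermediate product that the strict order never forms. So both operations are observably order-sensitive, and the flag only makes sense where the caller has explicitly allowed reassociation.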

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1531091964