[aarch64-port-dev] RFR: 8169697: aarch64: vectorized MLA instruction not generated for some test cases

Thu Nov 24 14:01:15 UTC 2016

>> There are two versions of TestSimdMlaInt.vectSumOfMulAdd1 compiled by C2.
>> Please check the OSR version (marked with %), which uses the vector mla
>> instruction with my patch. Without my patch, vector mul and add instructions are used.
>
> OK, so you have the same problem that I do with this.  I do not
> know why vectorized code is not being generated for the non-OSR
> case.
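For context, here is a minimal C++ model of the lane-wise arithmetic involved. It assumes a 128-bit vector of four 32-bit integer lanes (the 4S arrangement). AArch64's vector MLA computes acc[i] += a[i] * b[i] in every lane, which gives the same result as a vector MUL followed by a vector ADD, so fusing the pair does not change the result. The type V4S and the functions mul_then_add and mla are hypothetical illustrations, not HotSpot code or compiler intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one 128-bit vector register holding four 32-bit lanes.
// uint32_t is used so that overflow wraps modulo 2^32, as the hardware lanes do.
using V4S = std::array<std::uint32_t, 4>;

// Two-instruction form: a vector multiply into a temporary, then a vector add.
V4S mul_then_add(V4S acc, const V4S& a, const V4S& b) {
    V4S tmp{};
    for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = a[i] * b[i];     // like MUL tmp, a, b
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] = acc[i] + tmp[i]; // like ADD acc, acc, tmp
    return acc;
}

// Fused form: one multiply-accumulate per lane, like MLA acc, a, b.
V4S mla(V4S acc, const V4S& a, const V4S& b) {
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += a[i] * b[i];
    return acc;
}
```

For example, with acc = {1, 2, 3, 4}, a = {5, 6, 7, 8} and b = {2, 3, 4, 5}, both functions return {11, 20, 31, 44}. On real hardware, the fused form saves one instruction and the temporary register.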

What about with the patch below?

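It seems the problem is that C2 fails to recognize the reduction in the
loop because the test below is correct only for a use that is a data node
(a Phi in the OSR version of the method), not for a use that is a control
node (a Return in the normal compilation). has_ctrl() is false for a
control node, so the old condition never skips such a use as being outside
the loop, and the reduction is rejected. ctrl_or_self() returns get_ctrl()
for a data node and the node itself for a control node, so the patched
check handles both cases.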
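To make the difference concrete, here is a toy C++ model of the two conditions from the patch. ToyNode, old_skips_use and new_skips_use are hypothetical names, not HotSpot's Node, PhaseIdealLoop or IdealLoopTree APIs. Loop membership is also simplified to comparing loop ids, whereas the real is_member() also accepts nested loops.

```cpp
// Hypothetical toy model of a use of the reduction result; not HotSpot code.
struct ToyNode {
    bool is_cfg;  // true for a control (CFG) node such as Return, false for a data node such as Phi
    int  loop;    // id of the loop holding get_ctrl(n) for a data node, or holding n itself
                  // for a control node; -1 means outside every loop
};

// Like has_ctrl(): only data nodes have a separately recorded control.
bool has_ctrl(const ToyNode& n) { return !n.is_cfg; }

// Loop reached via ctrl_or_self(); the struct already stores it, so this is just a lookup.
int loop_of_ctrl_or_self(const ToyNode& n) { return n.loop; }

// Old condition: skip the use only if it is a data node whose control is outside the loop.
bool old_skips_use(const ToyNode& u, int loop_id) {
    return has_ctrl(u) && u.loop != loop_id;
}

// Patched condition: skip the use whenever ctrl_or_self(u) lies outside the loop.
bool new_skips_use(const ToyNode& u, int loop_id) {
    return loop_of_ctrl_or_self(u) != loop_id;
}
```

With a Return after the loop, old_skips_use returns false, so the use falls through to the remaining checks as if it were inside the loop. new_skips_use returns true for both the Return and a Phi after the loop, and still returns false for a use inside the loop.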

Roland.

diff --git a/src/share/vm/opto/loopTransform.cpp b/src/share/vm/opto/loopTransform.cpp
--- a/src/share/vm/opto/loopTransform.cpp
+++ b/src/share/vm/opto/loopTransform.cpp
@@ -1742,7 +1742,7 @@
               // The result of the reduction must not be used in the loop
               for (DUIterator_Fast imax, i = def_node->fast_outs(imax); i < imax && ok; i++) {
                 Node* u = def_node->fast_out(i);
-                if (has_ctrl(u) && !loop->is_member(get_loop(get_ctrl(u)))) {
+                if (!loop->is_member(get_loop(ctrl_or_self(u)))) {
                   continue;
                 }
                 if (u == phi) {