RFR(S): 8231291: C2: loop opts before EA should maximally unroll loops

Wed Dec 18 10:02:01 UTC 2019

Hi Roland,

> http://cr.openjdk.java.net/~roland/8231291/webrev.01/

As I understand, the intention of the change you propose is to perform 
complete loop unrolling earlier so EA can benefit from it.

Some comments:

src/hotspot/share/opto/loopnode.cpp:

+  if (mode == LoopOptsMaxUnroll) {
+    for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) {
+      IdealLoopTree* lpt = iter.current();
+      if (lpt->is_innermost() && lpt->_allow_optimizations && 
!lpt->_has_call && lpt->is_counted()) {
+        lpt->compute_trip_count(this);
+        if (!lpt->do_one_iteration_loop(this) &&
+            !lpt->do_remove_empty_loop(this)) {
+          AutoNodeBudget node_budget(this);
+          if (lpt->policy_maximally_unroll(this)) {
+            memset( worklist.adr(), 0, worklist.Size()*sizeof(Node*) );
+            do_maximally_unroll(lpt, worklist);
+          }
+        }
+      }
+    }

It looks like LoopOptsMaxUnroll is a shortened version of 
IdealLoopTree::iteration_split/iteration_split_impl().

Have you considered factoring out the common code? Right now, its hard 
to correlate the checks for LoopOptsMaxUnroll with iteration_split() and 
there's a risk they'll diverge eventually.

Do you need the following steps from the original version?

================================================
   // Look for loop-exit tests with my 50/50 guesses from the Parsing stage.
   // Replace with a 1-in-10 exit guess.
   if (!is_root() && is_loop()) {
     adjust_loop_exit_prob(phase);
   }

   // Compute loop trip count from profile data
   compute_profile_trip_cnt(phase);

No use of profiling data since full unrolling is happening anyway?

========================
   if (!cl->is_valid_counted_loop()) return true; // Ignore various 
kinds of broken loops

========================
   // Do nothing special to pre- and post- loops
   if (cl->is_pre_loop() || cl->is_post_loop()) return true;

I assume there are no pre-/post-loops exist at that point, so these 
checks are redundant. Turn them into asserts?

========================
   if (cl->is_normal_loop()) {
     if (policy_unswitching(phase)) {
       phase->do_unswitching(this, old_new);
       return true;
     }
     if (policy_maximally_unroll(phase)) {
       // Here we did some unrolling and peeling.  Eventually we will
       // completely unroll this loop and it will no longer be a loop.
       phase->do_maximally_unroll(this, old_new);
       return true;
     }

You don't perform loop unswitching at all. So, the order of operations 
changes. Do you see any problems with that?

========================

src/hotspot/share/opto/compile.cpp
    // Perform escape analysis
    if (_do_escape_analysis && ConnectionGraph::has_candidates(this)) {
      if (has_loops()) {
        // Cleanup graph (remove dead nodes).
        TracePhase tp("idealLoop", &timers[_t_idealLoop]);
-      PhaseIdealLoop::optimize(igvn, LoopOptsNone);
+      PhaseIdealLoop::optimize(igvn, LoopOptsMaxUnroll);
        if (major_progress()) print_method(PHASE_PHASEIDEAL_BEFORE_EA, 2);
        if (failing())  return;
      }
      ConnectionGraph::do_analysis(this, &igvn);

Does it make sense to do more elaborate checks before performing early 
full loop unrolling? Like whether candidates are used inside loop bodies?

Best regards,
Vladimir Ivanov
> 
> As discussed before:
> 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-September/035094.html
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-November/036171.html
> 
> Fully unrolling loops early helps EA. The change to cfgnode.cpp is
> required because full unroll sometimes needs peeling which may add a phi
> between a memory access and its AddP, a pattern that EA doesn't
> recognize.
> 
> Roland.
>