RFR(M): 8223051: support loops with long (64b) trip counts

Thu Aug 20 00:47:46 UTC 2020

On Aug 19, 2020, at 5:47 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On Aug 17, 2020, at 1:49 AM, Roland Westrelin <rwestrel at redhat.com <mailto:rwestrel at redhat.com>> wrote:
>> 
>> Does the fixed patch look ok to you?
> 
> I’m going over it one more time (between smoky breaths from
> the California fires) and I have a question.  What is the exact
> structure of outer_phi?
> 
> 1. At first it is a clone of phi, its region patched and the other edges the same:
> 
>     outer_phi := Phi(outer_head, init, AddL(phi, stride))
> 
> 2. After long_loop_replace_long_iv, the interior phi links it back to itself:
> 
>     outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride))
> 
> The thing I’m suspicious of (as fragile code) is the retention of the addend
> ‘stride’.  The inner loop (on ‘inner_phi’) *also* adds the stride.  What
> prevents there from being a superfluous number of strides added?
> 
> I suppose the answer is that the ending output of the inner loop produces
> the post-incremented value as (inner_incr + outer_phi), while the
> pre-incremented value is (inner_phi + outer_phi), which is *always*
> short of the final count by the stride; thus it’s OK to add single missing
> stride in the outer loop.
> 
> But, this seems fragile to me.  Would it not be safer to make the outer
> phi just copy the final inner IV value (the one that fails the loop’s test)?
> So:
> 
>         outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi))
> 
> I’m worried that some edge-case of loop loop might actually miss
> a stride unless the outer phi has the latter, dead-simple form.
> 
> In particular, there are loops where there is only one IV (no separate
> incr).  I’m not confident that those loops will work correctly; it seems
> to me that the existing ‘outer_phi’, with its extra stride addend,
> may well contribute an unwanted extra step, when the inner_phi
> is post-incremented.
> 
> As a related issue, I think the pseudocode comment at the top is
> false, with the same problem.  It should probably not say this:
> 
> // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) {
> //    // phi1 := Phi(L1, init, phi1 + stride)
> 
> but rather this:
> 
> // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) {
> //    // phi1 := Phi(L1, init, phi1 + phi2)
> 
> This sort of bug will show up if (a) we test long loops with
> large trip counts, and (b) also use the stress mode which
> makes the outer loop trip two or three times, and finally
> (c) we get several kinds of loops; ones with and without
> phi == incr, and with and without ‘limit_check_required’,
> and with each kind of possible termination condition
> (< <= > >=).
> 
> — John
> 
> P.S.  I came to this question while working through the transform
> logic on pseudocode.  Here it is, for reference.  It think it might
> make a good diagram to place in the code, just before the comment
> that says “Peel one iteration”.

== old IR nodes =>

entry_control: {...}
x:
for (long phi = init;;) {
  // phi := Phi(x, init, phi + stride)
  exit_test:
  if (phi < limit)
    back_control: fallthrough;
  else
    exit_branch: break;
  // test happens after increment => phi == phi_incr != NULL
  long incr = (phi + stride);
  ... use phi and incr ...
  phi = incr;
}

== new IR nodes (before final peel) =>

entry_control: {...}
long adjusted_limit = limit + stride;  //because phi_incr != NULL
assert(!limit_check_required || (extralong)limit + stride == adjusted_limit);  // else deopt
ulong inner_iters_limit = max_jint - ABS(stride) - 1;  //near 0x7FFFFFF0
outer_head:
for (long outer_phi = init;;) {  //phi->clone(), in(0):=outer_head
  // outer_phi := Phi(outer_head, init, inner_phi, phi=>(outer_phi+inner_phi) + stride)
  // >>> ISSUE: is the extra '+ stride' here always harmless? <<<
  ulong inner_iters_max = (ulong) MAX(0LL, ((extralong)adjusted_limit - outer_phi) * SGN(stride));
  int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride);
  inner_head: x: //in(1) := outer_head
  for (int inner_phi = 0;;) {
    // inner_phi := Phi(x, intcon(0), inner_phi + stride)
    int inner_incr = inner_phi + stride;
    bool inner_bol = (inner_incr < inner_iters_actual_int);
    exit_test: //exit_test->in(1) := inner_bol; 
    if (inner_bol) // WAS (phi < limit)
      back_control: fallthrough;
    else
      inner_exit_branch: break;  //exit_branch->clone()
    // REPLACE phi  => (outer_phi+inner_phi)
    // REPLACE incr => (outer_phi+inner_incr)
    ... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ...
    inner_phi = inner_phi + stride;  // inner_incr
  }
  outer_exit_test:  //exit_test->clone(), in(0):=inner_exit_branch
  if ((outer_phi+inner_phi) < limit)  // WAS (phi < limit)
    outer_back_branch: fallthrough;  //back_control->clone(), in(0):=outer_exit_test
  else
    exit_branch: break;  //in(0) := outer_exit_test
}