RFR(M): 8223051: support loops with long (64b) trip counts

Thu Aug 20 00:47:02 UTC 2020

On Aug 17, 2020, at 1:49 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> Does the fixed patch look ok to you?

I’m going over it one more time (between smoky breaths from
the California fires) and I have a question.  What is the exact
structure of outer_phi?

1. At first it is a clone of phi, its region patched and the other edges the same:

    outer_phi := Phi(outer_head, init, AddL(phi, stride))

2. After long_loop_replace_long_iv, the interior phi links it back to itself:

    outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride))

The thing I’m suspicious of (as fragile code) is the retention of the addend
‘stride’.  The inner loop (on ‘inner_phi’) *also* adds the stride.  What
prevents there from being a superfluous number of strides added?

I suppose the answer is that the ending output of the inner loop produces
the post-incremented value as (inner_incr + outer_phi), while the
pre-incremented value is (inner_phi + outer_phi), which is *always*
short of the final count by the stride; thus it’s OK to add single missing
stride in the outer loop.

But, this seems fragile to me.  Would it not be safer to make the outer
phi just copy the final inner IV value (the one that fails the loop’s test)?
So:

        outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi))

I’m worried that some edge-case of loop loop might actually miss
a stride unless the outer phi has the latter, dead-simple form.

In particular, there are loops where there is only one IV (no separate
incr).  I’m not confident that those loops will work correctly; it seems
to me that the existing ‘outer_phi’, with its extra stride addend,
may well contribute an unwanted extra step, when the inner_phi
is post-incremented.

As a related issue, I think the pseudocode comment at the top is
false, with the same problem.  It should probably not say this:

// L1: for (long phi1 = init; phi1 < limit; phi1 += stride) {
//    // phi1 := Phi(L1, init, phi1 + stride)

but rather this:

// L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) {
//    // phi1 := Phi(L1, init, phi1 + phi2)

This sort of bug will show up if (a) we test long loops with
large trip counts, and (b) also use the stress mode which
makes the outer loop trip two or three times, and finally
(c) we get several kinds of loops; ones with and without
phi == incr, and with and without ‘limit_check_required’,
and with each kind of possible termination condition
(< <= > >=).

— John

P.S.  I came to this question while working through the transform
logic on pseudocode.  Here it is, for reference.  It think it might
make a good diagram to place in the code, just before the comment
that says “Peel one iteration”.