RFR(M): 8223051: support loops with long (64b) trip counts
John Rose
john.r.rose at oracle.com
Thu Aug 20 00:47:46 UTC 2020
On Aug 19, 2020, at 5:47 PM, John Rose <john.r.rose at oracle.com> wrote:
>
> On Aug 17, 2020, at 1:49 AM, Roland Westrelin <rwestrel at redhat.com <mailto:rwestrel at redhat.com>> wrote:
>>
>> Does the fixed patch look ok to you?
>
> I’m going over it one more time (between smoky breaths from
> the California fires) and I have a question. What is the exact
> structure of outer_phi?
>
> 1. At first it is a clone of phi, its region patched and the other edges the same:
>
> outer_phi := Phi(outer_head, init, AddL(phi, stride))
>
> 2. After long_loop_replace_long_iv, the interior phi links it back to itself:
>
> outer_phi := Phi(outer_head, init, AddL(AddL(inner_phi, outer_phi), stride))
>
> The thing I’m suspicious of (as fragile code) is the retention of the addend
> ‘stride’. The inner loop (on ‘inner_phi’) *also* adds the stride. What
> prevents there from being a superfluous number of strides added?
>
> I suppose the answer is that the ending output of the inner loop produces
> the post-incremented value as (inner_incr + outer_phi), while the
> pre-incremented value is (inner_phi + outer_phi), which is *always*
> short of the final count by the stride; thus it’s OK to add single missing
> stride in the outer loop.
>
> But, this seems fragile to me. Would it not be safer to make the outer
> phi just copy the final inner IV value (the one that fails the loop’s test)?
> So:
>
> outer_phi := Phi(outer_head, init, AddL(inner_incr, outer_phi))
>
> I’m worried that some edge-case of loop loop might actually miss
> a stride unless the outer phi has the latter, dead-simple form.
>
> In particular, there are loops where there is only one IV (no separate
> incr). I’m not confident that those loops will work correctly; it seems
> to me that the existing ‘outer_phi’, with its extra stride addend,
> may well contribute an unwanted extra step, when the inner_phi
> is post-incremented.
>
> As a related issue, I think the pseudocode comment at the top is
> false, with the same problem. It should probably not say this:
>
> // L1: for (long phi1 = init; phi1 < limit; phi1 += stride) {
> // // phi1 := Phi(L1, init, phi1 + stride)
>
> but rather this:
>
> // L1: for (long phi1 = init; phi1 < limit; phi1 += phi2) {
> // // phi1 := Phi(L1, init, phi1 + phi2)
>
> This sort of bug will show up if (a) we test long loops with
> large trip counts, and (b) also use the stress mode which
> makes the outer loop trip two or three times, and finally
> (c) we get several kinds of loops; ones with and without
> phi == incr, and with and without ‘limit_check_required’,
> and with each kind of possible termination condition
> (< <= > >=).
>
> — John
>
> P.S. I came to this question while working through the transform
> logic on pseudocode. Here it is, for reference. It think it might
> make a good diagram to place in the code, just before the comment
> that says “Peel one iteration”.
== old IR nodes =>
entry_control: {...}
x:
for (long phi = init;;) {
// phi := Phi(x, init, phi + stride)
exit_test:
if (phi < limit)
back_control: fallthrough;
else
exit_branch: break;
// test happens after increment => phi == phi_incr != NULL
long incr = (phi + stride);
... use phi and incr ...
phi = incr;
}
== new IR nodes (before final peel) =>
entry_control: {...}
long adjusted_limit = limit + stride; //because phi_incr != NULL
assert(!limit_check_required || (extralong)limit + stride == adjusted_limit); // else deopt
ulong inner_iters_limit = max_jint - ABS(stride) - 1; //near 0x7FFFFFF0
outer_head:
for (long outer_phi = init;;) { //phi->clone(), in(0):=outer_head
// outer_phi := Phi(outer_head, init, inner_phi, phi=>(outer_phi+inner_phi) + stride)
// >>> ISSUE: is the extra '+ stride' here always harmless? <<<
ulong inner_iters_max = (ulong) MAX(0LL, ((extralong)adjusted_limit - outer_phi) * SGN(stride));
int inner_iters_actual_int = (int) MIN(inner_iters_limit, inner_iters_max) * SGN(stride);
inner_head: x: //in(1) := outer_head
for (int inner_phi = 0;;) {
// inner_phi := Phi(x, intcon(0), inner_phi + stride)
int inner_incr = inner_phi + stride;
bool inner_bol = (inner_incr < inner_iters_actual_int);
exit_test: //exit_test->in(1) := inner_bol;
if (inner_bol) // WAS (phi < limit)
back_control: fallthrough;
else
inner_exit_branch: break; //exit_branch->clone()
// REPLACE phi => (outer_phi+inner_phi)
// REPLACE incr => (outer_phi+inner_incr)
... use phi=>(outer_phi+inner_phi) and incr=>(outer_phi+inner_incr) ...
inner_phi = inner_phi + stride; // inner_incr
}
outer_exit_test: //exit_test->clone(), in(0):=inner_exit_branch
if ((outer_phi+inner_phi) < limit) // WAS (phi < limit)
outer_back_branch: fallthrough; //back_control->clone(), in(0):=outer_exit_test
else
exit_branch: break; //in(0) := outer_exit_test
}
More information about the hotspot-compiler-dev
mailing list