RFR(M): 8223051: support loops with long (64b) trip counts

Tue Aug 25 05:23:27 UTC 2020

On Aug 21, 2020, at 12:43 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> For the record, I've tested tier1-9 with "default" flags and tier1-5 with
> -XX:StressLongCountedLoop=1 and -XX:StressLongCountedLoop=4294967295.
> 
> Please let me know if you think other flag combinations/values should be tested as well.

Those settings force iters_limit (normally 2^31-2) to be either
preserved at 2^31-2 or reset to 0, respectively.  The latter value
is not very useful, since the transform will bail out for trip counts
of 1 or 0.  I suggest aiming for StressLongCountedLoop values
that yield inner loop trip counts that strike a balance between
two concerns:  (a) large enough so that the inner loop makes
a non-trivial number of trips, and (b) small enough so the
*outer* loop makes a non-trivial number of trips.
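A minimal Python sketch of the mapping described above. It assumes, as stated later in this message, that a stress value X divides the normal limit as floor((2^31-2)/X). The function name `stressed_iters_limit` is hypothetical, and this models only the arithmetic, not HotSpot's C2 code.

```python
DEFAULT_ITERS_LIMIT = 2**31 - 2  # normal inner-loop limit, 2147483646


def stressed_iters_limit(stress: int) -> int:
    """Hypothetical model: iters_limit after applying StressLongCountedLoop=stress.

    Uses integer (floor) division, matching floor((2^31-2)/X).
    """
    if stress < 1:
        raise ValueError("this model only covers stress values >= 1")
    return DEFAULT_ITERS_LIMIT // stress


for x in (1, 4294967295):
    print(f"StressLongCountedLoop={x}: iters_limit={stressed_iters_limit(x)}")
# StressLongCountedLoop=1: iters_limit=2147483646
# StressLongCountedLoop=4294967295: iters_limit=0
```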

Concern (a) lets us exercise further optimizations on the
inner loop such as unrolling, peeling, and RCE.  Concern (b)
helps us be sure that the back edge of the outer loop performs the
right register moves, even if the inner loop is very complex
and has many exit points.  If we don’t worry about (a), we
could mask bugs in the transformed inner loop (unlikely,
but possible).  If we don’t worry about (b), we could miss
what happens when the outer loop runs a second time
(or a third, after peeling).
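To make concern (b) concrete, here is a rough source-level picture of a loop nest, written in Python for brevity. It is an illustrative model, not the shape of C2's IR. The hypothetical `run_nested` splits a loop of `n` trips into inner passes of at most `iters_limit` trips. The long index `i` is the state that must be carried correctly around the outer back edge on every pass after the first.

```python
from typing import Callable


def run_nested(n: int, iters_limit: int, body: Callable[[int], None]) -> int:
    """Run body(i) for i in [0, n) as an outer loop over bounded inner passes.

    Conceptual model only: each inner pass makes at most iters_limit trips
    (so its counter would fit in 32 bits); the outer loop advances a long index.
    Returns the number of outer-loop trips taken.
    """
    if iters_limit < 1:
        raise ValueError("iters_limit must be >= 1")
    outer_trips = 0
    i = 0  # long induction variable, carried across the outer back edge
    while i < n:
        chunk = min(iters_limit, n - i)  # trip count of this inner pass
        for j in range(chunk):           # the bounded "inner" loop
            body(i + j)
        i += chunk                       # outer back edge: update the long index
        outer_trips += 1
    return outer_trips


seen = []
trips = run_nested(10, 4, seen.append)
print(trips, seen == list(range(10)))  # 3 True
```

A bug in the outer back edge (for example, a stale `i`) would only show up from the second outer trip onward, which is why tests need loops long enough to reach it.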

For (a) we want an iters_limit on the order of 100 or more,
while for (b) we want an iters_limit small enough that many
tests (each loop of which has its own characteristic trip
count) will run the outer loop three or more times.  Tests
which intentionally warm up loops go for a *cumulative*
trip count of 20,000 or so, but the individual trip counts
can vary widely.  As a wild guess, I’ll say that many test loops
make enough trips that an iters_limit of 300 or less will run
their outer loops three or more times.
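A small Python sketch of the arithmetic behind (b). It assumes the outer loop makes ceil(n / iters_limit) trips for a loop of n trips, ignoring peeling and other pre/post-loop adjustments. The helper names are hypothetical. For three outer trips, a loop needs more than 2 * iters_limit trips, so at least 601 trips when iters_limit is 300.

```python
def outer_trips(trip_count: int, iters_limit: int) -> int:
    """Outer-loop trips when each inner pass runs at most iters_limit trips."""
    if iters_limit < 1:
        raise ValueError("iters_limit must be >= 1")
    return (trip_count + iters_limit - 1) // iters_limit  # ceiling division


def min_trip_count(outer: int, iters_limit: int) -> int:
    """Smallest trip count that makes the outer loop run at least `outer` times."""
    if outer < 1:
        raise ValueError("outer must be >= 1")
    return (outer - 1) * iters_limit + 1


for limit in (100, 300):
    print(f"iters_limit={limit}: needs >= {min_trip_count(3, limit)} trips "
          f"for 3 outer trips; 20,000 trips -> {outer_trips(20_000, limit)} outer trips")
```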

To derive a StressLongCountedLoop parameter X from a
desired iters_limit, ensure that floor((2^31-2)/X) is close to
the target iter_limit.  So, I recommend a value of
StressLongCountedLoop which is at most 21400000
(for an iters_limit of at least 100), and another which
is at least 7150000 (for an iters_limit of at most 300).
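The two bounds above can be computed exactly, again assuming iters_limit = floor((2^31-2)/X). This Python sketch, with hypothetical helper names, returns the largest X that keeps iters_limit at or above a minimum and the smallest X that keeps it at or below a maximum. The recommended 21400000 and 7150000 are round values just inside those exact bounds.

```python
DEFAULT_ITERS_LIMIT = 2**31 - 2  # 2147483646


def iters_limit_for(stress: int) -> int:
    """Modeled iters_limit for StressLongCountedLoop=stress (stress >= 1)."""
    return DEFAULT_ITERS_LIMIT // stress


def max_stress_for_min_limit(min_limit: int) -> int:
    """Largest X with floor(D / X) >= min_limit, i.e. X <= floor(D / min_limit)."""
    return DEFAULT_ITERS_LIMIT // min_limit


def min_stress_for_max_limit(max_limit: int) -> int:
    """Smallest X with floor(D / X) <= max_limit, i.e. X > D / (max_limit + 1)."""
    return DEFAULT_ITERS_LIMIT // (max_limit + 1) + 1


upper = max_stress_for_min_limit(100)  # 21474836
lower = min_stress_for_max_limit(300)  # 7134498
print(f"X in [{lower}, {upper}] keeps iters_limit in [100, 300]")
for x in (7_150_000, 20_000_000, 21_400_000):
    print(x, iters_limit_for(x))  # 300, 107, 100 respectively
```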

Putting these together, and choosing a round number
which prioritizes concern (b) by moving closer to the
limit of (a), if I had one more run to do, I’d choose
-XX:StressLongCountedLoop=20000000.

If I were to do multiple runs, I might vary that
stress parameter by adding and subtracting a couple
of zeroes:

-XX:StressLongCountedLoop=200000
-XX:StressLongCountedLoop=2000000
-XX:StressLongCountedLoop=20000000
-XX:StressLongCountedLoop=200000000
-XX:StressLongCountedLoop=2000000000
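Under the same floor((2^31-2)/X) model, a quick Python check of the inner limit each of the five suggested settings would produce. The helper name is hypothetical. By the bail-out rule mentioned at the top of this message, the last setting (iters_limit of 1) would presumably not build a nest at all.

```python
DEFAULT_ITERS_LIMIT = 2**31 - 2  # 2147483646

SUGGESTED_STRESS_VALUES = [200_000, 2_000_000, 20_000_000, 200_000_000, 2_000_000_000]


def iters_limit_for(stress: int) -> int:
    """Modeled iters_limit for StressLongCountedLoop=stress (stress >= 1)."""
    return DEFAULT_ITERS_LIMIT // stress


for x in SUGGESTED_STRESS_VALUES:
    limit = iters_limit_for(x)
    # Per the message, the transform bails out for inner trip counts of 0 or 1.
    note = " (transform would bail out)" if limit <= 1 else ""
    print(f"-XX:StressLongCountedLoop={x}: iters_limit={limit}{note}")
# iters_limit values: 10737, 1073, 107, 10, 1
```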

If any of those runs kicks out a bug or other suspicious behavior,
that setting should be added to a permanent test list.

Separately from those issues, we know that the stress mode
converts 32-bit loops into 64-bit loops, which then re-nest
using the new logic.  But, are we confident that this re-nesting
works?  Roland did some manual testing to make sure the
test works as intended, but it would be good to run the above
stress tests with some sort of logging that confirms there
are “lots and lots” of successful 32-to-64 loop conversions.
If those loop conversions fail (staying at 64 bits) the tests will
pass, but they won’t be testing what we need to be testing.

HTH

— John

> Best regards,
> Tobias
> 
> On 20.08.20 17:34, Roland Westrelin wrote:
>> 
>>> Yes, webrev.03 looks good to me. I've re-run extended testing and the results look good.
>> 
>> Thanks for the review and testing!
>> 
>> Roland.
>>