RFR(M): 8244504: C2: refactor counted loop code in preparation for long counted loop

Thu May 21 20:09:33 UTC 2020

On May 13, 2020, at 7:48 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> ...
> After spending some time trying to get this to work, I have to admit my
> initial logic was very much wrong and it's a lot less straightforward to
> get this to work in a way that I'm confident about than I thought.

Thanks for pushing it through.  I think this will be very useful
for improving loops over jumbo data structures, such as those
that could bye accessed via the MemorySegment API (JEP 383).

> 
>> Another way to deal with that, would be to pass the expected result type
>> as argument to unsigned_min() rather than compute it.
> 
> So I went that way and passes the type as an argument to the min/max
> methods in case the caller knows something about the result:
> 
> http://cr.openjdk.java.net/~roland/8244504/webrev.01/

A few comments:

In C2 whenever a TypeNode is created, it is given a type you
the programmer know is “as good as possible for now”, and
then maybe it narrows some more later.  Often that type
is redundant.  Case in point:  the new call to  set_type on
inner_iters_actual_int the TypeInt should really be derived
in a Value call the sees the earlier TypeLong, not made from
scratch.  I wish there were a way to do this less manual
repetition, because bugs can hide there.  I think this
kind of repetition is endemic to C2, so I don’t have any
more to say about this than to complain.  I sort of want
a function called PhaseTransform::set_type_Value
which starts from set_type_bottom but then sharpens
it once, by calling Value (on the assumption that the
nearby nodes already have a phase-assigned type).
Uses of register_new_node_with_optimizer are places
where such a function might reduce redundancy.

The assigned types in build_min_max* are also provided
“by fiat” rather than by local inference, and it’s harder to
prove correct because of the way the code is factored (but
thank you for doing that!).  I really like your asserts that
test the “fiat type” against the “physics” of the node being
created; I mean those that say “type doesn’t match”.
Would it make sense to add more such asserts on the
other branches?  You’d need a work-alike for the Value
method of MaxINode for all the other cases, so (sigh)
maybe it’s too much to squeeze into this work.  The
lack of good asserts here feels like tech. debt, though.
Would you mind filing a suitable bug, if you agree with
me?  Perhaps the right answer is to move the GraphKit
functions (one more time) onto polymorphic factory
methods MaxNode::make and MinNode::make, styled
like LoadNode::make.  Then any fiddly min/max
interval logic (for assertion checks) can go where
it really belongs, in the same file as MaxINode::Value.
(Which happens to be addnode.cpp, but whatever.)
Also MinNode::make_zero_clipped_difference.
With an optional unsigned flag.

Another reason I’m thinking this could make sense
is that, while I thought GraphKit was a good place
for the new functions, the _gvn instance isn’t a
GK, so you had to make it a static method on GK
that takes _gvn instead of “this”.  That’s a smell,
isn’t it?  (My bad.)  I think a polymorphic factory
method or three on MaxNode/MinNode would
work better.  That would also allow future C2
work to fill out the set of min and max operations
that get handled by intrinsics, with little further
disturbance.

The logic of check_stride_overflow is now clear
as crystal, where it used to be very muddy indeed.

Thanks for adding the extra comments.

I looked for a while at the 32-bit and 64-bit versions
of the counted loop pattern match, and thought about
whether they could be brought even closer together,
perhaps making the comments more similar, calling
out where there are features or limitations on one but
not the other, but gave up.  Further unifying the loop
detection would be a larger task, and perhaps not
worth the effort.

— John