RFR: 8259609: C2: optimize long range checks in long counted loops [v3]
John R Rose
jrose at openjdk.java.net
Mon Feb 1 08:57:43 UTC 2021
On Wed, 27 Jan 2021 08:32:42 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> I did a first pass over the changes and added some comments but I need more time to review.
>
> @TobiHartmann thanks for the comments. I pushed a change that should address them.
Partial review:
Yes, good cleanups along with great new functionality for big data! I
like the refactorings which use the new generic factory methods from
JDK-8255150.
The comment in `PhaseIdealLoop::transform_long_range_checks` is very
good.
PhaseTransform::integercon has a bug left over from JDK-8256655,
the cast in `assert(((long)int_con) == l, "not an int")` is wrong.
Just use `intcon(checked_cast<jint>(l))` as you already do elsewhere.
(I also noticed also that `PhaseIdealLoop::exact_limit` and
`LoopLimitNode::value` have similar asserts which could be replaced
with `checked_cast`. In fact, `PhaseIdealLoop::exact_limit` is a
duplicated from `LoopLimitNode::value` just to avoid creating a node,
yuck. That fiddly stuff should have been written as few times as
possible. I suggest putting it into a static inline in the class
`LoopLimitNode`, where it will also serve as documentation.)
typo: /had a change to be hoisted yet/s/change/chance/
Also there are unusually long comment lines there and inside
`transform_long_range_checks`. 130 columns plus is longer than my
widest laptop window holds unless I aggravate my presbyopia with
a smaller font size. Maybe, consider reflowing? I know some
of those lines are intrinsically long.
This variable is unused: `Node* long_stride = head->stride();`
The argument `signed_int` is unused in `MulNode::operates_on`.
Well done adapting `is_scaled_iv` and friends to T_LONG!
The logic for turning a shift amount into a scale factor looks
flawed, at least if `shift_amount` is ever out of range for the
corresponding basic type. I suggest being safe rather than
sorry: `shift_amount &= (bt == T_INT ? 31 : 63); //clip`
(There are about 3 possible bugs with out-of-range shifts.)
Even better: Use both overloadings of `java_shift_left`
from `globalDefinitions.hpp`; I think that would be the
best practice.
> I realize
there's some code duplication but I didn't see a way to share logic
between IdealLoopTree::may_have_range_check()
IdealLoopTree::policy_range_check() that would feel right.
I did a direct `diff -w` of the two function bodies and they are
so similar that I think it is better to maintain them as one copy
of the code. I suggest adding a boolean flag `provisional`
that activates the changed bits of logic:
bool IdealLoopTree::policy_range_check(PhaseIdealLoop *phase, bool provisional = false) const {
+ if (_head->is_CountedLoop()) {
CountedLoopNode *cl = _head->as_CountedLoop();
...existing stuff about unroll_only etc...
+ }
+ BaseCountedLoopNode *cl = _head->as_BaseCountedLoop();
Node *trip_counter = cl->phi();
+ BasicType bt = cl->bt();
for (...) {
...
Node *cmp = bol->in(1);
Node *rc_exp = cmp->in(1);
Node *limit = cmp->in(2);
+ if (provisional) {
+ // Try to pattern match with either cmp inputs, do not check whether one of the
+ // inputs is loop independent as it may not have had a chance to be hoisted yet.
+ if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL, bt) &&
+ !phase->is_scaled_iv_plus_offset(limit, trip_counter, NULL, NULL, bt)) {
continue;
+ } else { // check loop independence if non-provisional
Node *limit_c = phase->get_ctrl(limit);
...existing checks & swap of rc_exp and limit...
if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL)) {
continue;
}
+ }
if (is_loop_exit(iff)) {
// Found valid reason to split iterations (if there is room).
// NOTE: Usually a gross overestimate.
return provisional || phase->may_require_nodes(est_loop_clone_sz(2));
}
}
}
Now for the biggest part: You are replacing long range checks with
32-bit range checks which then can be RCE-ed. You dial back the
strip-mining count as needed to make sure the various scaled index
computations don't suffer overflow. Very clever!
I suggest placing `extract_long_range_checks` before
`transform_long_range_checks` because that's the order they are
executed. That way the two can be read more easily in sequence.
The variable `jlong stride_con` should be `int`. It is never used
for anything bigger, or else an assert would fire.
IMO the expression `scale > 0 != stride_con > 0` needs parens around
each of the two comparisons. This is a case of relying on obscure
C language precedence rules. (I had to look it up myself!)
This group of definitions is loop-invariant, and should probably
be set up once outside of the for-loop:
// Compute lower and upper
Node* last = new LoopLimitNode(C, int_zero, inner_iters_actual_int, int_stride);
register_new_node(last, entry_control);
last = new SubINode(last, int_stride);
register_new_node(last, entry_control);
Node* lower = outer_phi;
Node* upper = new ConvI2LNode(last);
register_new_node(upper, entry_control);
upper = new AddLNode(lower, upper);
register_new_node(upper, entry_control);
In `new AddLNode(upper, long_one)`, I think `long_one` should
have the same sign as the stride. But I'm not sure (yet).
You have an expression for accumulating the maximum scale:
`max_scale = MAX2(max_scale, scale)`. I think that the
accumulated value should not be negative, so you probably
want `max_scale = MAX2(max_scale, ABS(scale))`.
But, I suggest getting rid of `max_scale` altogether.
The logic will be simpler if it works like this:
jlong original_iters_limit = iters_limit;
jlong reduced_iters_limit = iters_limit;
for (...) {
... jlong new_limit = original_iters_limit / ABS(scale * stride_con)
if (new_limit >= min_iters) { // ??? or > ???
reduced_iters_limit = MIN2(reduced_iters_limit, new_limit);
range_checks.push(c);
}
}
return checked_cast<int>(reduced_iters_limit);
If you do this you won't need that delicate assert at the end.
More later. I'm still working through the core logic in
`transform_long_range_checks`, and I will propose more changes
there.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2045
More information about the hotspot-compiler-dev
mailing list