RFR: 8259609: C2: optimize long range checks in long counted loops [v3]
Roland Westrelin
roland at openjdk.java.net
Thu Mar 4 13:18:40 UTC 2021
On Mon, 1 Feb 2021 08:54:49 GMT, John R Rose <jrose at openjdk.org> wrote:
>> @TobiHartmann thanks for the comments. I pushed a change that should address them.
>
> Partial review:
> Yes, good cleanups along with great new functionality for big data! I
> like the refactorings which use the new generic factory methods from
> JDK-8255150.
>
> The comment in `PhaseIdealLoop::transform_long_range_checks` is very
> good.
>
> PhaseTransform::integercon has a bug left over from JDK-8256655,
> the cast in `assert(((long)int_con) == l, "not an int")` is wrong.
> Just use `intcon(checked_cast<jint>(l))` as you already do elsewhere.
>
> (I also noticed also that `PhaseIdealLoop::exact_limit` and
> `LoopLimitNode::value` have similar asserts which could be replaced
> with `checked_cast`. In fact, `PhaseIdealLoop::exact_limit` is a
> duplicated from `LoopLimitNode::value` just to avoid creating a node,
> yuck. That fiddly stuff should have been written as few times as
> possible. I suggest putting it into a static inline in the class
> `LoopLimitNode`, where it will also serve as documentation.)
>
> typo: /had a change to be hoisted yet/s/change/chance/
>
> Also there are unusually long comment lines there and inside
> `transform_long_range_checks`. 130 columns plus is longer than my
> widest laptop window holds unless I aggravate my presbyopia with
> a smaller font size. Maybe, consider reflowing? I know some
> of those lines are intrinsically long.
>
> This variable is unused: `Node* long_stride = head->stride();`
>
> The argument `signed_int` is unused in `MulNode::operates_on`.
>
> Well done adapting `is_scaled_iv` and friends to T_LONG!
>
> The logic for turning a shift amount into a scale factor looks
> flawed, at least if `shift_amount` is ever out of range for the
> corresponding basic type. I suggest being safe rather than
> sorry: `shift_amount &= (bt == T_INT ? 31 : 63); //clip`
> (There are about 3 possible bugs with out-of-range shifts.)
> Even better: Use both overloadings of `java_shift_left`
> from `globalDefinitions.hpp`; I think that would be the
> best practice.
>
>> I realize
> there's some code duplication but I didn't see a way to share logic
> between IdealLoopTree::may_have_range_check()
> IdealLoopTree::policy_range_check() that would feel right.
>
> I did a direct `diff -w` of the two function bodies and they are
> so similar that I think it is better to maintain them as one copy
> of the code. I suggest adding a boolean flag `provisional`
> that activates the changed bits of logic:
>
> bool IdealLoopTree::policy_range_check(PhaseIdealLoop *phase, bool provisional = false) const {
> + if (_head->is_CountedLoop()) {
> CountedLoopNode *cl = _head->as_CountedLoop();
> ...existing stuff about unroll_only etc...
> + }
> + BaseCountedLoopNode *cl = _head->as_BaseCountedLoop();
> Node *trip_counter = cl->phi();
> + BasicType bt = cl->bt();
> for (...) {
> ...
> Node *cmp = bol->in(1);
> Node *rc_exp = cmp->in(1);
> Node *limit = cmp->in(2);
> + if (provisional) {
> + // Try to pattern match with either cmp inputs, do not check whether one of the
> + // inputs is loop independent as it may not have had a chance to be hoisted yet.
> + if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL, bt) &&
> + !phase->is_scaled_iv_plus_offset(limit, trip_counter, NULL, NULL, bt)) {
> continue;
> + } else { // check loop independence if non-provisional
> Node *limit_c = phase->get_ctrl(limit);
> ...existing checks & swap of rc_exp and limit...
> if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL)) {
> continue;
> }
> + }
> if (is_loop_exit(iff)) {
> // Found valid reason to split iterations (if there is room).
> // NOTE: Usually a gross overestimate.
> return provisional || phase->may_require_nodes(est_loop_clone_sz(2));
> }
> }
> }
>
> Now for the biggest part: You are replacing long range checks with
> 32-bit range checks which then can be RCE-ed. You dial back the
> strip-mining count as needed to make sure the various scaled index
> computations don't suffer overflow. Very clever!
>
> I suggest placing `extract_long_range_checks` before
> `transform_long_range_checks` because that's the order they are
> executed. That way the two can be read more easily in sequence.
>
> The variable `jlong stride_con` should be `int`. It is never used
> for anything bigger, or else an assert would fire.
>
> IMO the expression `scale > 0 != stride_con > 0` needs parens around
> each of the two comparisons. This is a case of relying on obscure
> C language precedence rules. (I had to look it up myself!)
>
> This group of definitions is loop-invariant, and should probably
> be set up once outside of the for-loop:
>
> // Compute lower and upper
> Node* last = new LoopLimitNode(C, int_zero, inner_iters_actual_int, int_stride);
> register_new_node(last, entry_control);
> last = new SubINode(last, int_stride);
> register_new_node(last, entry_control);
>
> Node* lower = outer_phi;
> Node* upper = new ConvI2LNode(last);
> register_new_node(upper, entry_control);
> upper = new AddLNode(lower, upper);
> register_new_node(upper, entry_control);
>
> In `new AddLNode(upper, long_one)`, I think `long_one` should
> have the same sign as the stride. But I'm not sure (yet).
>
> You have an expression for accumulating the maximum scale:
> `max_scale = MAX2(max_scale, scale)`. I think that the
> accumulated value should not be negative, so you probably
> want `max_scale = MAX2(max_scale, ABS(scale))`.
>
> But, I suggest getting rid of `max_scale` altogether.
> The logic will be simpler if it works like this:
>
> jlong original_iters_limit = iters_limit;
> jlong reduced_iters_limit = iters_limit;
> for (...) {
> ... jlong new_limit = original_iters_limit / ABS(scale * stride_con)
> if (new_limit >= min_iters) { // ??? or > ???
> reduced_iters_limit = MIN2(reduced_iters_limit, new_limit);
> range_checks.push(c);
>
> }
> }
> return checked_cast<int>(reduced_iters_limit);
>
> If you do this you won't need that delicate assert at the end.
>
> More later. I'm still working through the core logic in
> `transform_long_range_checks`, and I will propose more changes
> there.
comment to keep the PR open
-------------
PR: https://git.openjdk.java.net/jdk/pull/2045
More information about the hotspot-compiler-dev
mailing list