RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer
Daohan Qu
duke at openjdk.org
Thu Jun 8 07:40:59 UTC 2023
On Wed, 7 Jun 2023 14:31:19 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> This patch should fix [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). A `jtreg` test is also added. I'd appreciate any comments and reviews. Thanks in advance!
>>
>> ## Problem Analysis
>>
>> For the following program,
>>
>> public class Test {
>>
>> static boolean flag;
>>
>> public static void main(String[] args) {
>> for (int i = 0; i < 10000; i++) {
>> flag = !flag;
>> test();
>> }
>> }
>>
>> public static void test() {
>> int limit = flag ? Integer.MAX_VALUE : 1000;
>>
>> int i = 0;
>> while (i < limit) {
>> i += 3;
>> if (flag) {
>> return;
>> }
>> }
>> }
>> }
>>
>> A `LoopLimitNode` will be generated and its `Limit` input is a `PhiNode`, as depicted in the following picture.
>>
>> <img width="384" alt="phi_as_limit" src="https://github.com/openjdk/jdk/assets/18374295/c561bb09-9789-45c2-8fa8-5bc690ea8046">
>>
>> During `PhaseCCP`, the `LoopLimitNode::Value()` tries to calculate the constant final value:
>> https://github.com/openjdk/jdk/blob/16ebf47fe3b0fac7b67acfa589a26abf8843306b/src/hotspot/share/opto/loopnode.cpp#L2289-L2301
>>
>> The problem is that the assertion in `line 2299` could fail during CCP though it must hold true at the end of CCP. Here is the reason: `PhaseCCP` initializes all nodes with the type `TOP` and iterates in an "arbitrary" order. The following order may happen:
>>
>> 28 IfTrue => 34 Region => 36 Phi => 195 LoopLimit => ... => 29 IfFalse
>>
>> 1. In `ProjNode::Value()` (`IfTrue` inherits it), the type of `IfTrue` is set to `Type::CONTROL`
>> https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/multnode.cpp#L168-L171
>>
>> 2. In `PhiNode::Value()`, only `28 IfTrue`'s correspondence `33 ConI` gets merged (as `29 IfFalse` has not been dealt with yet), then it has a value of `int:max`.
>> https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/cfgnode.cpp#L1269-L1277
>>
>> 3. In `LoopLimitNode::Value()`, it finds its `Limit` input `36 Phi` is constant, which triggers the assertion, and the assertion fails since the final value calculated from that constant limit (`int:max`) overflows.
>>
>> ## Solution
>> Move the overflow check to the end of CCP, where it must not fail.
>
> @quadhier Thanks for looking into this! This but is currently not assigned to you. Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Currently, @enothum had it assigned and was also working on it.
>
> The regression test cleanly reproduces before the patch, good.
>
> Why is the overflow acceptable? Does that not mean that the calculation did something wrong?
>
> In the example, we have
>
> init_con = 3
> limit_con = 2147483647 = max_jint
> stride_con = 3
> stride_m = stride_con - 1 = 2
> trip_count = (limit_con - init_con + stride_m)/stride_con = 715827882
> final_con = init_con + stride_con*trip_count = 2147483649 = max_jint + 2 (overflow!)
> final_int = -2147483647 (overflow!)
>
>
> Does that not mean that we mis-calculated the `trip_count`? If it was 1 less, we would not have an overflow. Would that not fix the issue in a simpler way? Or did I get something wrong?
>
> Let's expand the formula:
>
> final_con = init_con + stride_con*trip_count
> final_con = init_con + stride_con * ((limit_con - init_con + stride_m) / stride_con)
> final_con = init_con + stride_con * ((limit_con - init_con + stride_con - 1) / stride_con)
>
>
> Is the issue not that instead of coming up with a final value that is slightly below `limit_con`, we come up with one that is slightly above `limit_con`, and can thus overflow?
>
> Would this be correct instead (for positive stride)?
>
> final_con = init_con + stride_con * ((limit_con - init_con + 1 - stride_con) / stride_con)
>
>
> Could the limit type ever overflow at runtime? Does the loop limit check not prevent that (unsure)?
>
> Can you explain again why exactly we calculate what we calculate here, and why that is correct?
This patch causes many compilation timeouts. I'd close this PR and wait for a better fix. Thanks for your review and suggestion @eme64 !
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582057788
More information about the hotspot-compiler-dev
mailing list