RFR: 8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL [v4]

Fri Apr 21 13:00:49 UTC 2023

On Mon, 17 Apr 2023 19:55:52 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> @jaskarth I think your issues are not related, though I can look at them again once I get back to IGVN verification.
>> 
>> @vnkozlov I thought about it a bit more. With a simple example like `Test::test`, I get unrolling `2048`, so we unroll 10-ish times. I see accordingly many `ConvI2L, SubL, MaxL, ConvL2I` nodes. Now, I can collapse the `ConvL2I -> ConvI2L` parts (the types guarantee that we never leave the `int` range, so conversion never clips anything), so it is only a chain of `SubL -> MaxL` nodes.
>> 
>> One idea would be to fold `SubL -> MaxL -> SubL -> MaxL` down to a single subtraction and maximum. Maybe that could be done, we just have to be very careful with the types. **I'll give it a try**, and it seems to work on a basic example.
>> 
>> The example:
>> 
>> `./java -Xcomp -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:+TraceLoopOpts -XX:+PrintIdeal Test.java`
>> 
>> 
>> public class Test {
>>     static int START = 0;
>>     static int FINISH = 512;
>>     static int RANGE = 512;
>> 
>>     public static void main(String args[]) {
>>         byte[] data = new byte[RANGE];
>>         test(data);
>>     }
>> 
>>     public static void test(byte[] data) {
>>         for (int j = START; j < FINISH; j++) {
>>             data[j] = (byte)(data[j] * 11);
>>         }
>>     }
>> }
>> 
>> 
>> What to do with this?
>>  - Performance testing did not show any difference. But maybe we do not trust that enough.
>>  - Before and now, the chain of unrolling-limits can be interrupted by range-check limits. We probably will just accept that this means that not all of the unrolling-limits can be folded together.
>> 
>> **I have an alternative proposal:**
>> Leave the `MaxL/MinL` node for the range-check limits, there are usually not that many RC-limits, and up to now we used a `CMove` node per such limit already anyway.
>> 
>> But for the unroll-limits, we introduce a `SubINoUnderflow` node, which does a safe (no-underflow) subtraction `limit-stride`.
>> These nodes can be folded together relatively easily.
>> I already had such an implementation before, and reverted it https://github.com/openjdk/jdk/pull/13269/commits/f5fcf6084a2446876ba2a85907a2991ef4c705b7
>> I had already discussed this idea with @chhagedorn a while ago. But then decided against it once I also saw that I wanted a unified solution for RC-limits and unroll-limits. The downside is that it takes a new special node.
>> 
>> With this `SubINoUnderflowNode` idea, we would have a constant number of nodes added per RC-limit. And then for all the unroll limit adjustments together, we would only have one `SubINoUnderflow` node, as they would all collapse into one. At macro expansion, I can then expand it into a single CMove node.
>> 
>> But I think I can do the same with just collapsing `SubL -> MaxL -> SubL -> MaxL` to `SubL -> MaxL`. That may be cleaner.
>> 
>> @vnkozlov What do you think? Do you have any other ideas? What solution would you prefer?
>
>> But I think I can do the same with just collapsing `SubL -> MaxL -> SubL -> MaxL` to `SubL -> MaxL`. That may be cleaner.
> 
> I prefer this if you can do it. So you have sequence (after folding `Conv` nodes) 
> 
> MaxL(SubL(MaxL(SubL(limit, stride), min_int), stride*2), min_int);
> 
> 
> Yes, I think it can be collapsed to:
> 
> MaxL(SubL(limit, stride*3), min_int);
> 
> 
> If in any point of chain `limit` become `min_int` it will stay `min_int`  (even if `stride` is `max_int`) because you use Long arithmetic and we have "small" limit on unrolling (16?).
> If it does not hit min_int the result it similar to SubL(SubL((limit, stride), stride*2).
> So you just need to correctly collect `stride*N` values.

@vnkozlov I added some more IGVN optimizations that help to fold the `SubL -> MaxL` chains.

1. `fold_subI_no_underflow_pattern` in `MaxLNode::Ideal`. Collapses `SubL -> MaxL->SubL -> MaxL` to a simple `SubL -> MaxL`.  

2. `ConvI2LNode::Identity` can now convert `I2L(L2I(x))` => `x`. We need this, so that the Casts are not in the way for the first optimization.

I added verification, that these optimizations are really taken:
https://github.com/openjdk/jdk/blob/33e1ad54d0322bdbc50c85fbea6f1ada0963f3ef/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitSubtractionsCollapse.java#L50-L68

Is this now ok?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13269#issuecomment-1517798221