RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v3]

Tue Feb 27 10:41:44 UTC 2024

On Mon, 26 Feb 2024 16:13:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Thanks for looking at this again @eme64 :)
>> 
>>> Just FYI: I am slowly refactoring SuperWord / AutoVectorizer, and my hope is that I could eventually do if-conversion for loops. That means I would convert CFG to CMove (with min/max as special cases). But that is probably many months in the future. If we can vectorize, then CMove and min/max is often more clearly profitable: we probably execute both branches anyway. For straight-line code, we have tradeoffs with latency vs misprediction costs.
>> 
>> This is really exciting! It would be really nice to eventually do this transform with knowledge of nearby vectorization opportunities, to prioritize transforming `CMove`s and min/max that would benefit vectorization directly.
>> 
>>> @jaskarth can you construct a benchmark with `a (a cmp b) ? a : b` where you now convert to min/max, and we very often take the `a` branch, and where the `b` value is quite expensive, and the `a` value is very cheap? 
>> 
>> I think this would be an interesting experiment, I'll try to design a benchmark for this and get back with the results.
>> 
>>> @jaskarth what would it take to do the same for `float/double`? Maybe it is not so easy, I don't remember exactly, but the comparators may have a different semantics compared to min/max, especially for NaN values. But it would be nice if you explained this here.
>> 
>> Indeed, the trouble comes with `NaN` and negative zero. If we have `Math.min(1.0, NaN)` and `if_min(1.0, NaN)` (representing a min created with a branch statements), then `Math.min(1.0, NaN) == NaN` but `if_min(1.0, NaN) == 1.0`. Similarly for negative zeroes, `Math.min(0.0, -0.0) == -0.0` but `if_min(0.0, -0.0) == 0.0`. Theoretically, at least on x86 it would be simpler to encode the `if_min` behavior as we would only need to output the `vminsd` without the blends and compares needed to encode the NaN and -0.0 behavior, but we would need new nodes to encode the different semantics. It didn't seem worth it to create (and eventually maintain) those floating point min/max nodes with slightly different semantics so I limited the transform to only integers, as the semantics are identical there.
>
> @jaskarth 
>> I think this would be an interesting experiment, I'll try to design a benchmark for this and get back with the results.
> 
> Thanks, that would be excellent. This is to make sure that there is no regression. And if there is one, we'd have to see what to do about it.
> 
> And thanks for the explanation around `float/double`. Yes, we would probably need additional nodes, or somehow give additional options for the existing ones. But that is out of scope for this RFE 😊

@eme64 Thanks for your explanation, indeed that makes sense as a reason to limit this to `Min`/`Max` nodes only.

@jaskarth Is there anything preventing the baseline from transforming the patterns into `CMove`s. If no then what is the benefit of looking at the original control flow instead of the transformed 'CMove`?

Thanks very much.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1966261039