RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v3]

Tue Mar 5 11:17:47 UTC 2024

On Tue, 5 Mar 2024 04:08:19 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

>> @jaskarth 
>>> With a bit of further reflection on all this, I think it might be best if this patch was changed so that it acts on CMove directly
>> 
>> You mean you would be matching for a `Cmp -> CMove` node pattern that is equivalent for `Min/Max`, rather than matching a `Cmp -> If -> Phi` pattern? I guess that would allow you to get better types, without having to deal with all the CMove-vs-branch-prediction heuristics.
>> 
>> BTW, I watched a fascinating talk about branch-predictors / branchless code yesterday:
>> `Branchless Programming in C++ - Fedor Pikus - CppCon 2021`
>> https://www.youtube.com/watch?v=g-WPhYREFjk
>> 
>> My conclusion from that: it is really hard to say ahead of time if the branch-predictor is successful. It depends on how predictable a condition is. The branch-predictor can see patterns (like alternating true-false). So even if a probability is 50% on a branch, it may be fully predictable, and branching code is much more efficient than branchless code. But in totally random cases, branchless code may be faster because you will have a large percentage of mispredictions, and mispredictions are expensive. But in both cases you would see `iff->_prob = 0.5`.
>> Really what we would need is profiling that checks how much a branch was `mispredicted`, and not how much it was `taken`. But not sure if we can even get that profiling data.
>
>> You mean you would be matching for a `Cmp -> CMove` node pattern that is equivalent for `Min/Max`, rather than matching a `Cmp -> If -> Phi` pattern?
> 
> Yeah, I was thinking it might be better to let the CMove transform happen first, since the conditions guarding both transforms are aiming to do the same thing in essence. My thought was that if the regression in your `testCostDifference` was fixed, it would be better to not have to do that fix in two different locations, since it impacts `is_minmax` as well.
> 
>> BTW, I watched a fascinating talk about branch-predictors / branchless code yesterday
> 
> Thank you for linking this talk, it was really insightful! I also wonder if it would be possible to capture branch execution patterns somehow, to drive branch flattening optimizations. I figure it could be possible to keep track of the sequence of a branch's history of execution, and then compute some "entropy" value from that sequence to determine if there's a pattern, or if it's random and likely to be mispredicted. However, implementing that in practice sounds pretty difficult.
> 
> @eme64 I've pushed a commit that fixes the benchmarks and sets the loop iteration count to 10_000. Could you check if this lets it vectorize on your machine? Thanks!

@jaskarth Why don't you first make the code change with starting from a `Cmp -> CMove` pattern rather than the `Cmp -> If -> Phi` pattern. Then I can look at both things together ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1978523865