RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11]

Thu Feb 20 11:03:58 UTC 2025

On Tue, 18 Feb 2025 08:43:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> To make it more explicit: implementing long min/max in ad files as cmp will likely remove all the 100% regressions that are observed here. I'm going to repeat the same MinMaxVector int min/max reduction test above with the ad changes @rwestrel suggested to see what effect they have.
>
> @galderz I think we will have the same issue with both `int` and `long`: As far as I know, it is really a difficult problem to decide at compile-time if a `cmove` or `branch` is the better choice. I'm not sure there is any heuristic for which you will not find a micro-benchmark where the heuristic made the wrong choice.
> 
> To my understanding, these are the factors that impact the performance:
> - `cmove` requires all inputs to complete before it can execute, and it has an inherent latency of a cycle or so itself. But you cannot have any branch mispredictions, and hence no branch misprediction penalties (i.e. when the CPU has to flush out the ops from the wrong branch and restart at the branch).
> - `branch` can hide some latencies, because we can already continue with the branch that is speculated on. We do not need to wait for the inputs of the comparison to arrive, and we can already continue with the speculated resulting value. But if the speculation is ever wrong, we have to pay the misprediction penalty.
> 
> In my understanding, there are roughly 3 scenarios:
> - The branch probability is so extreme that the branch predictor would be correct almost always, and so it is profitable to do branching code.
> - The branching probability is somewhere in the middle, and the branch is not predictable. Branch mispredictions are very expensive, and so it is better to use `cmove`.
> - The branching probability is somewhere in the middle, but the branch is predictable (e.g. swapps back and forth). The branch predictor will have almost no mispredictions, and it is faster to use branching code.
> 
> Modeling this precisely is actually a little complex. You would have to know the cost of the `cmove` and the `branching` version of the code. That depends on the latency of the inputs, and the outputs: does the `cmove` dramatically increase the latency on the critical path, and `branching` could hide some of that latency? And you would have to know how good the branch predictor is, which you cannot derive from the branching probability of our profiling (at least not when the probabilities are in the middle, and you don't know if it is a random or predictable pattern).
> 
> If we can find a perfect heuristic - that would be fantastic ;)
> 
> If we cannot find a perfect heuristic, then we should think about what are the most "common" or "relevant" scenarios, I think.
> 
> But let's discuss all of this in a call / offline :)

FYI @eme64 @chhagedorn @rwestrel 

Since we know that vectorization does not always kick in, there was a worry if scalar fallbacks would heavily suffer with the work included in this PR to add long intrinsic for min/max. Looking at the same scenarios with int (read my comments https://github.com/openjdk/jdk/pull/20098#issuecomment-2669329851 and https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644), it looks clear that the same kind of regressions are also present there. So, if those int scalar regressions were not a problem when int min/max intrinsic was added, I would expect the same to apply to long.

Re: https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644 - I was trying to think what could be causing this. I thought maybe it's due to the int min/max backend, which is implemented in platform specific way, vs the long min/max backend which relies on platform independent macro expansion. But if that theory was true, I would expect the same behaviour with int max vs long max, but that's not the case. It seems odd to only see this difference with min.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2671163220