RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]
Galder Zamarreño
galder at openjdk.org
Wed Feb 26 18:33:03 UTC 2025
On Wed, 26 Feb 2025 11:32:57 GMT, Galder Zamarreño <galder at openjdk.org> wrote:
> > That said: if we know that it is only in the high-probability cases, then we can address those separately. I would not consider it a blocking issue, as long as we file the follow-up RFE for int/max scalar case with high branch probability.
> > What would be really helpful: a list of all regressions / issues, and how we intend to deal with them. If we later find a regression that someone cares about, then we can come back to that list, and justify the decision we made here.
>
> I'll make up a list of regressions and post it here. I won't create RFEs for now. I'd rather wait until we have the list in front of us and we can decide which RFEs to create.
Before listing the regressions, it's worth noting that the PR also improves performance in certain scenarios. I will summarise those tomorrow.
Here's a summary of the regressions:
### Regression 1
Given a loop with a long min/max reduction pattern where one side of the branch is taken nearly 100% of the time, when Superword finds the pattern not profitable, HotSpot will use scalar instructions (cmov) and performance will regress.
Possible solutions:
a) make Superword recognise these scenarios as profitable.
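For reference, a minimal sketch of the kind of loop this regression is about. The class name, method names and input data are hypothetical and only meant to illustrate a long max reduction where the new element wins on every iteration; this is not the benchmark code from the PR.

```java
// Hypothetical illustration of a long max reduction with a one-sided branch.
final class LongMaxReductionSketch {

    // Reduction: every iteration feeds the previous result back into Math.max.
    static long maxReduction(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // Assumed input shape: strictly increasing values, so values[i] is
    // always greater than the running result (one side taken ~100%).
    static long[] ascending(int size) {
        long[] values = new long[size];
        for (int i = 0; i < size; i++) {
            values[i] = i;
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(maxReduction(ascending(1_000)));
    }
}
```

Whether Superword vectorizes such a loop depends on the platform and on its profitability heuristics, which this sketch does not try to model.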
### Regression 2
Given a loop with a long min/max reduction pattern where one side of the branch is taken nearly 100% of the time, when the platform does not support vector instructions to achieve this (e.g. AVX-512 quad word vpmax/vpmin), HotSpot will use scalar instructions (cmov) and performance will regress.
Possible solutions:
a) find a way to use other vector instructions (vpcmp+vpblend+vmov?)
b) fall back on more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided
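To make option b) a bit more concrete at the source level, here is a rough sketch contrasting an explicit compare-and-assign with the `Math.min` call. The class and method names are hypothetical. The two forms compute the same value; which machine instructions C2 emits for each is an assumption left open here, since it depends on the JDK build, the platform and the profile.

```java
// Hypothetical illustration: two semantically equivalent long min reductions.
final class LongMinFormsSketch {

    // Explicit compare and conditional assignment (a branch in the source).
    static long minWithBranch(long[] values) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < result) {
                result = values[i];
            }
        }
        return result;
    }

    // Same reduction expressed through Math.min(long, long).
    static long minWithMathMin(long[] values) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.min(result, values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {7L, 3L, 9L, -4L};
        System.out.println(minWithBranch(values) == minWithMathMin(values));
    }
}
```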
### Regression 3
Given a loop with a long min/max non-reduction pattern (e.g. `longLoopMax`) where one side of the branch is taken nearly 100% of the time, when the platform does not vectorize it (either because the CPU lacks instruction support, or because Superword finds it not profitable), HotSpot will use scalar instructions (cmov) and performance will regress.
Possible solutions:
a) find a way to use other vector instructions (e.g. `longLoopMax` vectorizes with AVX2 and might also do so with earlier instruction sets)
b) fall back on more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided
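A minimal sketch of what a non-reduction pattern looks like, assuming it is an element-wise max of two long arrays written into a result array. The class name, method name and data are hypothetical; the actual `longLoopMax` benchmark may differ in its details.

```java
// Hypothetical illustration of a non-reduction long max loop.
final class LongLoopMaxSketch {

    // Element-wise max: iterations are independent of each other,
    // unlike the reduction case where each one depends on the last result.
    static long[] loopMax(long[] a, long[] b) {
        long[] result = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = Math.max(a[i], b[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed one-sided data: every element of a is larger than b's.
        long[] a = {10L, 20L, 30L};
        long[] b = {1L, 2L, 3L};
        System.out.println(java.util.Arrays.toString(loopMax(a, b)));
    }
}
```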
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2685865807