RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]

Thu Mar 6 15:26:09 UTC 2025

On Thu, 27 Feb 2025 16:38:30 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision:
>> 
>>  - Merge branch 'master' into topic.intrinsify-max-min-long
>>  - Fix typo
>>  - Renaming methods and variables and add docu on algorithms
>>  - Fix copyright years
>>  - Make sure it runs with cpus with either avx512 or asimd
>>  - Test can only run with 256 bit registers or bigger
>>    
>>    * Remove platform dependent check
>>    and use platform independent configuration instead.
>>  - Fix license header
>>  - Tests should also run on aarch64 asimd=true envs
>>  - Added comment around the assertions
>>  - Adjust min/max identity IR test expectations after changes
>>  - ... and 34 more: https://git.openjdk.org/jdk/compare/dfbb2ee6...a190ae68
>
> Also, I've started a [discussion on jmh-dev](https://mail.openjdk.org/pipermail/jmh-dev/2025-February/004094.html) to see if there's a way to minimise pollution of `Math.min(II)` compilation. As a follow-up to https://github.com/openjdk/jdk/pull/20098#issuecomment-2684701935, I looked at where the other `Math.min(II)` calls are coming from, and a big chunk seems related to the JMH infrastructure.

@galderz about:
> Additional performance improvement: extend backend capabilities for vectorization (see Regression 2 + 3). Optional.

I looked at `src/hotspot/cpu/x86/x86.ad`:

bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType bt) {

    case Op_MaxV:
    case Op_MinV:
      if (UseSSE < 4 && is_integral_type(bt)) {
        return false;
      }
...

So it seems that lanewise min/max is supported here for AVX2, since this check only rejects integral types when `UseSSE < 4`. But it seems that's different for reductions:

    case Op_MinReductionV:
    case Op_MaxReductionV:
      if ((bt == T_INT || is_subword_type(bt)) && UseSSE < 4) {
        return false;
      } else if (bt == T_LONG && (UseAVX < 3 || !VM_Version::supports_avx512vlbwdq())) {
        return false;
      }
...

So maybe we could improve the AVX2 coverage for reductions: for `T_LONG`, min/max reductions are currently rejected unless AVX-512 is available (`UseAVX >= 3` and `VM_Version::supports_avx512vlbwdq()`). But honestly, I will probably run into this again once I work on the other reductions above and run the benchmarks, and I think that will make it easier to investigate all of this. For example, I will adjust the IR rules, and then it will be apparent which cases are not covered.
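
For illustration, the two loop shapes behind the lanewise (`Op_MinV`) and reduction (`Op_MinReductionV`) cases could look roughly like the sketch below. The class and method names are hypothetical and not taken from the PR or its tests. Whether C2 actually vectorizes either loop depends on the CPU features and flags discussed above, which this sketch does not check.

```java
// Hypothetical illustration only: not code from the PR or its benchmarks.
final class MinMaxShapes {
    private MinMaxShapes() {}

    // Lanewise shape: each output element depends only on the matching
    // input elements, so iterations are independent of each other.
    public static void lanewiseMin(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
    }

    // Reduction shape: every iteration consumes the running result of the
    // previous one, so the loop carries a dependency through `result`.
    public static long reduceMin(long[] a) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }
}
```

Going only by the `T_LONG` branch quoted above, it is the second shape that would be rejected by the matcher on an AVX2-only machine.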

@galderz you said you would add some extra comments, then I will review again :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2704159992
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2704161929