RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

Tue Jul 9 12:12:53 UTC 2024

This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.

Currently vectorization does not kick in for loops containing either of these calls because of the following error:

VLoop::check_preconditions: failed: control flow in loop not allowed

The control flow comes from the Java implementation of these methods, e.g.

public static long max(long a, long b) {
    return (a >= b) ? a : b;
}

This patch intrinsifies the calls, replacing the CmpL + Bool nodes with a MaxL or MinL node, respectively.
With the control flow gone, the loop passes the precondition check and can be vectorized.
E.g.

SuperWord::transform_loop:
    Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
 518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
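For illustration, the kind of loop this affects looks roughly like the sketch below. It is not the `Test` class from the log above. The class and method names are hypothetical, and whether C2 actually vectorizes either loop depends on the platform and the flags in use.

```java
// Hypothetical example: names are illustrative and not part of the patch.
final class MaxLongLoop {

    // Element-wise maximum. Without the intrinsic, the inlined body of
    // Math.max(long, long) is a compare-and-branch, i.e. control flow
    // inside the loop.
    static void maxInto(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Min reduction, similar in shape to what a reduction benchmark
    // would exercise.
    static long minReduce(long[] a) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }
}
```

Both methods return the same results with or without the intrinsic. Only the shape of the compiled loop is expected to differ.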

Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
                                                         1     1     0     0
==============================
TEST SUCCESS

long min   1155
long max   1173

After the patch, on darwin/aarch64 (M1):

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
                                                         1     1     0     0
==============================
TEST SUCCESS

long min   1042
long max   1042

This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
Therefore, it still relies on macro expansion to transform them into CMoveL.

I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>> jtreg:test/jdk:tier1                               2413  2412     1     0 <<
   jtreg:test/langtools:tier1                         4556  4556     0     0
   jtreg:test/jaxp:tier1                                 0     0     0     0
   jtreg:test/lib-test:tier1                            33    33     0     0
==============================

The one failure I got is [CODETOOLS-7903745](https://bugs.openjdk.org/browse/CODETOOLS-7903745), so it is unrelated to these changes.

-------------

Commit messages:
 - 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

Changes: https://git.openjdk.org/jdk/pull/20098/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307513
  Stats: 32 lines in 5 files changed: 32 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/20098.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098

PR: https://git.openjdk.org/jdk/pull/20098