RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v2]
Galder Zamarreño
galder at openjdk.org
Fri Sep 27 14:18:41 UTC 2024
> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
>
> The control flow comes from the Java implementation of these methods, e.g.
>
>
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
>
>
> This patch intrinsifies the calls, replacing the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> With the branch gone, the vectorizer no longer finds control flow in the loop and can vectorize it.
> E.g.
>
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                             TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                         1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                             TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                         1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on macro expansion to transform them into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                             TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                    2500  2500     0     0
>>> jtreg:test/jdk:tier1 ...
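
For readers skimming the quoted description, the kind of loop it concerns can be sketched roughly as follows. This is only an illustration: the class and method names (`MaxLoopSketch`, `elementwiseMax`, `maxReduction`) are hypothetical and are not taken from the patch or its tests, and whether C2 actually vectorizes these loops depends on the platform and the JVM build.

```java
// Hypothetical illustration, not code from the pull request.
final class MaxLoopSketch {

    // Element-wise maximum of two long arrays. When Math.max(long, long) is
    // inlined as (a >= b) ? a : b, the loop body contains a compare and branch,
    // which is the control flow the quoted precondition check complains about.
    static void elementwiseMax(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Max reduction over one array, the shape exercised by reduction benchmarks.
    static long maxReduction(long[] a) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.max(result, a[i]);
        }
        return result;
    }
}
```

Both methods behave identically with or without the intrinsic; only the generated code is expected to differ.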
Galder Zamarreño has updated the pull request incrementally with 17 additional commits since the last revision:
- Remove previous benchmark effort
- Multiply array value in reduction for vectorization to kick in
- Renamed benchmark methods
- Add min/max benchmark that includes loops and reductions
- Skip single array benchmarks
- Add an intermediate % that is more representative of real life
- Fix compilation error
- Fix min case to distribute numbers as per probability
- Distribute values targeting a branch percentage
* Use a random increment algorithm,
to create an array of values such that min/max
branch percentage matches.
- Fix format of assembly for the movl to movq switch
- ... and 7 more: https://git.openjdk.org/jdk/compare/3dd72b89...28778c84
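
The "random increment" idea mentioned in the commit list above could look something like the sketch below. This is an assumption about the general approach, not the benchmark code from the pull request: the class `BranchTargetedData`, its methods, and the increment range are all hypothetical. The idea shown is that, for a running max reduction, an element takes the "new maximum" branch only if it exceeds everything before it, so the generator bumps the running maximum with the target probability and otherwise emits a value that does not exceed it.

```java
import java.util.Random;

// Hypothetical sketch, not the generator used in the pull request.
final class BranchTargetedData {

    // Builds an array in which roughly `percent` percent of the elements are
    // strictly greater than every earlier element (and greater than 0).
    static long[] generate(int size, int percent, long seed) {
        Random rnd = new Random(seed);
        long[] values = new long[size];
        long runningMax = 0;
        for (int i = 0; i < size; i++) {
            if (rnd.nextInt(100) < percent) {
                // Random positive increment: this element becomes the new maximum.
                runningMax += 1 + rnd.nextInt(10);
                values[i] = runningMax;
            } else {
                // At or below the current maximum: the branch is not taken.
                values[i] = runningMax - rnd.nextInt(10);
            }
        }
        return values;
    }

    // Counts how many elements exceed the running maximum (starting from 0),
    // i.e. how often a branchy max reduction would take the update branch.
    static int countNewMax(long[] values) {
        long max = 0;
        int count = 0;
        for (long v : values) {
            if (v > max) {
                max = v;
                count++;
            }
        }
        return count;
    }
}
```

A min variant would mirror this by decrementing a running minimum instead.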
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/20098/files
- new: https://git.openjdk.org/jdk/pull/20098/files/3dd72b89..28778c84
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00-01
Stats: 562 lines in 5 files changed: 418 ins; 132 del; 12 mod
Patch: https://git.openjdk.org/jdk/pull/20098.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098
PR: https://git.openjdk.org/jdk/pull/20098