RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4]

Fri Nov 29 11:43:46 UTC 2024

On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>> 
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>> 
>> 
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>> 
>> 
>> The control flow is due to the java implementation for these methods, e.g.
>> 
>> 
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>> 
>> 
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>> 
>> 
>> SuperWord::transform_loop:
>>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>> 
>> 
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1155
>> long max   1173
>> 
>> 
>> After the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1042
>> long max   1042
>> 
>> 
>> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>> 
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision:
> 
>  - Use same default size as in other vector reduction benchmarks
>  - Renamed benchmark class
>  - Double/Float tests only when avx enabled
>  - Make state class non-final
>  - Restore previous benchmark iterations and default param size
>  - Add clipping range benchmark that uses min/max
>  - Encapsulate benchmark state within an inner class
>  - Avoid creating result array in benchmark method
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Revert "Implement cmovL as a jump+mov branch"
>    
>    This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
>  - ... and 20 more: https://git.openjdk.org/jdk/compare/20c7c9a0...0a8718e1

@galderz Thanks for taking this task on!
Had a quick look at it. So auto-vectorization in SuperWord should now be working, right? If yes:

It would be nice if you tested both for `IRNode.MIN_VL` and `IRNode.MIN_REDUCTION_V`, the same for max.

You may want to look at these existing tests, to see what other tests there are for the `int` version:
`test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java`
`test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java`
`test/hotspot/jtreg/compiler/vectorization/TestAutoVecIntMinMax.java`
`test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java`
There may be some duplicates already here... not sure. And maybe you need to check what to do about probabilities as well.

src/hotspot/share/opto/library_call.cpp line 1952:

> 1950:   Node *a = nullptr;
> 1951:   Node *b = nullptr;
> 1952:   Node *n = nullptr;

If you are touching this, then you might as well fix the style.
Suggestion:

  Node* a = nullptr;
  Node* b = nullptr;
  Node* n = nullptr;

test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 80:

> 78:     @IR(phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" })
> 79:     @IR(phase = { CompilePhase.AFTER_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "0" })
> 80:     private static long testLongMin(long a, long b) {

Can you add a comment why it disappears after macro expansion?

test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 108:

> 106:     @Test
> 107:     @Arguments(values = { Argument.NUMBER_MINUS_42, Argument.NUMBER_42 })
> 108:     @IR(counts = { IRNode.MIN_F, "1" }, applyIfCPUFeatureOr = {"avx", "true"})

Is this not supported by `asimd`? Same question for the other cases.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2391518909
PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1814398129
PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381007
PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381913