RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v6]

Fri Dec 20 15:44:40 UTC 2024

On Tue, 17 Dec 2024 18:12:24 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>> 
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>> 
>> 
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>> 
>> 
>> The control flow is due to the java implementation for these methods, e.g.
>> 
>> 
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>> 
>> 
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>> 
>> 
>> SuperWord::transform_loop:
>>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt strip_mined
>>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>> 
>> 
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1155
>> long max   1173
>> 
>> 
>> After the patch, on darwin/aarch64 (M1):
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>> 
>> long min   1042
>> long max   1042
>> 
>> 
>> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>> 
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>> 
>> 
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PA...
>
> Galder Zamarreño has updated the pull request incrementally with five additional commits since the last revision:
> 
>  - Added comment around the assertions
>  - Adjust min/max identity IR test expectations after changes
>  - Fix style
>  - Add max reduction test
>  - Add empty line

test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Long.java line 135:

> 133:     @IR(applyIf = {"SuperWordReductions", "true"},
> 134:         applyIfCPUFeatureOr = { "avx512", "true" },
> 135:         counts = {IRNode.MIN_REDUCTION_V, " > 0"})

> @eme64 I've addressed all your comments except aarch64 testing. `asimd` is not enough, you need `sve` for this, but I'm yet to make it work even with `sve`, something's up and need to debug it further.

Hi @galderz , may I ask if these long-reduction cases can't work even with `sve`? It might be related with the limitation [here](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/share/opto/superword.cpp#L1564-L1566). Some `sve` machines have only 128 bits.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1894089883