RFR: 8342393: Promote commutative vector IR node sharing [v18]
Emanuel Peter
epeter at openjdk.org
Mon Jan 27 14:42:56 UTC 2025
On Mon, 27 Jan 2025 10:57:30 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> This patch promotes the sharing of commutative vector IR nodes that have the same inputs but a different input ordering.
>> This is similar to scalar IR, where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization.
>>
>> The following are the performance numbers for the JMH microbenchmark included with the patch.
>>
>>
>> Granite Rapids (P-core Xeon Server)
>> Baseline:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   8982.549         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   6072.773         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2368.856         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  15215.087         ops/ms
>>
>> With optimization:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  11963.554         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   7036.088         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2906.731         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  17148.131         ops/ms
>>
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   2444.359         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   1710.256         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2    308.766         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2   3902.179         ops/ms
>>
>> With optimization:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorCommutativeOperSharingBenchmark.com...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolutions.
I'm getting failed IR rules now:
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "public void compiler.vectorapi.VectorCommutativeOperSharingTest.testVectorIRSharing1(int)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#ADD_VI#_", "_@any", " 2 ", "_#V#MUL_VI#_", "_@any", " 2 ", "_#V#MAX_VI#_", "_@any", " 2 ", "_#V#MIN_VI#_", "_@any", " 2 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 2: "(\\d+(\\s){2}(MulVI.*)+(\\s){2}===.*vector[A-Za-z]<I,\\d+>)"
           - Failed comparison: [found] 0 = 2 [given]
           - No nodes matched!
         * Constraint 3: "(\\d+(\\s){2}(MaxV.*)+(\\s){2}===.*vector[A-Za-z]<I,\\d+>)"
           - Failed comparison: [found] 0 = 2 [given]
           - No nodes matched!
         * Constraint 4: "(\\d+(\\s){2}(MinV.*)+(\\s){2}===.*vector[A-Za-z]<I,\\d+>)"
           - Failed comparison: [found] 0 = 2 [given]
           - No nodes matched!
This happens with the flags `-XX:UseAVX=0 -XX:UseSSE=3` on a machine that supports `AVX2`. I suspect you need to add a CPU feature restriction to the IR rule.
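
For readers following the thread, the input-sorting idea from the quoted patch description can be illustrated with a toy model. This is only a sketch and not HotSpot code: `Node`, `ValueTable`, and `canonical` are hypothetical names, plain integers stand in for C2 node indices, and a `HashMap` stands in for the value-numbering hash table.

```java
import java.util.HashMap;
import java.util.Map;

public class CommutativeSharingSketch {
    /** Toy IR node: an opcode plus the indices of its two inputs (hypothetical model). */
    public record Node(String op, int in1, int in2) {}

    /** Toy value-numbering table: structurally equal nodes map to the same id. */
    public static final class ValueTable {
        private final Map<Node, Integer> ids = new HashMap<>();
        private int nextId = 0;

        /** Returns the id of an existing equal node, or registers a new one. */
        public int lookup(Node n) {
            Integer id = ids.get(n);
            if (id == null) {
                id = nextId++;
                ids.put(n, id);
            }
            return id;
        }

        /** Number of distinct nodes registered so far. */
        public int size() {
            return ids.size();
        }
    }

    /**
     * Orders the inputs of a commutative operation by index, so that
     * op(a, b) and op(b, a) become the same key. Only valid for
     * commutative operations such as add, mul, min, and max.
     */
    public static Node canonical(String op, int a, int b) {
        return a <= b ? new Node(op, a, b) : new Node(op, b, a);
    }

    public static void main(String[] args) {
        // Without sorting, the two input orders are treated as different nodes.
        ValueTable raw = new ValueTable();
        raw.lookup(new Node("AddVI", 7, 3));
        raw.lookup(new Node("AddVI", 3, 7));
        System.out.println("without sorting: " + raw.size() + " node(s)");

        // With sorting, both orders map to one shared node.
        ValueTable sorted = new ValueTable();
        sorted.lookup(canonical("AddVI", 7, 3));
        sorted.lookup(canonical("AddVI", 3, 7));
        System.out.println("with sorting: " + sorted.size() + " node(s)");
    }
}
```

In this model the unsorted table ends up with two entries and the sorted table with one, which is the kind of sharing the IR rule counts in the test are meant to check.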
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22863#issuecomment-2615942159