RFR: 8342393: Promote commutative vector IR node sharing [v9]
Emanuel Peter
epeter at openjdk.org
Thu Jan 16 13:21:42 UTC 2025
On Thu, 16 Jan 2025 12:01:13 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering.
>> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus
>> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing.
>>
>> Following are the performance stats for JMH micro included with the patch.
>>
>>
>> Granite Rapids (P-core Xeon Server)
>> Baseline :
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms
>>
>> Withopt:
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms
>>
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms
>> VectorCommutativeOperSharingBenc...
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> Generalizing vector size constraints covering different AVX levels and KNLSetting
Increasing the level so that @TobiHartmann @vnkozlov also sign off on this.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22863#issuecomment-2595649746
More information about the hotspot-compiler-dev
mailing list