RFR: 8342393: Promote commutative vector IR node sharing [v17]
Emanuel Peter
epeter at openjdk.org
Mon Jan 27 07:41:06 UTC 2025
On Thu, 23 Jan 2025 18:45:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering.
>> Similar to scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization.
>>
>> Following are the performance stats for JMH micro included with the patch.
>>
>>
>> Granite Rapids (P-core Xeon Server)
>> Baseline :
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms
>>
>> Withopt:
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms
>>
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 3902.179 ops/ms
>>
>> Withopt:
>> Benchmark (size) Mode Cnt Score Error Units
>> VectorCommutativeOperSharingBenchmark.com...
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> Modifed scheme not based over fragile node level flags base solution.
@jatin-bhateja Yes, I think this is a bit of a simpler solution. Though the flag based one was also very interesting and would solve some other issues, i.e. other optimizations that order the inputs the other way around conflicting with ordering for value numbering. But we can leave that for another day ;)
I have a few suggestions below :)
src/hotspot/share/opto/vectornode.cpp line 1045:
> 1043: }
> 1044:
> 1045: bool VectorNode::can_swap_inputs() {
Suggestion:
bool VectorNode::should_swap_inputs() {
I think the method name `can_swap_inputs` should rather be `should_swap_inputs`. Because we `can` swap the edges even when the order is `in(1)->_idx <= in(2)->_idx` already, but we `should` not ;)
src/hotspot/share/opto/vectornode.cpp line 1091:
> 1089: if (in(1)->_idx > in(2)->_idx) {
> 1090: return true;
> 1091: }
@jatin-bhateja can you explain this / add a comment to the code?
Suggestion:
if (in(1)->_idx > in(2)->_idx) {
// Sort the inputs of commutative non-predicated vector operations to help value numbering.
return true;
}
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22863#pullrequestreview-2574699037
PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1930083354
PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1930082583
More information about the hotspot-compiler-dev
mailing list