RFR: 8342393: Promote commutative vector IR node sharing [v8]
Xiaohong Gong
xgong at openjdk.org
Thu Jan 16 06:46:36 UTC 2025
On Wed, 15 Jan 2025 04:13:23 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> This patch promotes the sharing of commutative vector IR nodes that have the same inputs but in a different order.
>> Unlike scalar IR, where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach: commutative operations are decorated with a special node-level flag during IR construction, thus
>> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing.
>>
>> The following are the performance numbers for the JMH microbenchmark included with the patch.
>>
>>
>> Granite Rapids (P-core Xeon Server)
>> Baseline :
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   8982.549         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   6072.773         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2368.856         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  15215.087         ops/ms
>>
>> With opt:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  11963.554         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   7036.088         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2906.731         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  17148.131         ops/ms
>>
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   2444.359         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   1710.256         ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2    308.766         ops/ms
>> VectorCommutativeOperSharingBenc...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Extension covering masked vector operations
test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java line 418:
> 416: // predicated (vec1 + vec2) + (vec2 + vec1)
> 417: vec1.lanewise(VectorOperators.ADD, vec2, vmask)
> 418: .lanewise(VectorOperators.ADD, vec2.lanewise(VectorOperators.ADD, vec1, vmask))
Thanks for the update, @jatin-bhateja!
But I don't think this is right. Do you mean that `vec1.lanewise(VectorOperators.ADD, vec2, vmask)` is equal to `vec2.lanewise(VectorOperators.ADD, vec1, vmask)`? The values in the inactive lanes are not equal, right? Shouldn't the result keep the lane value of the first input for inactive lanes? Thanks!
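To make the concern concrete, here is a minimal scalar sketch of the masked semantics I have in mind. It is only a model, not the Vector API itself: `MaskedAddModel` and `maskedAdd` are hypothetical names, and the sketch assumes that a masked lanewise `ADD` leaves the first operand's value in every lane where the mask is unset.

```java
import java.util.Arrays;

public class MaskedAddModel {
    // Hypothetical scalar stand-in for a.lanewise(VectorOperators.ADD, b, mask).
    // Assumption: lanes with an unset mask bit keep the value of the first operand.
    static int[] maskedAdd(int[] a, int[] b, boolean[] mask) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = mask[i] ? a[i] + b[i] : a[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] vec1 = {1, 2, 3, 4};
        int[] vec2 = {10, 20, 30, 40};
        boolean[] vmask = {true, false, true, false};

        int[] r1 = maskedAdd(vec1, vec2, vmask); // inactive lanes come from vec1
        int[] r2 = maskedAdd(vec2, vec1, vmask); // inactive lanes come from vec2

        System.out.println(Arrays.toString(r1)); // [11, 2, 33, 4]
        System.out.println(Arrays.toString(r2)); // [11, 20, 33, 40]
        System.out.println(Arrays.equals(r1, r2)); // false
    }
}
```

Under that assumption the two results agree only in the active lanes, so swapping the inputs of a masked operation changes the result unless the mask is all-true or the inactive lanes of both inputs happen to match.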
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1917828758
More information about the hotspot-compiler-dev mailing list