RFR: 8342393: Promote commutative vector IR node sharing [v9]

Thu Jan 16 13:07:38 UTC 2025

On Thu, 16 Jan 2025 12:01:13 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering.
>> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus
>> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing.  
>> 
>> Following are the performance stats for JMH micro included with the patch.
>> 
>> 
>> Granite Rapids (P-core Xeon Server)
>> Baseline : 
>> Benchmark                                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   8982.549          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   6072.773          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2368.856          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  15215.087          ops/ms
>> 
>> Withopt:
>> Benchmark                                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  11963.554          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   7036.088          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2906.731          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  17148.131          ops/ms
>> 
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark                                                                (size)   Mode  Cnt     Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  2444.359          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2  1710.256          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   308.766          ops/ms
>> VectorCommutativeOperSharingBenc...
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Generalizing vector size constraints covering different AVX levels and KNLSetting

I have a general question about the approach:
For commutative scalar ops (add, mul, etc), we reorder the edges to allow commoning.

  // Otherwise, sort inputs (commutativity) to help value numbering.
  if( in1->_idx > in2->_idx ) {
    add->swap_edges(1, 2);
    return true;
  }

Have you considered doing the same with vectors?
Or should we rather refactor also the scalars with the `check_for_collision`, and have a `is_commutative_operation` for scalar and vector ops?

I'm just afraid that we have now 2 ways of doing things, making things unnecessarily complicated.

Also the basic hash / hash collision functions now have an extra check. Probably the overhead is negligible, but who knows - they used to have `goto` there for performance... what do you think?

src/hotspot/share/opto/node.hpp line 836:

> 834:     Flag_is_predicated_using_blend   = 1 << 17,
> 835:     Flag_is_commutative_vector_oper  = 1 << 18,
> 836:     _last_flag                       = Flag_is_commutative_vector_oper

Bike-shedding: I would use either `operation` or simply `op`, the `oper` is not very common.

src/hotspot/share/opto/phaseX.cpp line 67:

> 65: //-----------------------------------------------------------------------------
> 66: 
> 67: bool NodeHash::check_for_collision(const Node* n, const Node* k) {

could this method be `const`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/22863#pullrequestreview-2555968749
PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1918471166
PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1918473635