RFR: 8347405: MergeStores with reverse bytes order value

Emanuel Peter epeter at openjdk.org
Fri Jan 10 11:21:44 UTC 2025


On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei <duke at openjdk.org> wrote:

> This patch enhance MergeStores optimization to support merge value with reverse byte order.
> 
> Below is benchmark result before and after the patch:
> 
> On aliyun g8a (aarch64)
> |name   | before        | score2        | ratio |
> |---|---|---|---|
> |MergeStoreBench.setCharBS      |5669.655000    |5669.566000    | 0.00 %|
> |MergeStoreBench.setCharBV      |5516.911000    |5516.273000    | 0.01 %|
> |MergeStoreBench.setCharC       |5578.644000    |5552.809000    | 0.47 %|
> |MergeStoreBench.setCharLS      |5782.140000    |5779.264000    | 0.05 %|
> |MergeStoreBench.setCharLV      |5496.403000    |5499.195000    | -0.05 %|
> |MergeStoreBench.setIntB        |6087.703000    |2768.385000    | 119.90 %|
> |MergeStoreBench.setIntBU       |6733.813000    |2950.240000    | 128.25 %|
> |MergeStoreBench.setIntBV       |1362.233000    |1361.821000    | 0.03 %|
> |MergeStoreBench.setIntL        |2834.785000    |2833.042000    | 0.06 %|
> |MergeStoreBench.setIntLU       |2947.145000    |2946.874000    | 0.01 %|
> |MergeStoreBench.setIntLV       |5506.791000    |5506.229000    | 0.01 %|
> |MergeStoreBench.setIntRB       |7634.279000    |5611.058000    | 36.06 %|
> |MergeStoreBench.setIntRBU      |7766.737000    |5551.281000    | 39.91 %|
> |MergeStoreBench.setIntRL       |5689.793000    |5689.385000    | 0.01 %|
> |MergeStoreBench.setIntRLU      |5628.287000    |5628.789000    | -0.01 %|
> |MergeStoreBench.setIntRU       |5536.039000    |5534.910000    | 0.02 %|
> |MergeStoreBench.setIntU        |5595.363000    |5567.810000    | 0.49 %|
> |MergeStoreBench.setLongB       |13722.671000   |6811.098000    | 101.48 %|
> |MergeStoreBench.setLongBU      |13728.844000   |4280.240000    | 220.75 %|
> |MergeStoreBench.setLongBV      |2785.255000    |2785.949000    | -0.02 %|
> |MergeStoreBench.setLongL       |5714.615000    |5710.402000    | 0.07 %|
> |MergeStoreBench.setLongLU      |4128.746000    |4129.324000    | -0.01 %|
> |MergeStoreBench.setLongLV      |2793.125000    |2794.438000    | -0.05 %|
> |MergeStoreBench.setLongRB      |14465.223000   |7015.050000    | 106.20 %|
> |MergeStoreBench.setLongRBU     |14546.954000   |6173.210000    | 135.65 %|
> |MergeStoreBench.setLongRL      |6816.145000    |6813.348000    | 0.04 %|
> |MergeStoreBench.setLongRLU     |4289.445000    |4284.239000    | 0.12 %|
> |MergeStoreBench.setLongRU      |3132.471000    |3133.093000    | -0.02 %|
> |MergeStoreBench.setLongU       |3086.779000    |3087.298000    | -0.02 %|
> 
> AMD EPYC 9T24
> |name   | before | after  | ratio |
> |---|---|---|---|
> |MergeStoreBench.setChar...

This looks very promising, thanks for working on this! Makes me very happy that people are extending it 😊 

I have a few comments and suggestions below.

Can you please link the JBS issue to the other relevant RFE's for MergeStores?

Is there no way to reverse shorts and ints?

src/hotspot/share/opto/memnode.cpp line 2794:

> 2792:   StoreNode* const _store;
> 2793:   enum DataOrder { Unknown, Forward, Reverse};
> 2794:   DataOrder  _value_order;

Suggestion:

  // State machine with initial state Unknown
  // Allowed transitions:
  //   Unknown  -> Forward
  //   Unknown  -> Backward
  //   Forward  -> Forward
  //   Backward -> Backward
  enum DataOrder { Unknown, Forward, Reverse };
  DataOrder  _value_order;


Also: call it either consistently data-order or value-order ;)

src/hotspot/share/opto/memnode.cpp line 2956:

> 2954: }
> 2955: 
> 2956: bool MergePrimitiveStores::is_adjacent_input_pair(const Node* n1, const Node* n2, const int memory_size) {

Nit: This may be a little "nitpicky". But I don't like `is_...` methods that have side-effects. That is why I'd be sad to see the `const` go.

src/hotspot/share/opto/memnode.cpp line 2993:

> 2991:   }
> 2992: 
> 2993:   // initialize value_order once

Suggestion:

  // Initial state "Unknown": check for transition to Forward or Reverse.

src/hotspot/share/opto/memnode.cpp line 2996:

> 2994:   if (_value_order == DataOrder::Unknown) {
> 2995:     if (shift_n1 < shift_n2) {
> 2996:       _value_order = DataOrder::Forward;

Suggestion:

      _value_order = DataOrder::Forward; // First pair has Forward order.

src/hotspot/share/opto/memnode.cpp line 3002:

> 3000:                Matcher::match_rule_supported(Op_ReverseBytesL)) {
> 3001:       _value_order = DataOrder::Reverse;  // only support reverse bytes
> 3002: #endif

Can you leave a comment why we only need this for little endian? It seems you are now generating `ReverseByte` nodes on any platform, right?

src/hotspot/share/opto/memnode.cpp line 3010:

> 3008:   if ((_value_order == DataOrder::Forward && shift_n1 > shift_n2) ||
> 3009:       (_value_order == DataOrder::Reverse && shift_n1 < shift_n2)) {
> 3010:     // wrong order

Suggestion:

    // Wrong order: mixed Forward and Reverse not allowed.

src/hotspot/share/opto/memnode.cpp line 3044:

> 3042:     shift_out = 0;
> 3043:     return true;
> 3044:   }

Can you tell me why you added this?

src/hotspot/share/opto/memnode.cpp line 3270:

> 3268:          "merged_input_value is either int or long, and new_memory_size is small enough");
> 3269: 
> 3270:   if (_value_order == DataOrder::Reverse) {

Suggestion:

  if (_value_order == DataOrder::Reverse) {
    assert(_store->memory_size() == 1, "only implemented for bytes");

That would be correct, right?

src/hotspot/share/opto/memnode.cpp line 3276:

> 3274:       merged_input_value = _phase->transform(new ReverseBytesINode(nullptr, merged_input_value));
> 3275:     } else {
> 3276:       return nullptr;

Suggestion:

      // <say why we cannot do anything for 2-element byte case>.
      return nullptr;

-------------

PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2542142687
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910222532
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910203556
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910226335
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910227487
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910207983
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910224201
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910208790
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910232296
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910230898


More information about the hotspot-compiler-dev mailing list