RFR: 8347405: MergeStores with reverse bytes order value
Emanuel Peter
epeter at openjdk.org
Fri Jan 10 11:21:44 UTC 2025
On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei <duke at openjdk.org> wrote:
> This patch enhance MergeStores optimization to support merge value with reverse byte order.
>
> Below is benchmark result before and after the patch:
>
> On aliyun g8a (aarch64)
> |name | before | score2 | ratio |
> |---|---|---|---|
> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %|
> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %|
> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %|
> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %|
> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %|
> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %|
> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %|
> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %|
> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %|
> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %|
> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %|
> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %|
> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %|
> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %|
> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %|
> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %|
> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %|
> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %|
> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %|
> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %|
> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %|
> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %|
> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %|
> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %|
> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %|
> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %|
> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %|
> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %|
> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %|
>
> AMD EPYC 9T24
> |name | before | after | ratio |
> |---|---|---|---|
> |MergeStoreBench.setChar...
This looks very promising, thanks for working on this! Makes me very happy that people are extending it 😊
I have a few comments and suggestions below.
Can you please link the JBS issue to the other relevant RFE's for MergeStores?
Is there no way to reverse shorts and ints?
src/hotspot/share/opto/memnode.cpp line 2794:
> 2792: StoreNode* const _store;
> 2793: enum DataOrder { Unknown, Forward, Reverse};
> 2794: DataOrder _value_order;
Suggestion:
// State machine with initial state Unknown
// Allowed transitions:
// Unknown -> Forward
// Unknown -> Backward
// Forward -> Forward
// Backward -> Backward
enum DataOrder { Unknown, Forward, Reverse };
DataOrder _value_order;
Also: call it either consistently data-order or value-order ;)
src/hotspot/share/opto/memnode.cpp line 2956:
> 2954: }
> 2955:
> 2956: bool MergePrimitiveStores::is_adjacent_input_pair(const Node* n1, const Node* n2, const int memory_size) {
Nit: This may be a little "nitpicky". But I don't like `is_...` methods that have side-effects. That is why I'd be sad to see the `const` go.
src/hotspot/share/opto/memnode.cpp line 2993:
> 2991: }
> 2992:
> 2993: // initialize value_order once
Suggestion:
// Initial state "Unknown": check for transition to Forward or Reverse.
src/hotspot/share/opto/memnode.cpp line 2996:
> 2994: if (_value_order == DataOrder::Unknown) {
> 2995: if (shift_n1 < shift_n2) {
> 2996: _value_order = DataOrder::Forward;
Suggestion:
_value_order = DataOrder::Forward; // First pair has Forward order.
src/hotspot/share/opto/memnode.cpp line 3002:
> 3000: Matcher::match_rule_supported(Op_ReverseBytesL)) {
> 3001: _value_order = DataOrder::Reverse; // only support reverse bytes
> 3002: #endif
Can you leave a comment why we only need this for little endian? It seems you are now generating `ReverseByte` nodes on any platform, right?
src/hotspot/share/opto/memnode.cpp line 3010:
> 3008: if ((_value_order == DataOrder::Forward && shift_n1 > shift_n2) ||
> 3009: (_value_order == DataOrder::Reverse && shift_n1 < shift_n2)) {
> 3010: // wrong order
Suggestion:
// Wrong order: mixed Forward and Reverse not allowed.
src/hotspot/share/opto/memnode.cpp line 3044:
> 3042: shift_out = 0;
> 3043: return true;
> 3044: }
Can you tell me why you added this?
src/hotspot/share/opto/memnode.cpp line 3270:
> 3268: "merged_input_value is either int or long, and new_memory_size is small enough");
> 3269:
> 3270: if (_value_order == DataOrder::Reverse) {
Suggestion:
if (_value_order == DataOrder::Reverse) {
assert(_store->memory_size() == 1, "only implemented for bytes");
That would be correct, right?
src/hotspot/share/opto/memnode.cpp line 3276:
> 3274: merged_input_value = _phase->transform(new ReverseBytesINode(nullptr, merged_input_value));
> 3275: } else {
> 3276: return nullptr;
Suggestion:
// <say why we cannot do anything for 2-element byte case>.
return nullptr;
-------------
PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2542142687
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910222532
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910203556
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910226335
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910227487
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910207983
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910224201
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910208790
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910232296
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910230898
More information about the hotspot-compiler-dev
mailing list