RFR: 8347405: MergeStores with reverse bytes order value [v11]

Emanuel Peter epeter at openjdk.org
Wed Feb 26 06:31:58 UTC 2025


On Tue, 28 Jan 2025 03:09:28 GMT, kuaiwei <duke at openjdk.org> wrote:

>> This patch enhance MergeStores optimization to support merge value with reverse byte order.
>> 
>> Below is benchmark result before and after the patch:
>> 
>> On aliyun g8y (aarch64)
>> |name   | before        | score2        | ratio |
>> |---|---|---|---|
>> |MergeStoreBench.setCharBS      |5669.655000    |5669.566000    | 0.00 %|
>> |MergeStoreBench.setCharBV      |5516.911000    |5516.273000    | 0.01 %|
>> |MergeStoreBench.setCharC       |5578.644000    |5552.809000    | 0.47 %|
>> |MergeStoreBench.setCharLS      |5782.140000    |5779.264000    | 0.05 %|
>> |MergeStoreBench.setCharLV      |5496.403000    |5499.195000    | -0.05 %|
>> |MergeStoreBench.setIntB        |6087.703000    |2768.385000    | 119.90 %|
>> |MergeStoreBench.setIntBU       |6733.813000    |2950.240000    | 128.25 %|
>> |MergeStoreBench.setIntBV       |1362.233000    |1361.821000    | 0.03 %|
>> |MergeStoreBench.setIntL        |2834.785000    |2833.042000    | 0.06 %|
>> |MergeStoreBench.setIntLU       |2947.145000    |2946.874000    | 0.01 %|
>> |MergeStoreBench.setIntLV       |5506.791000    |5506.229000    | 0.01 %|
>> |MergeStoreBench.setIntRB       |7634.279000    |5611.058000    | 36.06 %|
>> |MergeStoreBench.setIntRBU      |7766.737000    |5551.281000    | 39.91 %|
>> |MergeStoreBench.setIntRL       |5689.793000    |5689.385000    | 0.01 %|
>> |MergeStoreBench.setIntRLU      |5628.287000    |5628.789000    | -0.01 %|
>> |MergeStoreBench.setIntRU       |5536.039000    |5534.910000    | 0.02 %|
>> |MergeStoreBench.setIntU        |5595.363000    |5567.810000    | 0.49 %|
>> |MergeStoreBench.setLongB       |13722.671000   |6811.098000    | 101.48 %|
>> |MergeStoreBench.setLongBU      |13728.844000   |4280.240000    | 220.75 %|
>> |MergeStoreBench.setLongBV      |2785.255000    |2785.949000    | -0.02 %|
>> |MergeStoreBench.setLongL       |5714.615000    |5710.402000    | 0.07 %|
>> |MergeStoreBench.setLongLU      |4128.746000    |4129.324000    | -0.01 %|
>> |MergeStoreBench.setLongLV      |2793.125000    |2794.438000    | -0.05 %|
>> |MergeStoreBench.setLongRB      |14465.223000   |7015.050000    | 106.20 %|
>> |MergeStoreBench.setLongRBU     |14546.954000   |6173.210000    | 135.65 %|
>> |MergeStoreBench.setLongRL      |6816.145000    |6813.348000    | 0.04 %|
>> |MergeStoreBench.setLongRLU     |4289.445000    |4284.239000    | 0.12 %|
>> |MergeStoreBench.setLongRU      |3132.471000    |3133.093000    | -0.02 %|
>> |MergeStoreBench.setLongU       |3086.779000    |3087.298000    | -0.02 %|
>> 
>> AMD EPYC 9T24
>> ...
>
> kuaiwei has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Allow ValueOrder::Reverse on big-endian platforms
>  - Revert "Merge more stores"
>    
>    This reverts commit 1e1113ed02ec5a9fe181f215d5667e8de487fe47.
>  - Revert "Fix test502aBE"
>    
>    This reverts commit f773fa368577c4f67957c4d40968c5c45e3ae205.

@kuaiwei sorry for the long delay. Thanks for your patience, and more importantly your good work here!

I have 2 nit-picks, but they are not very important. I'll run some internal testing now.

src/hotspot/share/opto/memnode.cpp line 3020:

> 3018:   ValueOrder input_value_order = find_adjacent_input_value_order(n1, n2, memory_size);
> 3019: 
> 3020:   if (input_value_order == ValueOrder::NotAdjacent) {

You check `input_value_order` against various cases here. Maybe a `switch` could be a good alternative.

src/hotspot/share/opto/memnode.cpp line 3028:

> 3026:        !Matcher::match_rule_supported(Op_ReverseBytesI) ||
> 3027:        !Matcher::match_rule_supported(Op_ReverseBytesL) ||
> 3028:        !Matcher::match_rule_supported(Op_ReverseBytesS)

Nit-pick: please order it from small to large: S -> I -> L
Suggestion:

       !Matcher::match_rule_supported(Op_ReverseBytesS) ||
       !Matcher::match_rule_supported(Op_ReverseBytesI) ||
       !Matcher::match_rule_supported(Op_ReverseBytesL)

test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 727:

> 725:         applyIf = {"UseUnalignedAccesses", "true"},
> 726:         applyIfPlatform = {"little-endian", "true"})
> 727:     @IR(counts = {IRNode.STORE_B_OF_CLASS, "byte\\\\[int:>=0] \\\\(java/lang/Cloneable,java/io/Serializable\\\\)", "8",

You may want to add the bug number here at the top of the file.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2643204863
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1970974802
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1970973927
PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1970984644


More information about the hotspot-compiler-dev mailing list