RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v9]
Emanuel Peter
epeter at openjdk.org
Mon Mar 24 11:45:17 UTC 2025
On Mon, 24 Mar 2025 04:29:03 GMT, kuaiwei <duke at openjdk.org> wrote:
>> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>>
>> Benchmark results for MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/master' into dev/merge_loads
> - Fix test
> - Add more tests
> - Enable StressIGVN and riscv platform
> - Change tests as review comments
> - Fix test failure and change for review comments
> - Revert extract value and add more tests
> - Add tests
> - Fix test failure
> - Remove some debug trace
> - ... and 1 more: https://git.openjdk.org/jdk/compare/024633e7...e37c4bf3
src/hotspot/share/opto/memnode.cpp line 2396:
> 2394: assert(last_op != nullptr && (last_op->Opcode() == Op_OrI || last_op->Opcode() == Op_OrL), "sanity");
> 2395: _phase->is_IterGVN()->replace_node(last_op, replace);
> 2396: _phase->is_IterGVN()->_worklist.push(merged_load);
If you did this in `OrNode::Ideal`, you would only have to return the new load, and `IGVN` would take care of the replacement, instead of calling `replace_node` and pushing to the worklist by hand. That is the code pattern we use everywhere else.
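
To illustrate the pattern, here is a minimal, self-contained toy model. It is only a sketch: `Graph`, `Node`, `LoadNode`, `OrNode`, `UseNode` and `run_demo` below are hypothetical stand-ins written for this example, not HotSpot's real classes. The shift that a real byte-merging pattern would contain is left out for brevity. The point is only that `ideal` returns the replacement node and the driver (`Graph::transform`, playing the role of IGVN here) rewires the uses.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Graph;

// Toy node: NOT HotSpot's Node class, just enough to show the pattern.
struct Node {
    std::vector<Node*> inputs;
    virtual ~Node() = default;
    // Return a replacement node, or nullptr if nothing can be improved.
    virtual Node* ideal(Graph&) { return nullptr; }
};

// Toy load of `size` bytes at byte offset `offset`.
struct LoadNode : Node {
    int offset;
    int size;
    LoadNode(int o, int s) : offset(o), size(s) {}
};

// Toy driver standing in for IGVN: it owns the nodes and does the replacing.
struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;

    template <class T, class... Args>
    T* make(Args&&... args) {
        auto owned = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = owned.get();
        nodes.push_back(std::move(owned));
        return raw;
    }

    // Ask the node for a better form; if it returns one, rewire all uses.
    Node* transform(Node* n) {
        Node* replacement = n->ideal(*this);
        if (replacement == nullptr) return n;
        for (auto& user : nodes)
            for (auto& in : user->inputs)
                if (in == n) in = replacement;
        return replacement;
    }
};

// Toy "or" of two loads (shift omitted for brevity).
struct OrNode : Node {
    OrNode(Node* lo, Node* hi) { inputs = {lo, hi}; }

    Node* ideal(Graph& g) override {
        auto* lo = dynamic_cast<LoadNode*>(inputs[0]);
        auto* hi = dynamic_cast<LoadNode*>(inputs[1]);
        if (lo == nullptr || hi == nullptr) return nullptr;
        if (hi->offset != lo->offset + lo->size) return nullptr;  // not adjacent
        // Only build and return the merged load; no manual replace or worklist push.
        return g.make<LoadNode>(lo->offset, lo->size + hi->size);
    }
};

// Toy consumer of a value, used to observe the rewiring.
struct UseNode : Node {
    explicit UseNode(Node* v) { inputs = {v}; }
};

// Builds use(or(load@0, load@1)) and returns the size of the load the use ends up with.
int run_demo() {
    Graph g;
    auto* b0 = g.make<LoadNode>(0, 1);
    auto* b1 = g.make<LoadNode>(1, 1);
    auto* orn = g.make<OrNode>(b0, b1);
    auto* use = g.make<UseNode>(orn);
    g.transform(orn);
    auto* merged = dynamic_cast<LoadNode*>(use->inputs[0]);
    assert(merged != nullptr);
    return merged->size;
}
```

In this sketch `OrNode::ideal` never touches its users. All bookkeeping lives in one place, the driver, which is the benefit of returning the new node rather than replacing it by hand.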
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2010012746