RFR: JDK-8316991: Reduce nullable allocation merges

Cesar Soares Lucas cslucas at openjdk.org
Mon Oct 16 16:18:16 UTC 2023


### Description

Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", with objects returned from method calls, and so on. Please review this Pull Request, which improves the Reduce Allocation Merges (RAM) implementation so that it can reduce at least some of these allocation merges.
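
For illustration, here is a minimal Java sketch of the kind of source pattern that produces a nullable allocation merge. The class and method names (`NullableMergeExample`, `Point`, `mergeWithNull`) are hypothetical and are not taken from the PR or its tests; whether C2 actually reduces this particular merge depends on inlining and other compilation decisions.

```java
// Hypothetical example, not taken from the PR's test suite.
final class NullableMergeExample {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // At the end of the `if`, `p` is a merge (Phi) of a fresh allocation and null.
    // The null check below is a pointer compare that consumes that merge.
    static int mergeWithNull(boolean cond, int x, int y) {
        Point p = null;
        if (cond) {
            p = new Point(x, y);
        }
        return (p != null) ? p.x + p.y : 0;
    }

    public static void main(String[] args) {
        System.out.println(mergeWithNull(true, 3, 4));  // prints 7
        System.out.println(mergeWithNull(false, 3, 4)); // prints 0
    }
}
```

If the merge can be reduced, the `Point` allocation becomes a candidate for scalar replacement even though one input of the merge is null.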

Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. 

The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing.
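
As a rough source-level analogy of splitting a compare through a Phi (C2 performs this on its IR graph, not on Java source, and the names `SplitThroughPhiSketch`, `Box`, `before`, and `after` are hypothetical), the idea can be sketched as follows. In `before`, a single null check consumes the merged value. In `after`, the check has been cloned onto each input of the merge, where it folds to a constant, and the field of the allocation is carried as a plain scalar.

```java
// Hypothetical source-level analogy; the real transformation operates on C2 IR nodes.
final class SplitThroughPhiSketch {
    static final class Box {
        final int v;
        Box(int v) { this.v = v; }
    }

    // Before: one compare uses the merge of {new Box(v), null}.
    static int before(boolean cond, int v) {
        Box b = cond ? new Box(v) : null;
        if (b != null) {
            return b.v;
        }
        return -1;
    }

    // After (conceptually): the compare is cloned per merge input.
    // On the allocation input it is always "not null"; on the null input it is always "null".
    static int after(boolean cond, int v) {
        boolean nonNull;
        int field = 0;          // stands in for the scalar-replaced Box.v
        if (cond) {
            nonNull = true;     // compare of a fresh allocation against null folds to a constant
            field = v;
        } else {
            nonNull = false;    // compare of null against null folds to a constant
        }
        return nonNull ? field : -1;
    }
}
```

Both methods return the same result for every input; the second form simply no longer needs the `Box` object to exist.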

### Benchmarking

**Note:** In some of these tests no reduction happens. I left them in to validate that no performance regression happens in that case.
**Note 2:** Margin of error was negligible.

| Benchmark                            | RAM off (ms/op)  |  RAM on (ms/op)   |
|--------------------------------------|------------------|-------------------|
| TestTrapAfterMerge                   |      19.515      |      13.386       |
| TestArgEscape                        |      33.165      |      33.254       |
| TestCallTwoSide                      |      70.547      |      69.427       |
| TestCmpAfterMerge                    |      16.400      |       2.984       |
| TestCmpMergeWithNull_Second          |      27.204      |      27.293       |
| TestCmpMergeWithNull                 |       8.248      |       4.920       |
| TestCondAfterMergeWithAllocate       |      12.890      |       5.252       |
| TestCondAfterMergeWithNull           |       6.265      |       5.078       |
| TestCondLoadAfterMerge               |      12.713      |       5.163       |
| TestConsecutiveSimpleMerge           |      30.863      |       4.068       |
| TestDoubleIfElseMerge                |      16.069      |       2.444       |
| TestEscapeInCallAfterMerge           |      23.111      |      22.924       |
| TestGlobalEscape                     |      14.459      |      14.425       |
| TestIfElseInLoop                     |     246.061      |      42.786       |
| TestLoadAfterLoopAlias               |      45.808      |      45.812       |
| TestLoadAfterTrap                    |      28.370      |      28.514       |
| TestLoadInCondAfterMerge             |      12.538      |       4.720       |
| TestLoadInLoop                       |      25.534      |      17.079       |
| TestMergedAccessAfterCallNoWrite     |     169.837      |     169.881       |
| TestMergedAccessAfterCallWithWrite   |     149.669      |     152.105       |
| TestMergedLoadAfterDirectStore       |      16.496      |      16.473       |
| TestMergesAndMixedEscape             |      28.821      |      19.701       |
| TestNestedObjectsArray               |      31.207      |      27.832       |
| TestNestedObjectsNoEscapeObject      |      16.162      |      12.544       |
| TestNestedObjectsObject              |      16.117      |      12.204       |
| TestNoEscapeWithLoadInLoop           |     253.903      |     247.400       |
| TestNoEscapeWithWriteInLoop          |     113.710      |     113.714       |
| TestObjectIdentity                   |       2.442      |       2.442       |
| TestPartialPhis                      |       4.340      |       4.340       |
| TestPollutedNoWrite                  |       7.817      |       1.991       |
| TestPollutedPolymorphic              |      11.017      |       1.991       |
| TestPollutedWithWrite                |       8.596      |       8.593       |
| TestSRAndNSR_NoTrap_caller           |      14.865      |       8.536       |
| TestSRAndNSR_Trap_caller             |      45.689      |      40.930       |
| TestSimpleAliasedAlloc               |      16.297      |       2.447       |
| TestSimpleDoubleMerge                |      23.786      |       2.997       |
| TestString_one_caller                |      15.484      |      15.271       |
| TestString_two_caller                |      15.456      |      14.996       |
| TestSubclassesTrapping               |      26.820      |      26.143       |
| TestSubclasses                       |       6.521      |       3.834       |
| TestThreeWayAliasedAlloc             |      16.307      |       2.308       |
| TestTrappingAfterMerge               |      13.683      |       6.804       |

### Tests
- Linux x86_64: Tier1-4, DaCapo, Renaissance, SpecJBB
- MacOS Aarch64: Tier1-4
- Windows x86_64: Tier1-4

-------------

Commit messages:
 - Refrain from RAM of arrays and Phis controlled by Loop nodes.
 - Fix typo in test.
 - Fix build after merge.
 - Fix merge
 - Support for reducing nullable allocation merges.

Changes: https://git.openjdk.org/jdk/pull/15825/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8316991
  Stats: 2291 lines in 13 files changed: 2051 ins; 94 del; 146 mod
  Patch: https://git.openjdk.org/jdk/pull/15825.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825

PR: https://git.openjdk.org/jdk/pull/15825

