RFR: 8341293: Split field loads through Nested Phis [v7]

Fri Feb 7 07:23:17 UTC 2025

On Fri, 24 Jan 2025 19:13:13 GMT, Dhamoder Nalla <dhanalla at openjdk.org> wrote:

>> As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation.
>> 
>> 
>> Here are the sequence of Ideal graph transformations for Nested phi:
>> 
>>  
>> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569)
>> 
>> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81)
>> 
>> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)
>> 
>> JMH results:
>> with disabled RAM
>> 
>> Benchmark Mode Cnt Score Error Units
>> NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op
>> NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ± 4.306 ms/op
>> NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op
>> NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ± 0.247 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op
>> NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op
>> 
>> with enabled RAM
>> Benchmark Mode Cnt Score Error Units
>> NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op
>> NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op
>> NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op
>> NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op
>> NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt ...
>
> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Modify IR rules

We're a little limited on reviewers, so sorry this is taking a while to review.

I quickly scanned the code, and left some code-style comments.

I also launched some testing.

Can you please update your PR description, and add some more background info about the code you a remodifying? That would greatly help us review, since we would have to spend less time reading your code and the existing code. Can you please also format the benchmark results in a code block, so it is easier to read?

src/hotspot/share/opto/escape.cpp line 1313:

> 1311:   // Ensure that the splits are applied to the load fields of child phi nodes before the parent phi nodes take place.
> 1312:   for (uint i = 0; i < nested_phis.size(); i++) {
> 1313:     Node *nested_phi = nested_phis.at(i);

Suggestion:

    Node* nested_phi = nested_phis.at(i);

Drive by comment

src/hotspot/share/opto/memnode.cpp line 1661:

> 1659:       if (base->in(i)->is_Phi()) {
> 1660:         // base->in(i) is the parent phi node for base node.
> 1661:         Node *mem_node_for_load_after_opt = get_memory_node_for_nestedphi_after_split(phase, base, i);

Suggestion:

        Node* mem_node_for_load_after_opt = get_memory_node_for_nestedphi_after_split(phase, base, i);

Drive by style comment

src/hotspot/share/opto/memnode.cpp line 1676:

> 1674: // Note that this function doesn't actually perform the split.
> 1675: // If a split is impossible, it returns nullptr.
> 1676: Node* LoadNode::get_memory_node_for_nestedphi_after_split(PhaseGVN* phase, Node *base, uint parent_idx) {

Suggestion:

Node* LoadNode::get_memory_node_for_nested_phi_after_split(PhaseGVN* phase, Node* base, uint parent_idx) {

Drive-by style comment

test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 81:

> 79:                                              "-XX:CompileCommand=exclude,*::dummy*");
> 80: 
> 81:         framework.addScenarios(scenario0, scenario1, scenario2).start();

Would it make sense to add a scenario without any flags, or at least only sith the CompileCommand flags only that restrict compilation / inlining?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21270#pullrequestreview-2600962969
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946054121
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946054632
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946055102
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946056790