RFR: 8341293: Split field loads through Nested Phis [v7]
Emanuel Peter
epeter at openjdk.org
Fri Feb 7 07:23:17 UTC 2025
On Fri, 24 Jan 2025 19:13:13 GMT, Dhamoder Nalla <dhanalla at openjdk.org> wrote:
>> As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation.
>>
>>
>> Here are the sequence of Ideal graph transformations for Nested phi:
>>
>>
>> 
>>
>> 
>>
>> 
>>
>> JMH results:
>> with disabled RAM
>>
>> Benchmark Mode Cnt Score Error Units
>> NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op
>> NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ± 4.306 ms/op
>> NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op
>> NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op
>> NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ± 0.247 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op
>> NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op
>> NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op
>>
>> with enabled RAM
>> Benchmark Mode Cnt Score Error Units
>> NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op
>> NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op
>> NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op
>> NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op
>> NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt ...
>
> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision:
>
> Modify IR rules
We're a little limited on reviewers, so sorry this is taking a while to review.
I quickly scanned the code, and left some code-style comments.
I also launched some testing.
Can you please update your PR description, and add some more background info about the code you a remodifying? That would greatly help us review, since we would have to spend less time reading your code and the existing code. Can you please also format the benchmark results in a code block, so it is easier to read?
src/hotspot/share/opto/escape.cpp line 1313:
> 1311: // Ensure that the splits are applied to the load fields of child phi nodes before the parent phi nodes take place.
> 1312: for (uint i = 0; i < nested_phis.size(); i++) {
> 1313: Node *nested_phi = nested_phis.at(i);
Suggestion:
Node* nested_phi = nested_phis.at(i);
Drive by comment
src/hotspot/share/opto/memnode.cpp line 1661:
> 1659: if (base->in(i)->is_Phi()) {
> 1660: // base->in(i) is the parent phi node for base node.
> 1661: Node *mem_node_for_load_after_opt = get_memory_node_for_nestedphi_after_split(phase, base, i);
Suggestion:
Node* mem_node_for_load_after_opt = get_memory_node_for_nestedphi_after_split(phase, base, i);
Drive by style comment
src/hotspot/share/opto/memnode.cpp line 1676:
> 1674: // Note that this function doesn't actually perform the split.
> 1675: // If a split is impossible, it returns nullptr.
> 1676: Node* LoadNode::get_memory_node_for_nestedphi_after_split(PhaseGVN* phase, Node *base, uint parent_idx) {
Suggestion:
Node* LoadNode::get_memory_node_for_nested_phi_after_split(PhaseGVN* phase, Node* base, uint parent_idx) {
Drive-by style comment
test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 81:
> 79: "-XX:CompileCommand=exclude,*::dummy*");
> 80:
> 81: framework.addScenarios(scenario0, scenario1, scenario2).start();
Would it make sense to add a scenario without any flags, or at least only sith the CompileCommand flags only that restrict compilation / inlining?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21270#pullrequestreview-2600962969
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946054121
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946054632
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946055102
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1946056790
More information about the hotspot-compiler-dev
mailing list