RFR: 8341293: Split field loads through Nested Phis [v9]

Tue Jun 17 19:18:33 UTC 2025

On Tue, 8 Apr 2025 20:21:34 GMT, Dhamoder Nalla <dhanalla at openjdk.org> wrote:

>> This enhances the changes introduced in [JDK PR 12897](https://github.com/openjdk/jdk/pull/12897) by handling nested Phi nodes (phi -> phi -> AddP -> Load*) during scalar replacement. The primary goal is to split field loads (AddP -> Load*) involving nested Phi parent nodes, thereby increasing opportunities for scalar replacement and reducing memory allocations.
>> 
>> 
>> **Here is an illustration of the sequence of Ideal Graph Transformations applied to split through nested `Phi` nodes.**
>>  
>> **1. Initial State (Before Transformation)**
>> The graph contains a nested Phi structure where two Allocate nodes merge via a Phi node.
>> 
>> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569)
>> 
>> **2. After Splitting Through Child Phi**
>> The transformation separates field loads by introducing additional AddP and Load nodes for each Allocate input.
>> 
>> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81)
>> 
>> **3. After Splitting Load Field Through Parent Phi**
>> The field load operation (Load) is pushed even further up in the graph.
>> 
>> Instead of merging AddP pointers in a Phi node and then performing a Load, the transformation ensures that each path has its AddP -> Load sequence before merging.
>> 
>> This further eliminates the need to perform field loads on a Phi node, making the graph more conducive to scalar replacement.
>> 
>> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)
>> 
>> ### JMH Benchmark Results:
>> 
>> #### With Disabled RAM
>> 
>> | Benchmark | Mode | Count | Score | Error | Units |
>> |-----------|------|-------|-------|-------|-------|
>> | testBailOut_runner | avgt | 15 | 13.969 | ± 0.248 | ms/op |
>> | testFieldEscapeWithMerge_runner | avgt | 15 | 80.300 | ± 4.306 | ms/op |
>> | testMerge_TryCatchFinally_runner | avgt | 15 | 72.182 | ± 1.781 | ms/op |
>> | testMultiParentPhi_runner | avgt | 15 | 2.983 | ± 0.001 | ms/op |
>> | testNestedPhiPolymorphic_runner | avgt | 15 | 18.342 | ± 0.731 | ms/op |
>> | testNestedPhiProcessOrder_runner | avgt | 15 | 14.315 | ± 0.443 | ms/op |
>> | testNestedPhiWithLambda_runner | avgt | 15 | 18.511 | ± 1.212 | ms/op |
>> | testNestedPhiWithTrap_runner | avgt | 15 | 66.277 | ± 1.478 | ms/op |
>> | testNestedPhi_FieldLoad_runner | avgt | 15 | 17.968 | ± 0.306 | ms/op |
>> | testNestedPhi_TryCatch_runner | avgt | 15 | 14.186 | ± 0.247 | ms/op |
>> | testRematerialize_MultiObj_runner | avgt | 15 | 88.435 | ± 4.869 ...
>
> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision:
> 
>   address CR comments

Thank you for persisting on this @dhanalla .
I just did a quick look. I'll look again and run tests as soon as I get some time.

src/hotspot/share/opto/escape.cpp line 465:

> 463: bool ConnectionGraph::can_reduce_phi_check_inputs(PhiNode* ophi) const {
> 464:   bool found_sr_allocate = false;
> 465:   int nof_input_phi_nodes = 0;

Looks like this can be just a boolean variable.

src/hotspot/share/opto/escape.cpp line 569:

> 567:     } else if (use->is_Phi()) {
> 568:       if (n->_idx == use->_idx) {
> 569:         NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Cannot reduce Self loop nested Phi");)

NIT: "self"

src/hotspot/share/opto/escape.cpp line 571:

> 569:         NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Cannot reduce Self loop nested Phi");)
> 570:         return false;
> 571:       } else if (!can_reduce_phi_check_inputs(use->as_Phi()) || !can_reduce_check_users(use->as_Phi(), phi_nest_level+1)) {

Maybe worth printing another trace message here? Saying we are not reducing the parent Phi because we can't reduce the child phi?

src/hotspot/share/opto/escape.cpp line 1310:

> 1308:     Node* use = ophi->fast_out(i);
> 1309:     if (use->is_Phi()) {
> 1310:       assert(use->_idx != ophi->_idx, "Unexpected selfloop Phi.");

Should we bailout of the reduction process if we somehow end up in this situation?
I.e., in a debug build we'll assert, but in a product build you're just ignoring the problem.

-------------

PR Review: https://git.openjdk.org/jdk/pull/21270#pullrequestreview-2936791060
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152984586
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152989734
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2152992496
PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r2153000431