RFR: 8289943: Simplify some object allocation merges [v13]
Cesar Soares Lucas
cslucas at openjdk.org
Mon Nov 7 02:05:18 UTC 2022
On Thu, 27 Oct 2022 06:58:38 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>>>
>>> As of now, it serves dual purpose. It (1) marks a merge point as safe to be untangled during SR; and (2) caches information about field values.
>>
>> Another important purpose of RAM is to have information at SafePoints after the merge point for reallocation during deoptimization. You need Klass information. I don't think having only Phis for the values is enough.
>>
>>>
>>> I believe you can solve it in a cleaner manner without introducing placeholder nodes and connection graph adjustments. IMO it's all about keeping escape status and properly handling "safe" merges in `split_unique_types`.
>>>
>>> One possible way to handle merge points is:
>>>
>>> * Handle merge points in `adjust_scalar_replaceable_state` and refrain from marking relevant bases as NSR when possible.
>>> * After `adjust_scalar_replaceable_state` is over, every merge point should have all its inputs as either NSR or SR.
>>> * `split_unique_types` incrementally builds value phis to eventually replace the base phi at merge point while processing SR allocations one by one.
>>> * After `split_unique_types` is done, there are no merge points anymore, each allocation has a dedicated memory graph and allocation elimination can proceed as before.
>>
>> I am not sure how this could be possible. Currently, EA relies on IGVN to propagate field values based on a unique memory slice. What do you do with memory Load or Store nodes after the merge point? Which memory slice will you use for them?
>>
>>>
>>> Do you see any problems with such an approach?
>>>
>>> One thing still confuses me though: the patch mentions that RAMs can merge both eliminated and not-yet-eliminated allocations. What's the intended use case? I believe it's still required to have all merged allocations to be eventually eliminated. Do you try to handle the case during allocation elimination when part of the inputs are already eliminated and the rest is pending their turn?
>>
>> There is a check for it in `ConnectionGraph::can_reduce_this_phi()`. The only supported case is when there is no deoptimization point (SFP or UNCT) after the merge point. It allows eliminating SR allocations even if they merge with NSR allocations. That was the idea.
>
>> Another important purpose of RAM is to have information at SafePoints after the merge point for reallocation during deoptimization. You need Klass information. I don't think having only Phis for the values is enough.
>
> Klass information is available either from the Allocation node in `split_unique_types` or from the ConnectionGraph instance the Phi is part of.
>
>
>> I am not sure how this could be possible. Currently, EA relies on IGVN to propagate field values based on a unique memory slice. What do you do with memory Load or Store nodes after the merge point? Which memory slice will you use for them?
>
> My understanding of how the proposed approach is expected to work: merge points have to be simple enough to still allow splitting unique types for individual allocations.
>
> For example, `eliminate_ram_addp_use()` replaces `Load (AddP (Phi base1 ... basen) off) mem` with `Phi (val1 ... valn)`, and `eliminate_reduced_allocation_merge()` performs a similar transformation for `SafePoint`s.
>
> Alternatively, the corresponding `Phi`s can be built incrementally while processing each individual `base` in `split_unique_types`. Or, just by splitting `Load`s through `Phi`:
>
> ```
> Load (AddP (Phi base_1 ... base_n) off) mem
> == split-through-phi ==>
> Phi ((Load (AddP base_1 off) mem) ... (Load (AddP base_n off) mem))
> == split_unique_types ==>
> Phi ((Load (AddP base_1 off) mem_1) ... (Load (AddP base_n off) mem_n))
> == IGVN ==>
> Phi (val_1 ... val_n)
> ```
>
>> There is a check for it in `ConnectionGraph::can_reduce_this_phi()`. The only supported case is when there is no deoptimization point (SFP or UNCT) after the merge point. It allows eliminating SR allocations even if they merge with NSR allocations. That was the idea.
>
> That's nice! Now I see `has_call_as_user`-related code.
>
> It means that only `Load (AddP (Phi base_1 ... base_n) off) mem` shapes are allowed now.
> I believe the aforementioned split-through-phi transformation should handle it well:
>
> ```
> Load (AddP (Phi base_1 ... base_n) off) mem
> == split-through-phi ==>
> Phi ((Load (AddP base_1 off) mem) ... (Load (AddP base_n off) mem))
> == split_unique_types ==>
> Phi (... (Load (AddP base_SR_i off) mem_i) ... (Load (AddP base_NSR_n off) mem) ...)
> == IGVN ==>
> Phi (... val_i ... (Load (AddP base_NSR_n off) mem) ...)
> ```
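At the source level, this mixed SR/NSR merge corresponds to a shape like the following sketch (class and field names are invented for illustration): one branch's allocation escapes and stays NSR, while the other remains scalar-replaceable.

```java
// Hypothetical sketch of a merge of an SR and an NSR base: the allocation
// in the else-branch escapes (published to a static field) and is NSR,
// while the then-branch allocation remains scalar-replaceable (SR).
class Box {
    final int v;
    Box(int v) { this.v = v; }
}

class MixedMerge {
    static Box escaped; // publishing here makes one merge input NSR

    static int read(boolean cond) {
        Box b;
        if (cond) {
            b = new Box(10);   // SR candidate: never escapes
        } else {
            b = new Box(20);
            escaped = b;       // NSR: escapes via the static field
        }
        return b.v; // Load (AddP (Phi base_SR base_NSR) off) mem
    }
}
```

After the transformations above, the SR input's load folds to the constant value while the NSR input keeps a real load from its base.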
Hi @iwanowww - Thank you for clarifying things!
After much thought and some testing, I think I can make the RAM node go away and achieve the results I want. Below are some additional comments & the overall approach that I'm going to switch to.
- `LoadNode::split_through_phi` requires `is_known_instance_field` and therefore can't be run before `split_unique_types` without changes.
- I think `adjust_scalar_replaceable_state` might be the best place to split the merge Phi and create the value Phis for the fields. However, if the merge Phi is used by a SafePoint/UncommonTrap, then it's also necessary to create a `SafePointScalarObjectNode` (SSON). As you know, there is already logic to create SSONs as part of `PhaseMacroExpand::scalar_replacement`. So we have to decide whether it's best to re-use the code from `PhaseMacroExpand::scalar_replacement` in `adjust_scalar_replaceable_state`, or to add new code to `PhaseMacroExpand::scalar_replacement` that creates SSONs for merge Phis. Here are the pros & cons of the two options:
- Create the SSON in `adjust_scalar_replaceable_state`:
Pros: all the logic to split the merge Phi is centered in one place.
Cons: the logic to create SSONs is used outside the `scalar_replacement` routine.
- Create the SSON in the `scalar_replacement` routine:
Pros: scalar replacement of the merge Phi is contained in the `scalar_replacement` routine.
Cons: the logic to split the merge Phi is spread across escape analysis and scalar replacement.
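The SSON-requiring case being discussed can be pictured with a sketch like this (names are hypothetical): the merge Phi stays live across a call, i.e. a SafePoint in the IR, so deoptimization would need an SSON to rematerialize whichever object the Phi selects.

```java
// Hypothetical sketch of the case requiring an SSON: the merged object
// is live across a SafePoint (the call below), so deoptimization needs
// a SafePointScalarObjectNode to rematerialize it in the interpreter.
class Pair {
    final int a;
    Pair(int a) { this.a = a; }
}

class SafePointMerge {
    static int opaque() { return 5; } // call site: a SafePoint in the IR

    static int test(boolean cond) {
        Pair p = cond ? new Pair(1) : new Pair(2); // merge Phi
        int k = opaque(); // SafePoint while the merge Phi is still live
        return p.a + k;   // field load after the merge point
    }
}
```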
Does this look like a better approach? TIA!
-------------
PR: https://git.openjdk.org/jdk/pull/9073
More information about the hotspot-compiler-dev mailing list