RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Feb 11 18:51:16 UTC 2022
On 2/11/22 10:24 AM, Vladimir Ivanov wrote:
>
>> For this simple case we can teach C2's IGVN to split field loads through Phi so that phi(p0, p1) is no longer used,
>> and neither are the allocations. We can do that because we know that the allocations and the phi do not escape.
>
> Seems like split_unique_types() pass is a prerequisite to make such transformations predictable on a larger scale. It
> simplifies the memory graph and untangles instance-related memory operations from the rest of the graph.
Can you elaborate on your suggestion?
As you correctly pointed out in your original post, EA and SR heavily depend on _instance_id, which is the Allocation
node's _idx value. The solution for this is either to remove the reference to the merge phi, or to introduce a new
"virtual allocation" whose fields are populated by loads - but that again needs to be split through the phi.
>
> Otherwise, I believe wide memory states/merges would pose serious problems. E.g., when a load is split through a phi,
> you need to pick the correct memory states for the new loads above the phi, which quickly becomes quite a challenging task.
Especially in loops. I actually tried such an approach before, and I agree that it is hard.
Thanks,
Vladimir K
>
> Best regards,
> Vladimir Ivanov
>
>>
>> public int ex1(boolean cond, int first, int second) {
>>     p0 = Allocate(...);
>>     ...
>>     p0.x = first;
>>     p0.y = second;
>>
>>     if (cond) {
>>         p1 = Allocate(...);
>>         ...
>>         p1.x = second;
>>         p1.y = first;
>>     }
>>
>>     p = phi(p0, p1) // unused and will be removed
>>
>>     return phi(p0.x, p1.x) - phi(p0.y, p1.y);
>> }
>>
>> Thanks,
>> Vladimir K
>>
>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>> (BCCing hotspot-dev and moving the discussion to hotspot-compiler-dev.)
>>>
>>> Hi Cesar,
>>>
>>> Thanks for looking into enhancing EA.
>>>
>>> Overall, the proposal looks reasonable.
>>>
>>> I suggest looking more closely at split_unique_types().
>>> It introduces a dedicated class of alias categories for the fields of the allocation being eliminated and clones the
>>> memory graph. I don't see why it shouldn't work for multiple allocations.
>>>
>>> Moreover, split_unique_types() will break if you start optimizing multiple allocations at once. The notion of a unique
>>> alias should be adjusted to cover the union of unique aliases for all interacting allocations.
>>>
>>> Seems like you need to enhance SR to work on non-intersecting clusters of allocations.
>>>
>>> One thing to take care of: scalar replacement relies on TypeOopPtr::instance_id().
>>>
>>> // If not InstanceTop or InstanceBot, indicates that this is
>>> // a particular instance of this type which is distinct.
>>> // This is the node index of the allocation node creating this instance.
>>> int _instance_id;
>>>
>>> It'll break when multiple allocations are in play.
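>>>
>>> To illustrate (a toy model with made-up names, not HotSpot code): the meet of two pointer types carrying different
>>> concrete instance ids can no longer name a single allocation, so it has to fall back to InstanceBot:
>>>
>>> // Hypothetical sketch of how per-allocation instance ids behave at a merge point.
>>> final class OopType {
>>>     static final int InstanceTop = -1; // no instance known yet
>>>     static final int InstanceBot = 0;  // some instance, but not a particular one
>>>     final int instanceId;              // node index of the Allocate, or a sentinel
>>>
>>>     OopType(int instanceId) { this.instanceId = instanceId; }
>>>
>>>     // Meet at a phi: distinct concrete ids collapse to InstanceBot, and from
>>>     // then on SR can no longer attribute memory ops to one allocation.
>>>     OopType meet(OopType other) {
>>>         if (instanceId == other.instanceId) return this;
>>>         if (instanceId == InstanceTop) return other;
>>>         if (other.instanceId == InstanceTop) return this;
>>>         return new OopType(InstanceBot);
>>>     }
>>> }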
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>> Hi there again!
>>>>
>>>> Can you please give me feedback on the following approach to at least partially
>>>> address [1], the scalar replacement allocation merge issue?
>>>>
>>>> The problem that I am trying to solve arises when allocations are merged after a
>>>> control flow split. The code below shows _one example_ of such a situation.
>>>>
>>>> public int ex1(boolean cond, int x, int y) {
>>>>     Point p = new Point(x, y);
>>>>     if (cond)
>>>>         p = new Point(y, x);
>>>>     // Allocations for p are merged here.
>>>>     return p.calc();
>>>> }
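>>>>
>>>> (For reference, a minimal Point class consistent with this example, assuming calc() computes x - y as the IR below
>>>> suggests, could look like this sketch:)
>>>>
>>>> static final class Point {
>>>>     final int x, y;
>>>>     Point(int x, int y) { this.x = x; this.y = y; }
>>>>     // Small enough to be inlined, so the receiver does not escape.
>>>>     int calc() { return x - y; }
>>>> }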
>>>>
>>>> Assuming the method calls on "p" are inlined, the allocations will not
>>>> escape the method. The C2 IR for this method will look like this:
>>>>
>>>> public int ex1(boolean cond, int first, int second) {
>>>>     p0 = Allocate(...);
>>>>     ...
>>>>     p0.x = first;
>>>>     p0.y = second;
>>>>
>>>>     if (cond) {
>>>>         p1 = Allocate(...);
>>>>         ...
>>>>         p1.x = second;
>>>>         p1.y = first;
>>>>     }
>>>>
>>>>     p = phi(p0, p1)
>>>>
>>>>     return p.x - p.y;
>>>> }
>>>>
>>>> However, one of the constraints implemented here [2], specifically the third
>>>> one, will prevent the objects from being scalar replaced.
>>>>
>>>> The approach that I'm considering for solving the problem is to replace the Phi
>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
>>>> in the original Phi. The IR for `ex1` would look something like this after the
>>>> transformation:
>>>>
>>>> public int ex1(boolean cond, int first, int second) {
>>>>     p0 = Allocate(...);
>>>>     ...
>>>>     p0.x = first;
>>>>     p0.y = second;
>>>>
>>>>     if (cond) {
>>>>         p1 = Allocate(...);
>>>>         ...
>>>>         p1.x = second;
>>>>         p1.y = first;
>>>>     }
>>>>
>>>>     pX = phi(first, second)
>>>>     pY = phi(second, first)
>>>>
>>>>     return pX - pY;
>>>> }
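>>>>
>>>> At the source level, the transformed IR corresponds roughly to selecting the field values directly, with no
>>>> allocation left behind (a sketch of the end result, not of the IR itself):
>>>>
>>>> public int ex1(boolean cond, int first, int second) {
>>>>     int pX = cond ? second : first;
>>>>     int pY = cond ? first : second;
>>>>     return pX - pY;
>>>> }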
>>>>
>>>> I understand that this transformation might not be applicable to all cases and
>>>> that it's not as simple as illustrated above. Also, it seems to me that much of
>>>> what I'd have to implement is already implemented in other steps of the Scalar
>>>> Replacement pipeline (which is a good thing). To avoid duplicating these
>>>> implementation details I plan to reuse as much of the existing code as possible.
>>>> The algorithm for the transformation would be like this:
>>>>
>>>> split_phis(phi)
>>>>     # If the output of phi escapes, or something uses its identity, etc.,
>>>>     # then we can't remove it. The conditions here might possibly be the
>>>>     # same as the ones implemented in `PhaseMacroExpand::can_eliminate_allocation`.
>>>>     if cant_remove_phi_output(phi)
>>>>         return
>>>>
>>>>     # Collect a set of tuples (F, U) containing the nodes U that use field F
>>>>     # of the object resulting from `phi`.
>>>>     fields_used = collect_fields_used_after_phi(phi)
>>>>
>>>>     foreach field in fields_used
>>>>         producers = {}
>>>>
>>>>         # Create a list with the last Store to "field" in the
>>>>         # scope of each of the phi's input objects.
>>>>         foreach o in phi.inputs
>>>>             # The function called below might reuse a lot of the code/logic
>>>>             # in `PhaseMacroExpand::scalar_replacement`.
>>>>             producers += last_store_to_o_field(o, field)
>>>>
>>>>         # Create a new phi node whose inputs are the Stores to 'field'.
>>>>         field_phi = create_new_phi(producers)
>>>>
>>>>         update_consumers(field, field_phi)
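>>>>
>>>> To make the shape of this concrete, here is a self-contained toy sketch in Java (all class and method names are
>>>> made up for illustration; real C2 nodes and the memory graph are far more involved):
>>>>
>>>> import java.util.*;
>>>>
>>>> final class SplitPhisSketch {
>>>>     // A trivial stand-in for an allocation: remembers the last store per field.
>>>>     static final class Alloc {
>>>>         final Map<String, Integer> lastStore = new HashMap<>();
>>>>         void store(String field, int value) { lastStore.put(field, value); }
>>>>     }
>>>>
>>>>     // A phi over allocations, as in p = phi(p0, p1).
>>>>     static final class PhiN {
>>>>         final List<Alloc> inputs;
>>>>         PhiN(List<Alloc> inputs) { this.inputs = inputs; }
>>>>     }
>>>>
>>>>     // split_phis: for every field used after the phi, build a per-field "phi",
>>>>     // modeled here as the list of producer values, one per phi input.
>>>>     static Map<String, List<Integer>> splitPhis(PhiN phi, Set<String> fieldsUsed) {
>>>>         Map<String, List<Integer>> fieldPhis = new HashMap<>();
>>>>         for (String field : fieldsUsed) {
>>>>             List<Integer> producers = new ArrayList<>();
>>>>             for (Alloc o : phi.inputs) {
>>>>                 producers.add(o.lastStore.get(field)); // last store to o.field
>>>>             }
>>>>             fieldPhis.put(field, producers); // field_phi = create_new_phi(producers)
>>>>         }
>>>>         return fieldPhis;
>>>>     }
>>>>
>>>>     public static void main(String[] args) {
>>>>         int first = 10, second = 3;
>>>>         Alloc p0 = new Alloc();
>>>>         p0.store("x", first);
>>>>         p0.store("y", second);
>>>>         Alloc p1 = new Alloc();
>>>>         p1.store("x", second);
>>>>         p1.store("y", first);
>>>>
>>>>         Map<String, List<Integer>> phis =
>>>>             splitPhis(new PhiN(List.of(p0, p1)), Set.of("x", "y"));
>>>>         System.out.println("pX inputs: " + phis.get("x")); // phi(first, second)
>>>>         System.out.println("pY inputs: " + phis.get("y")); // phi(second, first)
>>>>     }
>>>> }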
>>>>
>>>> The implementation that I envisioned would be a "pre-process" [3] step just
>>>> after EA but before the constraint checks in `adjust_scalar_replaceable_state`
>>>> [2]. If we agree that the overall Scalar Replacement implementation goes through
>>>> the following major phases:
>>>>
>>>> 1. Identify the Escape Status of objects.
>>>> 2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>>>> 3. Make a call to split_unique_types() [4].
>>>> 4. Iterate over object and array allocations.
>>>>    4.1 Check if the allocation can be eliminated.
>>>>    4.2 Perform scalar replacement. Replace uses of the object in Safepoints.
>>>>    4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>>>>
>>>> The transformation that I am proposing would change the overall flow to look
>>>> like this:
>>>>
>>>> 1. Identify the Escape Status of objects.
>>>> 2. ----> New: "Split phi functions" <----
>>>> 3. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>>>> 4. Make a call to split_unique_types() [4].
>>>> 5. Iterate over object and array allocations.
>>>>    5.1 ----> Moved to split_phis: "Check if allocation can be eliminated" <----
>>>>    5.2 Perform scalar replacement. Replace uses of the object in Safepoints.
>>>>    5.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>>>>
>>>> Please let me know what you think and thank you for taking the time to review
>>>> this!
>>>>
>>>>
>>>> Regards,
>>>> Cesar
>>>>
>>>> Notes:
>>>>
>>>> [1] I am not sure yet how this approach will play with the case of a merge
>>>> with NULL.
>>>> [2]
>>>> https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809
>>>>
>>>> [3] Another option would be to "patch" the current implementation to be able
>>>> to handle the merges. I am not certain that the "patch" approach would be
>>>> better; however, the "pre-process" approach is certainly much easier to test
>>>> and more readable.
>>>>
>>>> [4] I cannot say I understand 100% the effects of executing
>>>> split_unique_types(). Would the transformation that I am proposing need to
>>>> be after the call to split_unique_types?