[External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Feb 11 22:34:48 UTC 2022
> Actually Roland suggested before to always run split_unique_types(),
> even if non-escaping object is not scalar replaceable.
I second Roland suggestion. Irrespective of SR enhancements being
discussed, it's beneficial to introduce unique instance slices for
non-escaping allocations which aren't scalar replaceable.
(I experimented a bit with that idea before [1] and got some promising
results.)
Best regards,
Vladimir Ivanov
[1] https://github.com/iwanowww/jdk/tree/split_unique_types
>>
>> > As you correctly pointed in your original post, EA and SR heavily
>> depend on _instance_id which is Allocation node's _idx value.
>> > And the solution for this is either remove reference to merge phi.
>> Or introduce new "virtual allocation" which fields are
>> > populated by loads - but it is again needs to split through phi.
>>
>> Vladimir K., are you suggesting something like this:
>
> Yes, that is what I had in mind but it could be more complicated to
> implement.
> The idea is to have new _instance_id after merge. It may not be useful
> for this simple case but may help in others.
> We would still need to improve split_unique_types to find values to
> store in its fields.
>
>>
>> public int ex1(boolean cond, int first, int second) {
>> p0 = ...;
>> p0.x = first;
>> p0.y = second;
>> if (cond) {
>> p1 = ...
>> p1.x = second;
>> p1.y = first;
>> }
>> p = VirtualAllocate(...);
>> p.x = phi(first, second);
>> p.y = phi(second, first);
>> return p.x - p.y;
>> }
>>
>> Would the phi/merges for individual fields cause problems in
>> split_unique_types/SR?
>
> Split_unique_types/SR work on allocation's references and not on what is
> stored in its fields.
>
> Thanks,
> Vladimir
>
>>
>>
>> Regards,
>> Cesar
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> *Sent:* February 11, 2022 10:51 AM
>> *To:* Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Cesar Soares
>> Lucas <Divino.Cesar at microsoft.com>; hotspot compiler
>> <hotspot-compiler-dev at openjdk.java.net>
>> *Cc:* Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg
>> <Martijn.Verburg at microsoft.com>
>> *Subject:* Re: RFC : Approach to handle Allocation Merges in C2 Scalar
>> Replacement
>> On 2/11/22 10:24 AM, Vladimir Ivanov wrote:
>>>
>>>> For this simple case we can teach C2's IGVN to split fields loads
>>>> through Phi so that phi(p0, p1) is not used and
>>>> allocations as well. We can do that because we know that allocations
>>>> and phi do not escape.
>>>
>>> Seems like split_unique_types() pass is a prerequisite to make such
>>> transformations predictable on a larger scale. It
>>> simplifies the memory graph and untangles instance-related memory
>>> operations from the rest of the graph.
>>
>> Can you elaborate on your suggestion?
>>
>> As you correctly pointed in your original post, EA and SR heavily
>> depend on _instance_id which is Allocation node's _idx
>> value. And the solution for this is either remove reference to merge
>> phi. Or introduce new "virtual allocation" which
>> fields are populated by loads - but it is again needs to split through
>> phi.
>>
>>>
>>> Otherwise, I believe wide memory states/merges would pose serious
>>> problems. E.g., when a load is split through a phi,
>>> you need to pick correct memory states for the new loads above the
>>> phi which quickly becomes quite a challenging task.
>>
>> Especially in loops. Actually I did tried such approach before and I
>> agree that it is hard.
>>
>> Thanks,
>> Vladimir K
>>
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>>
>>>> public int ex1(boolean cond, int first, int second) {
>>>> p0 = Allocate(...);
>>>> ...
>>>> p0.x = first;
>>>> p0.y = second;
>>>>
>>>> if (cond) {
>>>> p1 = Allocate(...);
>>>> ...
>>>> p1.x = second;
>>>> p1.y = first;
>>>> }
>>>>
>>>> p = phi(p0, p1) // unused and will be removed
>>>>
>>>>
>>>> return phi(p0.x,p1.x) - phi(p0.y, p1.y);
>>>> }
>>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>>>> (BCCing hotspot-dev and moving the discussion to
>>>>> hotspot-compiler-dev.)
>>>>>
>>>>> Hi Cesar,
>>>>>
>>>>> Thanks for looking into enhancing EA.
>>>>>
>>>>> Overall, the proposal looks reasonable.
>>>>>
>>>>> I suggest to look more closely at split_unique_types().
>>>>> It introduces a dedicated class of alias categories for fields of
>>>>> the allocation being eliminated and clones memory
>>>>> graph. I don't see why it shouldn't work for multiple allocations.
>>>>>
>>>>> Moreover, split_unique_types() will break if you start optimizing
>>>>> multiple allocations at once. The notion of unique
>>>>> alias should be adjusted and cover the union of unique aliases for
>>>>> all interacting allocations.
>>>>>
>>>>> Seems like you need to enhance SR to work on non-intersecting
>>>>> clusters of allocations.
>>>>>
>>>>> One thing to take care of: scalar replacement relies on
>>>>> TypeOopPtr::instance_id().
>>>>>
>>>>> // If not InstanceTop or InstanceBot, indicates that this is
>>>>> // a particular instance of this type which is distinct.
>>>>> // This is the node index of the allocation node creating this
>>>>> instance.
>>>>> int _instance_id;
>>>>>
>>>>> It'll break when multiple allocations are in play.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>>>> Hi there again!
>>>>>>
>>>>>> Can you please give me feedback on the following approach to at
>>>>>> least partially
>>>>>> address [1], the scalar replacement allocation merge issue?
>>>>>>
>>>>>> The problem that I am trying to solve arises when allocations are
>>>>>> merged after a
>>>>>> control flow split. The code below shows _one example_ of such a
>>>>>> situation.
>>>>>>
>>>>>> public int ex1(boolean cond, int x, int y) {
>>>>>> Point p = new Point(x, y);
>>>>>> if (cond)
>>>>>> p = new Point(y, x);
>>>>>> // Allocations for p are merged here.
>>>>>> return p.calc();
>>>>>> }
>>>>>>
>>>>>> Assuming the method calls on "p" are inlined then the allocations
>>>>>> will not
>>>>>> escape the method. The C2 IR for this method will look like this:
>>>>>>
>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>> p0 = Allocate(...);
>>>>>> ...
>>>>>> p0.x = first;
>>>>>> p0.y = second;
>>>>>>
>>>>>> if (cond) {
>>>>>> p1 = Allocate(...);
>>>>>> ...
>>>>>> p1.x = second;
>>>>>> p1.y = first;
>>>>>> }
>>>>>>
>>>>>> p = phi(p0, p1)
>>>>>>
>>>>>> return p.x - p.y;
>>>>>> }
>>>>>>
>>>>>> However, one of the constraints implemented here [2], specifically
>>>>>> the third
>>>>>> one, will prevent the objects from being scalar replaced.
>>>>>>
>>>>>> The approach that I'm considering for solving the problem is to
>>>>>> replace the Phi
>>>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields
>>>>>> of the objects
>>>>>> in the original Phi. The IR for `ex1` would look something like
>>>>>> this after the
>>>>>> transformation:
>>>>>>
>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>> p0 = Allocate(...);
>>>>>> ...
>>>>>> p0.x = first;
>>>>>> p0.y = second;
>>>>>>
>>>>>> if (cond) {
>>>>>> p1 = Allocate(...);
>>>>>> ...
>>>>>> p1.x = second;
>>>>>> p1.y = first;
>>>>>> }
>>>>>>
>>>>>> pX = phi(first, second)
>>>>>> pY = phi(second, first)
>>>>>>
>>>>>> return pX - pY;
>>>>>> }
>>>>>>
>>>>>> I understand that this transformation might not be applicable for
>>>>>> all cases and
>>>>>> that it's not as simple as illustrated above. Also, it seems to me
>>>>>> that much of
>>>>>> what I'd have to implement is already implemented in other steps
>>>>>> of the Scalar
>>>>>> Replacement pipeline (which is a good thing). To work around these
>>>>>> implementation details I plan to use as much of the existing code
>>>>>> as possible.
>>>>>> The algorithm for the transformation would be like this:
>>>>>>
>>>>>> split_phis(phi)
>>>>>> # If output of phi escapes, or something uses its identity, etc
>>>>>> # then we can't remove it. The conditions here might possible
>>>>>> be the
>>>>>> # same as the ones implemented in
>>>>>> `PhaseMacroExpand::can_eliminate_allocation`
>>>>>> if cant_remove_phi_output(phi)
>>>>>> return ;
>>>>>>
>>>>>> # Collect a set of tuples(F,U) containing nodes U that uses
>>>>>> field F
>>>>>> # member of the object resulting from `phi`.
>>>>>> fields_used = collect_fields_used_after_phi(phi)
>>>>>>
>>>>>> foreach field in fields_used
>>>>>> producers = {}
>>>>>>
>>>>>> # Create a list with the last Store for each field
>>>>>> "field" on the
>>>>>> # scope of each of the Phi input objects.
>>>>>> foreach o in phi.inputs
>>>>>> # The function called below might re-use a lot of the
>>>>>> code/logic in `PhaseMacroExpand::scalar_replacement`
>>>>>> producers += last_store_to_o_field(0, field)
>>>>>> # Create a new phi node whose inputs are the Store's to
>>>>>> 'field'
>>>>>> field_phi = create_new_phi(producers)
>>>>>>
>>>>>> update_consumers(field, field_phi)
>>>>>>
>>>>>> The implementation that I envisioned would be as a "pre-process"
>>>>>> [3] step just
>>>>>> after EA but before the constraint checks in
>>>>>> `adjust_scalar_replaceable_state`
>>>>>> [2]. If we agree that the overall Scalar Replacement
>>>>>> implementation goes through
>>>>>> the following major phases:
>>>>>>
>>>>>> 1. Identify the Escape Status of objects.
>>>>>> 2. Adjust object Escape and/or Scalar Replacement status
>>>>>> based on a set of constraints.
>>>>>> 3. Make call to Split_unique_types [4].
>>>>>> 4 Iterate over object and array allocations.
>>>>>> 4.1 Check if allocation can be eliminated.
>>>>>> 4.2 Perform scalar replacement. Replace uses of object in
>>>>>> Safepoints.
>>>>>> 4.3 Process users of CheckCastPP other than Safepoint:
>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>
>>>>>> The transformation that I am proposing would change the overall
>>>>>> flow to look
>>>>>> like this:
>>>>>>
>>>>>> 1. Identify the Escape Status of objects.
>>>>>> 2. ----> New: "Split phi functions" <----
>>>>>> 2. Adjust object Escape and/or Scalar Replacement status
>>>>>> based on a set of constraints.
>>>>>> 3. Make call to Split_unique_types [14].
>>>>>> 4 Iterate over object and array allocations.
>>>>>> 4.1 ----> Moved to split_phi: "Check if allocation can be
>>>>>> eliminated" <----
>>>>>> 4.2 Perform scalar replacement. Replace uses of object in
>>>>>> Safepoints.
>>>>>> 4.3 Process users of CheckCastPP other than Safepoint:
>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>
>>>>>> Please let me know what you think and thank you for taking the
>>>>>> time to review
>>>>>> this!
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Cesar
>>>>>>
>>>>>> Notes:
>>>>>>
>>>>>> [1] I am not sure yet how this approach will play with the
>>>>>> case of a merge
>>>>>> with NULL.
>>>>>> [2]
>>>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff%2Fsrc%2Fhotspot%2Fshare%2Fopto%2Fescape.cpp%23L1809&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cc2580fe523c640d628d708d9ed8f81f3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637802022912055025%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY%3D&reserved=0
>>
>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cc2580fe523c640d628d708d9ed8f81f3*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637802022912055025*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!e1Tzr2hFook2sJaiA9-MXwtcKvodRKFQqkwhQ8dvjTwf0ynm2nYg54zUTBwUJuxvji6W0Q$>
>>
>>>>>>
>>>>>> [3] Another option would be to "patch" the current
>>>>>> implementation to be able
>>>>>> to handle the merges. I am not certain that the "patch"
>>>>>> approach would be
>>>>>> better, however, the "pre-process" approach is certainly
>>>>>> much easier to test
>>>>>> and more readable.
>>>>>>
>>>>>> [4] I cannot say I understand 100% the effects of executing
>>>>>> split_unique_types(). Would the transformation that I am
>>>>>> proposing need to
>>>>>> be after the call to split_unique_types?
More information about the hotspot-compiler-dev
mailing list