RFC: Partial Escape Analysis in HotSpot C2

Tue Oct 11 22:12:49 UTC 2022

Yes, you should delegate Scalar replacement to existing EA in C2. As you wrote in your proposal, PAE should be used for 
escaping cases.

Your test with safepoints should be your next step after you resolved Vladimir's case without safepoints. You should 
show that your implementation can rematirealize an object at any escape site. Don't worry about rematirealize during 
deoptimization for now.

Also in your test there should be no merge at safepoint2 because `obj` is "not alive" (not referenced) anymore.

Thanks,
Vladimir K

On 10/11/22 3:00 PM, Liu, Xin wrote:
> hi, Vladimir Ivanov,
> 
> Thanks. I will include your example too.
> 
> I would like to verify the idea that we can delegate the obsolete object
> to C2 EA/SR.
> 
> Here is even more general form of your example. I add 2 safepoints here.
> they are before and after the idiom respectively.
> 
> void test(boolean unlikely_condition) {
>      MyClass obj = new MyClass();
>      safepoint1();
>      if (unlikely_condition) {
>        doCall(obj); // escape point; not inlinined
>      }
>      safepoint2();
>   }
> 
> Here is what I expect to see after PEA. Materialization will take place
> at 2 places. I use obj1 and obj2 to highlight them. please note that I
> intentionally clone objects in PEA materialization. They eclipse the
> live range of the original obj. I refer to it as 'obsolete'.
> 
>   void test(boolean unlikely_condition) {
>      MyClass obj = new MyClass();
>      safepoint1();
>      if (unlikely_condition) {
>        obj1 = new MyClass();
>        doCall(obj1); // escape point; not inlinined
>      }
>      obj = merge(obj2=new MyClass(), obj1);
>      safepoint2();
>   }
> 
> I expect C2 EA/SR to pick up the obsolete 'obj'.
> At SafePoint1, it's not dead. C2 will convert it to
> SafePointScalarObjectNode. If I can prove this idea, it means that we
> can delegate Scalar Replacement to C2 EA/SR. We may get away with scalar
> replacement part in PEA implementation!
> 
> thanks,
> --lx
> 
> 
> 
> On 10/11/22 1:00 PM, Vladimir Ivanov wrote:
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>>
>> I'd suggest an even simpler case to start with:
>>
>> void test(boolean unlikely_condition) {
>>     MyClass obj = new MyClass();
>>     if (unlikely_condition) {
>>       doCall(obj); // escape point; not inlinined
>>     }
>> }
>>
>> and try to turn it into:
>>
>> void test(boolean unlikely_condition) {
>>     if (unlikely_condition) {
>>       doCall(new MyClass()); // escape point; not inlinined
>>     }
>> }
>>
>> It allows you to not bother about JVM state at all, because there's
>> already a valid one captured by the call.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 10/11/22 12:12, Liu, Xin wrote:
>>> Hi, Vladimir and Igor,
>>>
>>> Thanks you for your comments.
>>>
>>> I just start it. My first target is 'Figure-1' in the RFC. There are
>>> only 3 blocks and no merge and no alias. Class Object is so trivial that
>>> we even don't need to initialize it (no field to initialize).
>>>
>>> I expect to get code like this after parser with a new flag.
>>>
>>> private Object _cache;
>>> public void foo(boolean cond) {
>>>       Object x = new Object();
>>>
>>>       if (cond) {
>>>           Object x1 = new Object(); (clone object right before escapement)
>>>           _cache = x1;
>>>       }
>>> }
>>>
>>> I know it's over-simplified. I am going to test whether it is possible
>>> to implement the algorithm in parser and how C2 EA/SR interacts with the
>>> obsolete object x.
>>>
>>> Like you said, I am going to focus on ordinary objects first. Either
>>> good or bad, I will update my progress and results. I will see how to
>>> leverage Cesar's test and microbenchmark too. That's my intention too.
>>>
>>> thanks,
>>> --lx
>>>
>>> On 10/11/22 8:44 AM, Vladimir Kozlov wrote:
>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>>
>>>>
>>>>
>>>> Hi Liu,
>>>>
>>>> To clarify, I think your proposal is reasonable and we would like you to continue to work on it.
>>>> Can you share details of its current status?
>>>>
>>>> Igor and I commented about C2 specifics you need to take into account.
>>>>
>>>> I can suggest to start with simple things: only objects instances. Arrays can be covered later.
>>>>
>>>> You can use the test from Cesar's PR [1] to test your implementation and extend it with additional cases.
>>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>> [1] https://github.com/openjdk/jdk/pull/9073
>>>>
>>>> On 10/7/22 3:26 PM, Liu, Xin wrote:
>>>>> Hi, Igor and Vladimir,
>>>>>
>>>>> I am not inventing anything new. All I am thinking is how to adapt
>>>>> Stadler's algorithm to C2. All innovation belong to the author.
>>>>>
>>>>> Figure-3 of my RFC is a copy of Listing-7 in his paper. Allow me to
>>>>> repeat his data structure here. I drop "class Id" because I think I can
>>>>> use AllocationNode pointer or even node idx instead.
>>>>>
>>>>> // this is per allocation, identified by 'Id'.
>>>>> class VirtualState: extends ObjectState {
>>>>>      int lockCount;
>>>>>      Node[] entries;
>>>>> };
>>>>>
>>>>> // this is per basic-block
>>>>> class State {
>>>>>      Map<Id , ObjectState> state;
>>>>>      Map<Node, Id> alias;
>>>>> };
>>>>>
>>>>> In basic block, PEA keeps tracking the allocation state of an object
>>>>> using VirtualState. In his paper, Figure-4 (b) and (e) depict how the
>>>>> algorithm tracks stores. To get flow-sensitive information, Stadler
>>>>> iterates the scheduled nodes in a basic block. I propose to iterate
>>>>> bytecodes within a basic block.
>>>>>
>>>>>> when you rematerialize the object, it consumes the current updated
>>>>> values to construct it. How to you intend to track those?
>>>>>>> Yes, you either track stores in Parser or do what current C2 EA does
>>>>> and create unique memory slices for VirtualObject.
>>>>>
>>>>> I plan to follow suit and track stores in parser! I also need to create
>>>>> a unique memory slice when I have to materialize a virtual object. This
>>>>> is for InitializeNode and I need to initialize the object to the
>>>>> cumulative state.
>>>>>
>>>>> thanks,
>>>>> --lx
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/7/22 1:21 PM, Vladimir Kozlov wrote:
>>>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/7/22 10:37 AM, Igor Veresov wrote:
>>>>>>> The major difference between Graal and C2 is that graal captures the state at side effects and C2 captures the state at deopt points. That allows Graal to deduce state at any time, including when it needs to insert a rematerializing allocation during PEA. So, with C2 you have to either do everything in the parser as you are proposing or do the same thing as Graal and at least capture the state for stores. Having a state different from the original allocation point is ok. Both Graal and C2 would throw OOMs from place that could be far from the original point because of the EA.
>>>>>>>
>>>>>>> I think you also have to track the values of all of the object components, right? So when you rematerialize the object, it consumes the current updated values to construct it. How to you intend to track those?
>>>>>>
>>>>>> Yes, you either track stores in Parser or do what current C2 EA does and create unique memory slices for VirtualObject.
>>>>>>
>>>>>> Current C2 EA [1] looks for latest stores (or initial values) to the object (which has unique Aloccation node id)
>>>>>> staring from Safepoint memory input when we replace Allocate with SafePointScalarObject.
>>>>>>
>>>>>> You would need to use VirtualObject node id as unique instance id. And you need to create separate memory slices for it
>>>>>> as we do in EA for Allocation node.
>>>>>>
>>>>>> Vladimir K
>>>>>>
>>>>>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L452
>>>>>>
>>>>>>>
>>>>>>> igor
>>>>>>>
>>>>>>>> On Oct 6, 2022, at 5:09 PM, Liu, Xin <xxinliu at amazon.com> wrote:
>>>>>>>>
>>>>>>>> hi, Ignor,
>>>>>>>>
>>>>>>>> You are right. Cloning the JVMState of original Allocation Node isn't
>>>>>>>> the correct behavior. I need the JVMState right at materialization. I
>>>>>>>> think it is available because we are in parser. For 2 places of
>>>>>>>> materialization:
>>>>>>>> 1) we are handling the bytecode which causes the object to escape. It's
>>>>>>>> probably putfield/return/invoke. Current JVMState it is.
>>>>>>>> 2) we are in MergeProcessor. We need to materialize a virtual object in
>>>>>>>> its predecessors. We can extract the exiting JVMState from the
>>>>>>>> predecessor Block.
>>>>>>>>
>>>>>>>> I just realize maybe that's the one of the reasons Graal saves
>>>>>>>> 'FrameState' at store nodes. Graal needs to revisit the 'FrameState'
>>>>>>>> when its PEA phase does materialization in high-tier.
>>>>>>>>
>>>>>>>> Apart from safepoint, there's one corner case bothering me. JLS says
>>>>>>>> that creation of a class instance may throw an
>>>>>>>> OOME.(https://docs.oracle.com/javase/specs/jls/se19/html/jls-15.html#jls-15.9.4)
>>>>>>>>
>>>>>>>> "
>>>>>>>> space is allocated for the new class instance. If there is insufficient
>>>>>>>> space to allocate the object, evaluation of the class instance creation
>>>>>>>> expression completes abruptly by throwing an OutOfMemoryError.
>>>>>>>> "
>>>>>>>>
>>>>>>>> and it's cross-referenced by bytecode new in JVMS
>>>>>>>> https://docs.oracle.com/javase/specs/jvms/se19/html/jvms-6.html#jvms-6.5.new
>>>>>>>>
>>>>>>>> If we have moved the Allocation Node and JVM happens to run out of
>>>>>>>> memory, the first frame of stacktrace will drift a little bit, right?
>>>>>>>> The bci and source linenum will be wrong. Does it matter? I can't
>>>>>>>> imagine that user's programs rely on this information.
>>>>>>>>
>>>>>>>> I think it's possible to amend this bci/line number in JVMState level. I
>>>>>>>> will leave it as an open question and revisit it later.
>>>>>>>>
>>>>>>>> Do I understand your concern? if it makes sense to you, I will update
>>>>>>>> the RFC doc.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> --lx
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/6/22 3:00 PM, Igor Veresov wrote:
>>>>>>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> You say that when you materialize the clone you plan to have the same jvm state as the original allocation. How is that possible in a general case? There can be arbitrary changes of state between the original allocation point and where the clone materializes.
>>>>>>>>>
>>>>>>>>> Igor
>>>>>>>>>
>>>>>>>>>> On Oct 6, 2022, at 10:42 AM, Liu, Xin <xxinliu at amazon.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We would like to pursuit PEA in HotSpot. I spent time thinking how to
>>>>>>>>>> adapt Stadler's Partial Escape Analysis[1] to C2. I think there are 3
>>>>>>>>>> elements in it. 1) flow-sensitive escape analysis 2) lazy code motion
>>>>>>>>>> for the allocation and initialization 3) on-the-fly scalar replacement.
>>>>>>>>>> The most complex part is 3) and it has done by C2. I'd like to leverage
>>>>>>>>>> that, so I come up an idea to focus only on escaped objects in the
>>>>>>>>>> algorithm and delegate others to the existing C2 phases. Here is my RFC.
>>>>>>>>>> May I get your precious time on this?
>>>>>>>>>>
>>>>>>>>>> https://gist.github.com/navyxliu/62a510a5c6b0245164569745d758935b#rfc-partial-escape-analysis-in-hotspot-c2
>>>>>>>>>>
>>>>>>>>>> The idea is based on the following two observations.
>>>>>>>>>>
>>>>>>>>>> 1. Stadler's PEA can cooperate with C2 EA/SR.
>>>>>>>>>>
>>>>>>>>>> If an object moves to the place it is about to escape, it won't impact
>>>>>>>>>> C2 EA/SR later. It's because it will be marked as 'GlobalEscaped'. C2 EA
>>>>>>>>>> won't do anything for it anyway.
>>>>>>>>>>
>>>>>>>>>> If PEA don't touch a non-escaped object, it won't change its
>>>>>>>>>> escapability. It can punt it to C2 EA/SR and the result is still same.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2. The original AllocationNode is either dead or scalar replaceable
>>>>>>>>>> after Stadler's PEA.
>>>>>>>>>>
>>>>>>>>>> Stadler's algorithm virtualizes an allocation Node and materializes it
>>>>>>>>>> on demand. There are 2 places to materialize it. 1) the virtual object
>>>>>>>>>> is about to escape 2) MergeProcessor needs to merge an object and at
>>>>>>>>>> least one of its predecessor has materialized. MergeProcessor has to
>>>>>>>>>> materialize all virtual objects in other predecessors([1] 5.3, Merge nodes).
>>>>>>>>>>
>>>>>>>>>> We can prove the observation 2 using 'proof of contradiction' here.
>>>>>>>>>> Assume the original Allocation node is neither dead nor Scalar Replaced
>>>>>>>>>> after Stadler's PEA, and program is still correct.
>>>>>>>>>>
>>>>>>>>>> Program must need the original allocation node somewhere. The algorithm
>>>>>>>>>> has deleted the original allocation node in virtualization step and
>>>>>>>>>> never bring it back. It contradicts that the program is still correct. QED.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you're convinced, then we can leverage it. In my design, I don't
>>>>>>>>>> virtualize the original node but just leave it there. C2 MacroExpand
>>>>>>>>>> phase will take care of the original allocation node as long as it's
>>>>>>>>>> either dead or scalar-replaceable. It never get a chance to expand.
>>>>>>>>>>
>>>>>>>>>> If we restrain on-the-fly scalar replacement in Stadler's PEA, we can
>>>>>>>>>> delegate it to C2 EA/SR! There are 3 gains:
>>>>>>>>>>
>>>>>>>>>> 1) I don't think I can write bug-free Scalar Replacement...
>>>>>>>>>> 2) This approach can automatically pick up C2 EA/SR improvements in the
>>>>>>>>>> future, such as JDK-8289943.
>>>>>>>>>> 3) If we focus only on 'escaped objects', we even don't need to deal
>>>>>>>>>> with deoptimization. Only 'scalar replaceable' objects need to save
>>>>>>>>>> Object states for deoptimization. Escaped objects disqualify that.
>>>>>>>>>>
>>>>>>>>>> [1]: Stadler, Lukas, Thomas Würthinger, and Hanspeter Mössenböck.
>>>>>>>>>> "Partial escape analysis and scalar replacement for Java." Proceedings
>>>>>>>>>> of Annual IEEE/ACM International Symposium on Code Generation and
>>>>>>>>>> Optimization. 2014.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> --lx
>>>>>>>>>> <OpenPGP_0xB9D934C61E047B0D.asc>
>>>>>>>> <OpenPGP_0xB9D934C61E047B0D.asc>
>>>>>> >