Re: RFC: Partial Escape Analysis in HotSpot C2

Fri Oct 7 20:21:44 UTC 2022

On 10/7/22 10:37 AM, Igor Veresov wrote:
> The major difference between Graal and C2 is that Graal captures the state at side effects and C2 captures the state at deopt points. That allows Graal to deduce state at any time, including when it needs to insert a rematerializing allocation during PEA. So, with C2 you have to either do everything in the parser as you are proposing or do the same thing as Graal and at least capture the state for stores. Having a state different from the original allocation point is ok. Both Graal and C2 would throw OOMs from a place that could be far from the original point because of the EA.
> 
> I think you also have to track the values of all of the object components, right? So when you rematerialize the object, it consumes the current updated values to construct it. How do you intend to track those?

Yes, you either track stores in the Parser or do what the current C2 EA does and create unique memory slices for the VirtualObject.

Current C2 EA [1] looks for the latest stores (or initial values) to the object (which has a unique Allocation node id),
starting from the Safepoint memory input, when we replace Allocate with SafePointScalarObject.

You would need to use the VirtualObject node id as the unique instance id. And you need to create separate memory slices for it,
as we do in EA for the Allocation node.
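
The lookup described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the memory chain is a plain list (real C2 memory graphs also contain Phi and MergeMem nodes), and `LatestStoreLookup`, `Store`, `instanceId` and `fieldValuesAt` are hypothetical names, not C2 types or functions.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: all names here are hypothetical, not C2 node types or functions.
final class LatestStoreLookup {
    /** One store on a linear memory chain. */
    static final class Store {
        final int instanceId;   // which allocation / virtual object is written
        final String field;
        final int value;

        Store(int instanceId, String field, int value) {
            this.instanceId = instanceId;
            this.field = field;
            this.value = value;
        }
    }

    /**
     * Walks the chain from the newest store (the memory state seen at the
     * safepoint) back to the oldest and keeps the first value found per field.
     * Fields that were never stored to keep their initial value.
     */
    static Map<String, Integer> fieldValuesAt(List<Store> chainOldestFirst,
                                              int instanceId,
                                              Map<String, Integer> initialValues) {
        Map<String, Integer> result = new LinkedHashMap<>(initialValues);
        Set<String> resolved = new HashSet<>();
        for (int i = chainOldestFirst.size() - 1; i >= 0; i--) {
            Store s = chainOldestFirst.get(i);
            if (s.instanceId != instanceId) {
                continue; // store to another instance: a different memory slice
            }
            if (resolved.add(s.field)) {
                result.put(s.field, s.value); // newest store to this field wins
            }
        }
        return result;
    }
}
```

Filtering by instance id is what a separate memory slice per instance provides: stores to other objects never have to be considered when collecting the field values of this one.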

Vladimir K

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L452

> 
> igor
> 
>> On Oct 6, 2022, at 5:09 PM, Liu, Xin <xxinliu at amazon.com> wrote:
>>
>> Hi Igor,
>>
>> You are right. Cloning the JVMState of the original Allocation Node isn't
>> the correct behavior. I need the JVMState right at materialization. I
>> think it is available because we are in the parser. There are 2 places of
>> materialization:
>> 1) We are handling the bytecode which causes the object to escape. It's
>> probably putfield/return/invoke. The current JVMState is the one to use.
>> 2) We are in MergeProcessor. We need to materialize a virtual object in
>> its predecessors. We can extract the exiting JVMState from the
>> predecessor Block.
>>
>> I just realized that may be one of the reasons Graal saves
>> 'FrameState' at store nodes. Graal needs to revisit the 'FrameState'
>> when its PEA phase does materialization in high-tier.
>>
>> Apart from safepoints, there's one corner case bothering me. The JLS says
>> that creation of a class instance may throw an
>> OOME (https://docs.oracle.com/javase/specs/jls/se19/html/jls-15.html#jls-15.9.4):
>>
>> "
>> space is allocated for the new class instance. If there is insufficient
>> space to allocate the object, evaluation of the class instance creation
>> expression completes abruptly by throwing an OutOfMemoryError.
>> "
>>
>> and it's cross-referenced by the bytecode 'new' in the JVMS
>> https://docs.oracle.com/javase/specs/jvms/se19/html/jvms-6.html#jvms-6.5.new
>>
>> If we have moved the Allocation Node and the JVM happens to run out of
>> memory, the first frame of the stack trace will drift a little bit, right?
>> The bci and source line number will be wrong. Does it matter? I can't
>> imagine that user programs rely on this information.
>>
>> I think it's possible to amend this bci/line number at the JVMState level. I
>> will leave it as an open question and revisit it later.
>>
>> Do I understand your concern? If it makes sense to you, I will update
>> the RFC doc.
>>
>> thanks,
>> --lx
>>
>>
>>
>>
>> On 10/6/22 3:00 PM, Igor Veresov wrote:
>>>
>>> Hi,
>>>
>>> You say that when you materialize the clone you plan to have the same jvm state as the original allocation. How is that possible in a general case? There can be arbitrary changes of state between the original allocation point and where the clone materializes.
>>>
>>> Igor
>>>
>>>> On Oct 6, 2022, at 10:42 AM, Liu, Xin <xxinliu at amazon.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We would like to pursue PEA in HotSpot. I spent time thinking about how to
>>>> adapt Stadler's Partial Escape Analysis[1] to C2. I think there are 3
>>>> elements in it: 1) flow-sensitive escape analysis, 2) lazy code motion
>>>> for the allocation and initialization, 3) on-the-fly scalar replacement.
>>>> The most complex part is 3), and it is already done by C2. I'd like to
>>>> leverage that, so I came up with an idea to focus only on escaped objects
>>>> in the algorithm and delegate the others to the existing C2 phases. Here
>>>> is my RFC. May I ask for some of your time on this?
>>>>
>>>> https://gist.github.com/navyxliu/62a510a5c6b0245164569745d758935b#rfc-partial-escape-analysis-in-hotspot-c2
>>>>
>>>> The idea is based on the following two observations.
>>>>
>>>> 1. Stadler's PEA can cooperate with C2 EA/SR.
>>>>
>>>> If an object moves to the place where it is about to escape, it won't
>>>> impact C2 EA/SR later, because it will be marked as 'GlobalEscaped'. C2 EA
>>>> won't do anything for it anyway.
>>>>
>>>> If PEA doesn't touch a non-escaped object, it won't change its
>>>> escapability. It can punt it to C2 EA/SR and the result is still the same.
>>>>
>>>>
>>>> 2. The original AllocationNode is either dead or scalar replaceable
>>>> after Stadler's PEA.
>>>>
>>>> Stadler's algorithm virtualizes an allocation Node and materializes it
>>>> on demand. There are 2 places to materialize it: 1) the virtual object
>>>> is about to escape; 2) MergeProcessor needs to merge an object and at
>>>> least one of its predecessors has materialized it. MergeProcessor has to
>>>> materialize all virtual objects in other predecessors ([1] 5.3, Merge nodes).
>>>>
>>>> We can prove observation 2 by contradiction.
>>>> Assume the original Allocation node is neither dead nor scalar replaced
>>>> after Stadler's PEA, and the program is still correct.
>>>>
>>>> The program must then need the original allocation node somewhere. But the
>>>> algorithm has deleted the original allocation node in the virtualization
>>>> step and never brings it back. This contradicts the assumption that the
>>>> program is still correct. QED.
>>>>
>>>>
>>>> If you're convinced, then we can leverage it. In my design, I don't
>>>> virtualize the original node but just leave it there. The C2 MacroExpand
>>>> phase will take care of the original allocation node as long as it's
>>>> either dead or scalar-replaceable. It never gets a chance to expand.
>>>>
>>>> If we omit the on-the-fly scalar replacement in Stadler's PEA, we can
>>>> delegate it to C2 EA/SR! There are 3 gains:
>>>>
>>>> 1) I don't think I can write bug-free Scalar Replacement...
>>>> 2) This approach can automatically pick up C2 EA/SR improvements in the
>>>> future, such as JDK-8289943.
>>>> 3) If we focus only on 'escaped objects', we don't even need to deal
>>>> with deoptimization. Only 'scalar replaceable' objects need to save
>>>> object states for deoptimization. Escaped objects do not qualify for that.
>>>>
>>>> [1]: Stadler, Lukas, Thomas Würthinger, and Hanspeter Mössenböck.
>>>> "Partial escape analysis and scalar replacement for Java." Proceedings
>>>> of Annual IEEE/ACM International Symposium on Code Generation and
>>>> Optimization. 2014.
>>>>
>>>> thanks,
>>>> --lx
>
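
The first materialization point discussed in the quoted thread (the object is about to escape) can be illustrated at source level. This is a hand-written sketch of the intended effect, not compiler output: `PartialEscapeSketch`, `Point`, `published`, `original` and `afterPea` are hypothetical names, and an actual implementation would work on C2's IR rather than on Java source.

```java
// Hand-written illustration; all names are hypothetical.
final class PartialEscapeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static Point published; // storing into this field makes the object escape

    // As written by the programmer: the allocation runs on every call.
    static int original(int x, int y, boolean publish) {
        Point p = new Point(x, y);
        if (publish) {
            published = p; // p escapes only on this branch
        }
        return p.x + p.y;
    }

    // What PEA aims for: the object stays virtual (its fields live in locals)
    // and is materialized only where it is about to escape.
    static int afterPea(int x, int y, boolean publish) {
        int px = x; // virtual object state
        int py = y;
        if (publish) {
            // Materialization at the escape point. The allocation now happens
            // at a different bci than in original(), so an OutOfMemoryError
            // would be reported from here.
            published = new Point(px, py);
        }
        return px + py;
    }
}
```

On the non-escaping path `afterPea` allocates nothing, and on the escaping path the allocation uses the field values current at that point, which is why the JVMState at the materialization site, rather than at the original allocation, is the relevant one.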