RFC: Partial Escape Analysis in HotSpot C2

Mon Oct 24 20:11:20 UTC 2022

hi, Vladimir Ivanov,

Your email is my starting point. It's thorough and very insightful.
We spent a lot of time trying to crack your questions. The RFC is the
summary of what we got. I own your a big thank!

Sorry, I still haven't had a clear answer for "2. Move vs Split
decision" yet. Stadler's algorithm just materialize a virtual object on
demand. After then, its state changes from virtual to materialized.
IMHO, I don't think it is optimal placement. I feel the optimal
placement should be along domination frontier of the original
AllocateNode and Exit. It is like the minimal phi construction and we
might borrow the idea from it. I put aside it because it's indeed an
optimization problem. Maybe it's not a big deal in common cases.

I try to answer your questions inline.

On 10/20/22 5:26 PM, Vladimir Ivanov wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Hi,
> 
>> I would like to update on this. I manage to get PEA work in Vladimir
>> Ivanov's testcase. I put the testcase, assembly and graphs here[1].
>>
>> Even though it is a quite simple case, I think it demonstrates that the
>> RFC is practical in C2. I proposed 3 major differences from Graal.
> 
> Nice! Also, a very similar (but a much more popular case) should be
> escape sites in catch blocks (as reported by JDK-8267532 [1]).
> 
>> 1. The algorithm runs in parser instead of optimizer.
>> 2. Prefer clone-and-eliminate strategy rather than
>> virtualize-and-materialize.
>> 3. Refrain from scalar replacement on-the-fly.
> 
> I don't understand how you plan to implement it solely during parsing.
> You could do some bookkeeping during parsing and capture JVM state, but
> I don't see how to do EA that early.
> 
> Also, please, elaborate on #3. It's not clear to me what do you mean there.
> 
I added a PEAState for each basic block[2].

If we need to materialize a virtual object, we just create a new
AllocateNode and its children in-place[3].

In Stadler's algorithm, it inherently performs "scalar replacement". Its
first step is to delete the original allocation node. PEA phase replaces
LoadField nodes with scalars. Because I propose to focus on 'escaping
object' only, we don't perform 'scalar replacement'. I leave them to the
C2 EA/SR. that's why I say "I refrain from on-the-fly scalar replacement".

>> The test excises them all. I pasted 3 graphs here[2]. When we
>> materialize an object, we just clone it with the right JVMState. It
>> shows that C2 IterEA can automatically picks up the obsolete object and
>> get rid of it, as we expected.
>>
>> It turns out cloning an object isn't as complex as I thought. I mainly
>> spent time on adjusting JVMState for the cloned AllocateNode. Not only
>> to call sync_jvm(), I also need to 1) kill dead locals 2) clean stack
>> and even avoid reexecution that bci.
>>
>>    JVMState* jvms = parser->sync_jvms();
>>    SafePointNode* map = jvms->map();
>>    parser->kill_dead_locals();
>>    parser->clean_stack(jvms->sp());
>>    jvms->set_should_reexecute(false);
>>
>> Clearly, the algorithm hasn't completed yet. I am still working on
>> MergeProcessor, general classes fields and loop construct.
> 
> There was a previous discussion on PEA for C2 back in 2021 [2] [3]. One
> interesting observation related to your current experiments was:
> 
> "4. Escape sites separate the graph into 2 parts: before and after the
> instance escapes. In order to preserve identity invariants (and avoid
> identity paradoxes), PEA can't just put an allocation at every escape
> site. It should respect the order of escape events and ensure that the
> very same object is observed when multiple escape events happen.
> 
> Dynamic invariant can be formulated as: there should never be more than
> 1 allocation at runtime per 1 eliminated allocation.
> 
> Considering non-escaping operations can force materialization on their
> own, it poses additional constraints."
> 
> So, when you clone an allocation, you should ensure that only a single
> instance can be observed. And safepoints can be escape points as well
> (rematerialization in case of deoptimization event).
> 
This is fairly complex. That's why I suggest to focus on 'escaping
objects' only in C2 PEA. Please assume that all non-escape objects
remain intact after our PEA.

I think we can guarantee dynamic invariant here. First of all, we
traverse basic blocks in reverse-post-order(RPO). It's in the same
direction of execution. An object allocation is virtual initially. Once
a virtual object becomes materialized, it won't change back. We track
allocation states so we know that. The following appearances of an
materialized object won't cause 'materialization' again. It will be
treated as a ordinary object.

Here is Example2 and I am still working on it. Beside the place where x
escapes to _cache, we also need to materialize the virtual object before
merging two basic blocks. this is described in 5.3 Merge nodes of
Stadler's CGO paper.

class Example2 {
    private Object _cache;
    public Object foo(boolean cond) {
        Object x = new Object();

        blackhole();

        if (cond) {
            _cache = x;
        }
        return x;
    }

    public static void blackhole() {}
...
}

We expect to see code as follows after PEA. "x2 = new Object()" is the
result of materialization at merging point.

    public Object foo(boolean cond) {
        Object x0 = new Object();

        blackhole();

        if (cond) {
            x1 = new Object();
            _cache = x1;
        }
        x3 = phi(x2 = new Object(), x1);
        return x3;
    }

We've proved that the obsolete object is either dead or scalar
replaceable after PEA[4]. We expect C2 EA/SR to get rid of x0 down the
road. Please note that x0(the obsolete obj) dominates all clones(x1 and
x2) because we materialize them before merging, we can guarantee dynamic
invariant.

I plan to mark the original AllocateNode 'obsolete' if PEA does
materialize it. I am going to assert in MacroExpansion that it won't
expand an obsolete object. If it did, it would introduce redundancy and
violate the 'dynamic invariant'.

[1]https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021-May/047536.html
[2]https://github.com/navyxliu/jdk/blob/PEA_parser/src/hotspot/share/opto/parse.hpp#L171

[3]
https://github.com/navyxliu/jdk/blob/PEA_parser/src/hotspot/share/opto/parseHelper.cpp#L367

[4]https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2022-October/059432.html

>> I haven't figured out how to test PEA in a reliable way. It is not easy
>> for IR framework to capture node movement. If we measure allocation
>> rate, it will be subject to CPU capability and also the sampling rate. I
>> came up with an idea so-called 'Epsilon-Test'. We create a JVM with
>> EpsilonGC and a fixed Java heap. Because EpsilonGC never replenish the
>> java heap, we can count how many iterations a test can run before OOME.
>> The less allocation made in a method, the more iterations HotSpot can
>> execute the method. This isn't perfect either. I found that hotspot
>> can't guarantee to execute the final-block in this case[3]. So far, I
>> just measure execution time instead.
> 
> It sounds more like a job for benchmarks, but focused on measuring
> allocation rate (per iteration). ("-prof gc" mode in JMH terms.)
> 
> Personally, I very much liked the IR framework-based approach Cesar used
> in the unit test for allocation merges [4]. Do you see any problems with
> that?
> 
> Best regards,
> Vladimir Ivanov
> 

Okay. I will follow this direction.

thanks,
--lx

> [1] https://bugs.openjdk.org/browse/JDK-8267532
> [2]
> https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021-May/047486.html
> [3]
> https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021-May/047536.html
> [4] https://github.com/openjdk/jdk/pull/9073
> 
> 
>>
>> Appreciate your feedbacks or you spot any redflag.
>>
>> [1] https://gist.github.com/navyxliu/9c325d5c445899c02a0d115c6ca90a79
>>
>> [2]
>> https://gist.github.com/navyxliu/9c325d5c445899c02a0d115c6ca90a79?permalink_comment_id=4341838#gistcomment-4341838
>>
>> [3] https://gist.github.com/navyxliu/9c325d5c445899c02a0d115c6ca90a79#file-example1-java-L43
>>
>> thanks,
>> --lx
>>
>>
>>
>>
>> On 10/12/22 11:17 AM, Vladimir Kozlov wrote:
>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>
>>>
>>>
>>> On 10/12/22 7:58 AM, Liu, Xin wrote:
>>>> hi, Vladimir,
>>>>> You should show that your implementation can rematirealize an object
>>>> at any escape site.
>>>>
>>>> My understanding is I suppose to 'materialize' an object at any escape site.
>>>
>>> Words ;^)
>>>
>>> Yes, I mistyped and misspelled.
>>>
>>> Vladimir K
>>>
>>>>
>>>> 'rematerialize' refers to 'create an scalar-replaced object on heap' in
>>>> deoptimization. It's for interpreter as if the object was created in the
>>>> first place. It doesn't apply to an escaped object because it's marked
>>>> 'GlobalEscaped' in C2 EA.
>>>>
>>>>
>>>> Okay. I will try this idea!
>>>>
>>>> thanks,
>>>> --lx
>>>>
>>>>
>>>>
>>>>
>>>> On 10/11/22 3:12 PM, Vladimir Kozlov wrote:
>>>>> Also in your test there should be no merge at safepoint2 because `obj` is "not alive" (not referenced) anymore.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0xB9D934C61E047B0D.asc
Type: application/pgp-keys
Size: 3675 bytes
Desc: OpenPGP public key
URL: <https://mail.openjdk.org/pipermail/hotspot-compiler-dev/attachments/20221024/7e1f390d/OpenPGP_0xB9D934C61E047B0D.asc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL: <https://mail.openjdk.org/pipermail/hotspot-compiler-dev/attachments/20221024/7e1f390d/OpenPGP_signature.sig>