RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]
Emanuel Peter
epeter at openjdk.org
Fri Sep 12 13:12:34 UTC 2025
On Wed, 10 Sep 2025 21:34:37 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> src/hotspot/share/opto/compile.cpp line 2522:
>>
>>> 2520: if (failing()) return;
>>> 2521: assert(_reachability_fences.length() == 0, "no RF nodes allowed");
>>> 2522: }
>>
>> Looks better than before :)
>>
>> I'm still wondering: do we need to do a whole loop-opts phase here? It probably has a performance impact, right?
>> Have you measured that?
>>
>> If it is measurable: could we just go through `_reachability_fences`, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
>
>> could we just go through _reachability_fences, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
>
> RF elimination needs control for referent to enumerate all interfering safepoints.
>
> Theoretically, it's possible to use a conservative estimate, but then:
> (1) it can worsen the result (by enumerating more interfering safepoints than needed); and
> (2) build an unschedulable graph if referent doesn't dominate safepoint node (if estimate is way too conservative).
>
> IMO it's safer to build full dominator tree here.
>
>> It probably has a performance impact, right? Have you measured that?
>
> It does have a noticeable cost. On my laptop it bumps the time spent doing RF processing from 170ms to 210ms
>
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:-StressReachabilityFences
>
> IdealLoop: 0.173 s
> ReachabilityFence: 0.000 s
> Optimize: 0.000 s
> Eliminate: 0.000 s
> ```
> vs
>
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
>
> IdealLoop: 0.212 s
> ReachabilityFence: 0.030 s
> Optimize: 0.004 s
> Eliminate: 0.004 s
> ```
>
> I reimplemented it to piggyback on the last loop optimization attempt if there's any and it drastically improves the situation:
>
> ```
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
>
> IdealLoop: 0.193 s
> ReachabilityFence: 0.009 s
> Optimize: 0.003 s
> Eliminate: 0.004 s
> ```
@iwanowww
Ok, thanks for measuring this. We really need to keep an eye on this, otherwise it will surely trip @robcasloz's C2 compile time benchmarking eventually ;)
Can you point me to the code where you are actually using the dominator information? I think I did not find it the last time I reviewed.
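To make the question concrete, here is a rough sketch of the kind of dominance query I would expect to find. It is a toy model and not HotSpot code: `DomTree`, `dominates` and `usable_safepoints` are hypothetical names, blocks are plain integers, and the root is assumed to be its own immediate dominator. It only illustrates the point above that a safepoint not dominated by the referent cannot safely be linked to it.

```cpp
#include <vector>

// Toy dominator tree (hypothetical, not the C2 data structure):
// idom[n] is the immediate dominator of block n, and the root block
// is assumed to be its own immediate dominator.
struct DomTree {
  std::vector<int> idom;

  // True if block a dominates block b (every block dominates itself).
  bool dominates(int a, int b) const {
    while (true) {
      if (a == b) return true;
      if (idom[b] == b) return false;  // reached the root without meeting a
      b = idom[b];
    }
  }
};

// Keep only the safepoints whose block is dominated by the referent's block.
// A safepoint outside that set could not reference the referent without
// producing an unschedulable graph.
std::vector<int> usable_safepoints(const DomTree& dt,
                                   int referent_block,
                                   const std::vector<int>& safepoint_blocks) {
  std::vector<int> result;
  for (int sp : safepoint_blocks) {
    if (dt.dominates(referent_block, sp)) {
      result.push_back(sp);
    }
  }
  return result;
}
```

With `idom = {0, 0, 1, 1, 0}` and the referent in block 1, safepoints in blocks 2 and 3 pass the filter, while one in block 4 is rejected because block 1 does not dominate it.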
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344212858