RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v8]

Emanuel Peter epeter at openjdk.org
Fri Sep 12 13:12:34 UTC 2025


On Wed, 10 Sep 2025 21:34:37 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> src/hotspot/share/opto/compile.cpp line 2522:
>> 
>>> 2520:     if (failing())  return;
>>> 2521:     assert(_reachability_fences.length() == 0, "no RF nodes allowed");
>>> 2522:   }
>> 
>> Looks better than before :)
>> 
>> I'm still wondering: do we need to do a whole loop-opts phase here? It probably has a performance impact, right?
>> Have you measured that?
>> 
>> If it is measurable: could we just go through `_reachability_fences`, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
>
>> could we just go through _reachability_fences, and hack the graph and clean up with IGVN? Or do we really need the loop state to do this successfully?
> 
> RF elimination needs the control for the referent in order to enumerate all interfering safepoints.
> 
> Theoretically, it's possible to use a conservative estimate, but then:
>  (1) it can worsen the result (by enumerating more interfering safepoints than needed); and
>  (2) it can build an unschedulable graph if the referent doesn't dominate the safepoint node (if the estimate is way too conservative).
> 
> IMO it's safer to build the full dominator tree here.
> 
>> It probably has a performance impact, right? Have you measured that? 
> 
> It does have a noticeable cost. On my laptop it bumps the time spent doing RF processing from 170 ms to 210 ms:
> 
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:-StressReachabilityFences
> 
>          IdealLoop:             0.173 s
>            ReachabilityFence:   0.000 s
>              Optimize:          0.000 s
>              Eliminate:         0.000 s
>
> vs
> 
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
> 
>          IdealLoop:             0.212 s
>            ReachabilityFence:   0.030 s
>              Optimize:          0.004 s
>              Eliminate:         0.004 s
> 
> I reimplemented it to piggyback on the last loop optimization attempt, if there is one, and it drastically improves the situation:
> 
> $ java -Xcomp -XX:-TieredCompilation -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+StressReachabilityFences
> 
>          IdealLoop:             0.193 s
>            ReachabilityFence:   0.009 s
>              Optimize:          0.003 s
>              Eliminate:         0.004 s

@iwanowww 
Ok, thanks for measuring this. We really need to keep an eye on this, otherwise it will surely trip @robcasloz's C2 compile time benchmarking eventually ;)

Can you point me to the code where you are actually using the dominator information? I think I did not find it the last time I reviewed.
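
For readers following the thread, here is a toy sketch of the kind of dominance query that point (2) in the quoted reply refers to: a safepoint can only keep the referent alive if the referent's control dominates that safepoint. This is not HotSpot code. The class `DominanceSketch`, the `idom` array encoding (each node stores its immediate dominator, and the root points to itself), and both method names are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical, not the C2 implementation).
// Control nodes are ints; idom[n] is the immediate dominator of n,
// and idom[root] == root by convention.
final class DominanceSketch {

    // Returns true if node a dominates node b (every node dominates itself).
    static boolean dominates(int[] idom, int a, int b) {
        int n = b;
        while (true) {
            if (n == a) {
                return true;
            }
            if (idom[n] == n) {
                return false; // reached the root without meeting a
            }
            n = idom[n]; // walk one step up the dominator tree
        }
    }

    // Keeps only the candidate safepoints that the referent's control dominates.
    // Attaching the referent to any other safepoint would use a value at a
    // point where it is not guaranteed to be defined.
    static List<Integer> schedulableSafepoints(int[] idom, int referentCtrl,
                                               List<Integer> candidates) {
        List<Integer> result = new ArrayList<>();
        for (int sfpt : candidates) {
            if (dominates(idom, referentCtrl, sfpt)) {
                result.add(sfpt);
            }
        }
        return result;
    }
}
```

With `idom = {0, 0, 1, 1, 1, 2}` and the referent's control at node 2, only node 5 among the candidates 3, 4 and 5 passes the filter, because 5 is the only one whose dominator chain goes through node 2. A conservative estimate that skipped this check could hand nodes 3 or 4 to the elimination step.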

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2344212858


More information about the hotspot-compiler-dev mailing list