RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v19]

Thu Oct 30 15:59:08 UTC 2025

On Tue, 28 Oct 2025 22:25:09 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> This PR introduces C2 support for `Reference.reachabilityFence()`.
>> 
>> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
>> 
>> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
>> 
>> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix.
>> 
>> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
>> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
>> 
>> Testing:
>> - [x] hs-tier1 - hs-tier8
>> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations
>> - [x] java/lang/foreign microbenchmarks
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix merge

That is how far I got today. Only was able to look at details, I'll have to get the overview tomorrow or next week.

Thanks for keeping on working on this, this is a complex one!

I left off at `reachability.cpp` at `PhaseIdealLoop::insert_rf`, marking it for myself ;)

src/hotspot/share/opto/c2_globals.hpp line 86:

> 84:                                                                             \
> 85:   product(bool, StressReachabilityFences, false, DIAGNOSTIC,                \
> 86:           "Aggressively insert reachability fences for all oop arguments")  \

It could be nice if you gave some more detail here, what these flags do.

src/hotspot/share/opto/compile.cpp line 4029:

> 4027:       Node* in = n->in(j);
> 4028:       if (in->is_DecodeNarrowPtr() && (is_uncommon || !in->has_non_debug_uses())) {
> 4029:         n->set_req(j, in->in(1));

Can you say why you changed this code here? Is it equivalent?

src/hotspot/share/opto/escape.cpp line 1230:

> 1228:     SafePointNode* sfpt = safepoints.at(spi)->as_SafePoint();
> 1229: 
> 1230:     sfpt->remove_non_debug_edges(non_debug_edges_worklist);

This looks a bit "hacky". Can you add some code comments why we need to do it this way?

src/hotspot/share/opto/loopTransform.cpp line 76:

> 74:       return head()->as_OuterStripMinedLoop()->outer_loop_exit();
> 75:     } else {
> 76:       // For now, conservatively report multiple loop exits exist.

Can this happen? Do you have an example?

src/hotspot/share/opto/loopnode.cpp line 5119:

> 5117:   if (stop_early) {
> 5118:     assert(do_expensive_nodes || do_optimize_reachability_fences, "why are we here?");
> 5119:     if (do_optimize_reachability_fences && optimize_reachability_fences()) {

Can you explain why you call `optimize_reachability_fences` here and also below?

src/hotspot/share/opto/loopnode.hpp line 1150:

> 1148: 
> 1149:   void remove_dead_node(Node* dead) {
> 1150:     assert(dead->outcnt() == 0 && !dead->is_top(), "node must be dead");

Could you assert `dead->is_dead()` here?
We should probably also not call this on a `CFG` node, otherwise we might destroy the "ctrl forwarding", see: https://git.openjdk.org/jdk/pull/27892

I'm only putting so much scrutiny here, because you are adding a new public method to `PhaseIdealLoop`, and that would require that it is clear how to use it, and not to use it.

src/hotspot/share/opto/reachability.cpp line 89:

> 87: // In terms of dominance relation it can be formulated as "a referent has a user which is dominated by the redundant RF".
> 88: // Until loop opts are over, only RF nodes are considered as usages (controlled by rf_only flag).
> 89: static bool is_redundant_rf_helper(Node* ctrl, Node* referent, PhaseIdealLoop* phase, PhaseGVN& gvn, bool rf_only) {

Nit: `_helper` is fine if it is used as some internal method, i.e. only `is_redundant_rf` uses `is_redundant_rf_helper`. But it seems you are using it from different places. Can you find a better name?

src/hotspot/share/opto/reachability.cpp line 102:

> 100:       return true;
> 101:     }
> 102:   }

Can you explain in a code comment?

src/hotspot/share/opto/reachability.cpp line 118:

> 116:           }
> 117:         } else {
> 118:           assert(rf_only, "");

Does `phase == nullptr` imply `rf_only`? If so, you should add an assert at the top of the method.

src/hotspot/share/opto/reachability.cpp line 180:

> 178:       return false; // uncommon traps are exit points
> 179:     }
> 180:     return true;

Looks like we have established "significance" by principle of exclusion. That feels a little brittle, what if there is yet another category we would have to exclude? Would that lead to correctness issues, or only be inefficient?

Also: "significant" is a bit of a vague term. Significant for what? "reachability tracking purposes", of course, we are in `reachability.hpp` ;)
But can you be more specific?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3399439848
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2477973204
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478484569
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478466641
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478491570
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478507770
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478533282
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478583211
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478587566
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478608309
PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2478647914