RFR: 8345067: C2: enable implicit null checks for ZGC reads

Thu May 8 11:29:01 UTC 2025

On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano <rcastanedalo at openjdk.org> wrote:

> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> 
> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands:
> 
> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753)
> 
> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
> 
> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks.
> 
> #### Testing
> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode).

@roberto Thanks a lot for taking the time to explain how implicit null checks work, and giving me some background for the PR :)

Below, I have mostly code style / naming suggestions, that you are welcome to use as inspiration. But you do not have to apply any of them, it is totally up to you :)

I'm definitely not an expert here, but your approach seems reasonable to me. The opt-in annotation `ins_is_late_expanded_null_check_candidate` makes sure we only do the optimization when we are sure it is ok. It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations.

In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. But I suppose that cannot happen, because the GC only moves the pointer, so if the old pointer was non-null, the new pointer must be non-null as well. Maybe that was so trivial that you did not even understand my question there 😅 But it could be helpful to write that down somewhere, just to make sure people are aware of this.

I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for?

src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130:

> 128:              Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)),
> 129:              "an instruction that can be used for implicit null checking should emit the candidate memory access first");
> 130:       ref_addr = __ legitimize_address(ref_addr, size, rscratch2);

For context:

   132   /* Sometimes we get misaligned loads and stores, usually from Unsafe
   133      accesses, and these can exceed the offset range. */
   134   Address legitimize_address(const Address &a, int size, Register scratch) {                                                                                                                                         
   135     if (a.getMode() == Address::base_plus_offset) {
   136       if (! Address::offset_ok_for_immed(a.offset(), exact_log2(size))) {
   137         block_comment("legitimize_address {");
   138         lea(scratch, a);
   139         block_comment("} legitimize_address");
   140         return Address(scratch);
   141       }
   142     }
   143     return a;
   144   }

I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you.

src/hotspot/share/opto/block.hpp line 468:

> 466: 
> 467:   // If necessary, hoist orphan node n into the end of block b.
> 468:   void maybe_hoist_into(Node* n, Block* b);

Hmm. It is "if necessary" or "if possible"?
I wonder if we could come up with a name that is a little longer and expresses this condition?

src/hotspot/share/opto/lcm.cpp line 79:

> 77: }
> 78: 
> 79: void PhaseCFG::move_into(Node* n, Block* b) {

Suggestion:

void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) {

src/hotspot/share/opto/lcm.cpp line 89:

> 87:     if (!out->is_MachProj()) {
> 88:       continue;
> 89:     }

What about the `MachTemp`?

Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name.

src/hotspot/share/opto/lcm.cpp line 105:

> 103:          "need for recursive hoisting not expected");
> 104:   move_into(n, b);
> 105: }

Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before?
Suggestion to make it a bit more clear:
Suggestion:

// We want to ensure that n happens at b or before, i.e. at a block that dominates b.
void PhaseCFG::ensure_node_is_at_block_or_before(Node* n, Block* b) {
  Block* current = get_block_for_node(n);
  if (current->dominates(b)) {
    return; // n already happens before b, do nothing.
  }
  // We only expect nodes without further inputs, like MachTemp or load Base.
  assert(n->req() == 0 || (n->req() == 1 && n->in(0) == (Node*)C->root()),
         "need for recursive hoisting not expected");
  assert(b->dominates(current), "precondition: can only move n to b if b dominates n");
  move_node_and_its_projections_to_block(n, b);
}

I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here.
Maybe moving the `b->dominates(current)` assert down helps give good context?
But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`...

Do you have a better idea?

src/hotspot/share/opto/lcm.cpp line 356:

> 354:       if (mach->in(j)->is_MachTemp()) {
> 355:         assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared");
> 356:         // Ignore MachTemp inputs, they can be safely hoisted with the candidate.

Suggestion:

        // Ignore MachTemp inputs, they can be safely hoisted with the candidate.
        // MachTemp have no inputs themselves and are only there to reserve a scratch
        // register for the GC barrier of the memory operation.

That was what you told me in our offline meeting, I thought it was helpful context information.

src/hotspot/share/opto/lcm.cpp line 428:

> 426:         maybe_hoist_into(val->in(i), block);
> 427:       }
> 428:       move_into(val, block);

Suggestion:

        // Inputs of val may already be early enough, but if not move them together with val.
        ensure_node_is_at_block_or_before(val->in(i), block);
      }
      move_node_and_its_projections_to_block(val, block);

src/hotspot/share/opto/lcm.cpp line 437:

> 435:     if (n == nullptr || !n->is_MachTemp()) {
> 436:       continue;
> 437:     }

Do you want to check that all other nodes already dominate `block`?

src/hotspot/share/opto/lcm.cpp line 439:

> 437:     }
> 438:     maybe_hoist_into(n, block);
> 439:   }

It seems to me this is definitely new code, ensuring that we move the `MachTemp`. We did not do that before, at least not here. Correct?

src/hotspot/share/opto/lcm.cpp line 441:

> 439:       map_node_to_block(n, block);
> 440:     }
> 441:   }

This now happens in `move_into`, right?

src/hotspot/share/opto/machnode.hpp line 391:

> 389: 
> 390:   // Whether this node is expanded during code emission into a sequence of
> 391:   // instructions and the first instruction can perform an implicit null check.

You may want to put a warning / reasoning here, in case there are multiple loads.
You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think?

test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51:

> 49:  * @requires vm.gc.Z
> 50:  * @run driver compiler.gcbarriers.TestImplicitNullChecks Z
> 51:  */

Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException.
Of course, you would have to probably add `applyIf` to the `@IR` rules.

test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119:

> 117:                 testLoad(o);
> 118:             } catch (NullPointerException e) { nullPointerException = true; }
> 119:             Asserts.assertTrue(nullPointerException);

Suggestion:

            try {
                testLoad(o);
                throw new RuntimeException("Should have thrown NullPointerException");
            } catch (NullPointerException e) { /* expected */}

Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of?
But totally optional, as your approach works anyway :)

test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140:

> 138:     // G1 and ZGC stores cannot be currently used to implement implicit null
> 139:     // checks, because they expand into multiple memory access instructions that
> 140:     // are not necessarily located at the initial instruction start address.

Very random idea, no idea if it is any good:
Why not do the implicit null-check with a fake Load?
No idea on the implications here. I suppose it would be extra code, but at least not branching code?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2824535603
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079357655
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079437197
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079476518
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079430920
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079473986
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079420601
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079480978
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079486097
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079509053
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079488019
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079491319
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079493683
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079500275
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079505342