RFR: 8359683: ZGC: NUMA-Aware Relocation

Joel Sikström jsikstro at openjdk.org
Fri Aug 22 10:34:34 UTC 2025


Hello,

With [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441), ZGC gained infrastructure for preferring that allocations end up on a specific NUMA node. When a new object is allocated, it is preferably placed on the NUMA node local to the allocating thread. This strategy improves access speeds for mutators working on that object, as long as it continues to be used by threads on the same NUMA node. However, when relocating objects, ZGC may move (migrate) objects away from the NUMA node they were originally allocated on. This means that if a page is selected for the Relocation Set, the objects on that page may be moved to another NUMA node, breaking the NUMA locality we strived for when allocating.

We should consider adding NUMA-awareness to ZGC's relocation phase to keep NUMA-locality benefits for mutators.

<details>

<summary><b>Proposal</b> (expandable section)</summary>

NUMA-Awareness consists of two main features:

**First**: GC threads should strive to keep objects on their original NUMA node, meaning that objects should ideally be relocated to a page on the same NUMA node.

Mutator threads should take a different approach: we know that a mutator that is (helping out with) relocating an object is also going to access it, so we migrate the object to the NUMA node associated with the relocating thread. This strategy is already in effect and does not require any code changes (specifically, ZObjectAllocator already tracks per-CPU Small pages). However, Medium pages are shared between CPUs and thus come with no guarantee about which NUMA node they are on. Combined, mutator relocation and Medium page relocation are both uncommon, so there is little to gain from introducing NUMA-awareness for that specific scenario. Instead, this can be addressed in a follow-up if we feel that's necessary.
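
As a rough sketch of the destination policy described above (the `RelocatorKind`, `Page`, and `destination_numa_node` names are made up for illustration and are not ZGC's actual types; assumes Linux with libnuma):

```c++
#include <numa.h>    // numa_node_of_cpu; link with -lnuma
#include <sched.h>   // sched_getcpu (Linux-specific)

// Hypothetical stand-ins for ZGC internals.
enum class RelocatorKind { GCWorker, Mutator };

struct Page {
  int numa_id; // NUMA node the page's memory resides on
};

// Pick the NUMA node to allocate the destination page on. GC workers
// preserve the object's original locality; mutators migrate the object
// to their own node, since they are the ones about to use it.
int destination_numa_node(RelocatorKind kind, const Page* from_page) {
  if (kind == RelocatorKind::GCWorker) {
    return from_page->numa_id;               // keep the object where it was
  }
  return numa_node_of_cpu(sched_getcpu());   // move it next to the mutator
}
```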

**Second**: When the GC chooses a page from the Relocation Set to relocate objects from, it should prefer pages that are local to its own NUMA node, to speed up relocation by working on NUMA-local memory. There are multiple ways to achieve this, but the main goal should be to (1) start working on pages that are local to the GC thread's NUMA node, and (2) when finished with the pages on its own NUMA node, start working on (helping out with) pages associated with other NUMA nodes.
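
One possible shape for this, as a minimal sketch rather than the actual implementation (the `NumaRelocationQueues` name is made up for illustration): keep one claim queue per NUMA node, and let each thread drain its own node's queue before moving on to the others.

```c++
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Page { int numa_id; };

// Hypothetical relocation-set view: one queue of pages per NUMA node.
class NumaRelocationQueues {
  std::vector<std::deque<Page*>> _queues; // index = NUMA node id
  std::mutex _lock;

public:
  explicit NumaRelocationQueues(size_t num_nodes) : _queues(num_nodes) {}

  void add(Page* page) {
    std::lock_guard<std::mutex> guard(_lock);
    _queues[page->numa_id].push_back(page);
  }

  // Claim a page to relocate: prefer the caller's own node, then help
  // out with the remaining nodes in order.
  Page* claim(int my_node) {
    std::lock_guard<std::mutex> guard(_lock);
    for (size_t i = 0; i < _queues.size(); i++) {
      size_t node = (my_node + i) % _queues.size();
      auto& q = _queues[node];
      if (!q.empty()) {
        Page* page = q.front();
        q.pop_front();
        return page;
      }
    }
    return nullptr; // relocation set fully claimed
  }
};
```

A real implementation would presumably use lock-free claiming rather than a single mutex; the claim order (local node first, then the rest) is the point here.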

Some key observations to consider with the above approach:

* The NUMA node associated with the GC thread should be "polled"/"checked" at regular intervals, to account for the fact that the GC thread might have migrated to another CPU, and thus perhaps to another NUMA node. It is probably enough to check the associated NUMA node before claiming a new page and starting to relocate objects (see the sketch after this list).

* By choosing pages based on NUMA-node association rather than live bytes, we might not start with the sparsest page first. This is really only a problem if the machine is fully saturated and there are allocation stalls. Additionally, it is worth considering that in a common NUMA configuration, accessing remote memory takes about twice as long as accessing local memory. This means that a local page could (theoretically) be relocated twice as fast as a remote page, which could release memory faster than starting with the sparsest page, if that page is on a remote node.

* The new strategy is more of an optimization for mutators and might make the GC take a bit longer to complete the relocation phase. The current strategy is to move objects to the NUMA node associated with the GC thread, regardless of where the object originally lived. This makes the GC fast, at the potential cost of mutators no longer accessing local memory. However, since ZGC is a concurrent garbage collector, a slightly longer relocation phase is not a huge issue if mutators receive a speedup in return.

* Depending on which NUMA nodes GC threads end up on, we might see a negative impact on performance. This is no different before or after this proposal, but with NUMA-awareness implemented, it would be possible to explore future enhancements where threads are placed on cores associated with a particular NUMA node, so that threads can work on NUMA-local memory more often.

* To simplify the logic of this patch, multi-partition allocations for Medium pages have been disabled. After [JDK-8357449](https://bugs.openjdk.org/browse/JDK-8357449), which enables variable sizes for Medium pages all the way down to 4MB, there is little gain in allowing multi-partition allocations for Medium pages. If a Medium page allocation would only succeed with multi-partition enabled, we do not have 4MB of contiguous memory available, which would only happen if memory is extremely low or if the heap is degeneratively swiss-cheesed into 2MB chunks.
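
As a rough illustration of the first observation in the list above, a worker loop could simply re-derive its NUMA node before each claim. This reuses the hypothetical `NumaRelocationQueues` and `Page` types (and the `<numa.h>`/`<sched.h>` includes) from the sketches above; `relocate_objects` is a placeholder:

```c++
// Placeholder for the actual relocation of live objects on a page.
void relocate_objects(Page* page);

// Hypothetical worker loop: re-check the thread's NUMA node before each
// claim, since the OS may have migrated the thread to a CPU on another
// node since the last page was processed.
void relocate_worker(NumaRelocationQueues& queues) {
  while (true) {
    int my_node = numa_node_of_cpu(sched_getcpu()); // cheap enough per page
    Page* page = queues.claim(my_node);
    if (page == nullptr) {
      break; // entire relocation set has been claimed
    }
    relocate_objects(page);
  }
}
```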

</details>

## Testing

* tier1-8 ZGC tasks only, on all Oracle supported platforms, without NUMA
* tier1-8 with -XX:ZFakeNUMA=16 on linux-x64-debug
* tier1-3 with NUMA on linux-x64-debug

Performance testing shows no regression when NUMA is disabled or not available.

Testing before/after the patch shows that the share of objects GC threads relocate from/to NUMA-local pages has gone from about 50% up to 95%. This depends very much on the distribution of threads across NUMA nodes, but it shows that the new strategy works as intended.

There is no apparent speedup or slowdown in the time it takes to complete the relocation phase with NUMA enabled. This could very well be due to the change in strategy for choosing the destination NUMA node. Before, we would always relocate an object to the NUMA node local to the GC thread. Now, we maintain NUMA-locality when relocating, which means that if we relocate an object on a remote NUMA node, we get worse performance, as both the source and the destination are on a remote NUMA node.

-------------

Commit messages:
 - 8359683: ZGC: NUMA-Aware Relocation

Changes: https://git.openjdk.org/jdk/pull/26898/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26898&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8359683
  Stats: 246 lines in 15 files changed: 151 ins; 19 del; 76 mod
  Patch: https://git.openjdk.org/jdk/pull/26898.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26898/head:pull/26898

PR: https://git.openjdk.org/jdk/pull/26898

