RFR: 8359683: ZGC: NUMA-Aware Relocation [v6]
Joel Sikström
jsikstro at openjdk.org
Wed Aug 27 08:41:20 UTC 2025
> Hello,
>
> With [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441), ZGC got infrastructure for preferring that allocations end up on a specific NUMA node. When a new object is allocated, it is preferably placed on the NUMA node that is local to the allocating thread. This strategy improves access speeds for mutators working on that object, as long as it continues to be used by threads on the same NUMA node. However, when relocating objects, ZGC may move (migrate) objects away from the NUMA node they were originally allocated on. This means that if a page is selected as part of the Relocation Set, the objects on that page could be moved to another NUMA node, breaking the NUMA locality we strove for when allocating.
>
> We should consider adding NUMA-awareness to ZGC's relocation phase to keep NUMA-locality benefits for mutators.
>
> <details>
>
> <summary><b>Proposal</b> (expandable section)</summary>
>
> NUMA-Awareness consists of two main features:
>
> **First**: GC threads should strive toward keeping the NUMA locality of objects to their original node, meaning that objects should ideally be relocated to a page that is on the same NUMA node.
>
> Mutator threads should have a different approach, as we know that the mutator that's (helping out with) relocating an object is also going to access it, so we migrate the object to the NUMA node associated with the relocating thread. This strategy is already in effect and does not require any changes to the code (specifically, ZObjectAllocator already tracks per-CPU specific Small pages). However, Medium pages are shared between CPUs and thus do not hold any guarantees on which NUMA node they are on. The combination of mutator relocation and Medium pages is not common, so there is little gain from introducing NUMA-awareness to that specific scenario. Instead, this can be addressed in a follow-up if we feel that's necessary.
>
> **Second**: When the GC chooses a page from the Relocation Set to relocate objects from, it should choose page(s) that are local to the GC thread's own NUMA node, to improve performance by working on NUMA-local memory. There are multiple ways to achieve this, but the main goal should be to (1) start working on pages that are local to the GC thread's NUMA node, and (2) when finished with pages on its own NUMA node, help out with pages associated with other NUMA nodes.
>
> Some key observations to consider with the above approach:
>
> * The NUMA node associated with the GC thread should be "polle...
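
For illustration only, the two features described in the proposal above could be sketched roughly as follows. This is not code from the pull request: `Page`, `RelocationQueues`, `claim` and `target_numa_id` are hypothetical names chosen for this sketch, and the sketch is single-threaded, whereas real GC workers would have to claim pages atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a page in the Relocation Set (not ZGC's ZPage).
struct Page {
  uint32_t id;
  uint32_t numa_id;
};

// Hypothetical per-NUMA-node queues of pages waiting to be relocated.
class RelocationQueues {
public:
  explicit RelocationQueues(size_t numa_nodes) : _queues(numa_nodes) {}

  void add(const Page& page) {
    assert(page.numa_id < _queues.size());
    _queues[page.numa_id].push_back(page);
  }

  // Claim the next page for a GC worker running on local_numa_id.
  // (1) Pages on the worker's own node are claimed first.
  // (2) Once those are exhausted, the worker helps out with the other
  //     nodes, visited in round-robin order starting after its own node.
  // Returns false when no pages are left on any node.
  bool claim(uint32_t local_numa_id, Page* out) {
    const size_t nodes = _queues.size();
    for (size_t i = 0; i < nodes; i++) {
      std::deque<Page>& queue = _queues[(local_numa_id + i) % nodes];
      if (!queue.empty()) {
        *out = queue.front();
        queue.pop_front();
        return true;
      }
    }
    return false;
  }

private:
  std::vector<std::deque<Page>> _queues;
};

// Target node policy from the first feature: a GC thread keeps the object on
// the node of its source page, while a mutator migrates it to its own node.
inline uint32_t target_numa_id(bool is_gc_thread,
                               uint32_t source_numa_id,
                               uint32_t thread_numa_id) {
  return is_gc_thread ? source_numa_id : thread_numa_id;
}
```

In this sketch a worker on node 1 drains node 1's queue before it takes any page from node 0, which matches the "local first, then help out" goal stated in the proposal. The round-robin order for remote nodes is an assumption of the sketch, not something the proposal specifies.
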
Joel Sikström has updated the pull request incrementally with one additional commit since the last revision:
Fix grammar for iterator destruction comment
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/26898/files
- new: https://git.openjdk.org/jdk/pull/26898/files/f09cb290..8e754a11
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=26898&range=05
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=26898&range=04-05
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/26898.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/26898/head:pull/26898
PR: https://git.openjdk.org/jdk/pull/26898