RFR: 8370345: Parallel: Rework TLAB accounting in MutableNUMASpace [v3]
Joel Sikström
jsikstro at openjdk.org
Mon Oct 27 11:36:29 UTC 2025
> Hello,
>
> Parallel's MutableNUMASpace is the only GC code that uses the Thread parameter passed through the general CollectedHeap interface to tlab_capacity, tlab_used, and unsafe_max_tlab_alloc. It would be nice if Parallel's MutableNUMASpace could do without the Thread and instead use a thread-agnostic approach. Removing the need for the thread makes it possible to clean up the shared CollectedHeap interface, which makes all GCs easier to read and maintain. Also, the lgrp_id stored in the Thread class should really have been moved to GCThreadLocalData after that concept was created, but with a thread-agnostic approach the field can be removed entirely.
>
> The current solution is not without problems. When a new allocation is made inside one of the LGRP spaces in MutableNUMASpace using cas_allocate(), the NUMA/LGRP id is polled and stored inside the Thread, and we only attempt to allocate on that LGRP. If allocation fails on the local LGRP, we do not try any other (remote) LGRP(s). This is reflected in the TLAB accounting methods tlab_capacity, tlab_used, and unsafe_max_tlab_alloc, which report capacity, usage, and availability only for the LGRP matching the LGRP id stored in the Thread. This model breaks down when threads are allowed to migrate between CPUs, and therefore between NUMA nodes, which can change the LGRP id.
>
> For example, a system with two NUMA nodes gives us two LGRPs with ids 0 and 1. If a thread allocates most of its memory on LGRP 0 and then migrates to a CPU on LGRP 1, the thread will report that it allocated a significant amount of memory, while the used memory on the LGRP it is currently on could be very low, giving a disproportionate allocation fraction. (This is not a problem in practice, as the TLAB code accounts for it, but for an entirely different reason.) The reverse is also problematic: if a thread allocates very little memory on LGRP 0 and then migrates to LGRP 1, where another thread has allocated a lot of memory, the allocation fraction will be very low, even though it would be high if the used memory on its original LGRP were accounted for.
>
> A solution to both of these issues is to average the capacity, used, and available memory across all LGRPs for the TLAB accounting methods. This approach provides a more accurate and stable view of memory usage and availability, regardless of thread migration or imbalances in NUMA/LGRP allocation. However, there are trade-offs...
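The averaging idea in the quoted description could be sketched roughly as follows. This is a simplified model, not HotSpot's actual MutableNUMASpace code; all names here (LgrpSpace, tlab_capacity_avg, and so on) are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one LGRP's space in MutableNUMASpace.
struct LgrpSpace {
  size_t capacity; // bytes of space backing this LGRP
  size_t used;     // bytes currently allocated on this LGRP
};

// Thread-agnostic accounting: instead of reporting the figures for the one
// LGRP id cached in the Thread, sum over all LGRPs and divide by their
// count, so the result is stable under thread migration between NUMA nodes.
size_t tlab_capacity_avg(const std::vector<LgrpSpace>& spaces) {
  size_t total = 0;
  for (const LgrpSpace& s : spaces) total += s.capacity;
  return total / spaces.size();
}

size_t tlab_used_avg(const std::vector<LgrpSpace>& spaces) {
  size_t total = 0;
  for (const LgrpSpace& s : spaces) total += s.used;
  return total / spaces.size();
}

size_t unsafe_max_tlab_alloc_avg(const std::vector<LgrpSpace>& spaces) {
  size_t total = 0;
  for (const LgrpSpace& s : spaces) total += s.capacity - s.used;
  return total / spaces.size();
}
```

With two LGRPs of equal capacity, one with 80 bytes used and the other with 20, the averaged used value is 50 no matter which LGRP the querying thread last allocated on, which is the stability property the quoted text argues for.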
Joel Sikström has updated the pull request incrementally with one additional commit since the last revision:
Comment on unsafe_max_tlab_alloc alignment
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/27935/files
- new: https://git.openjdk.org/jdk/pull/27935/files/c8a2182f..c4cd91a8
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=27935&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=27935&range=01-02
Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/27935.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27935/head:pull/27935
PR: https://git.openjdk.org/jdk/pull/27935
More information about the hotspot-gc-dev
mailing list