RFR: 8347335: ZGC: Use limitless mark stack memory

Tue Feb 11 20:38:40 UTC 2025

When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointers to avoid ABA problems, which are prevalent when writing lock-free data structures. This required partitioning the pointers in the structure so that each one embeds both a version and a location.
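
To make the versioned pointer idea concrete, here is a minimal sketch. It is not the ZGC code: the 32/32 bit split and the names `encode`, `offset_of`, `version_of` and `replace_head` are assumptions made for illustration. A location and a version share one 64-bit word, and every successful update bumps the version, so a compare-and-swap with a stale value fails even if the same location has reappeared in the meantime.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: low 32 bits hold an offset into a contiguous
// space, high 32 bits hold a version counter.
constexpr int kOffsetBits = 32;
constexpr uint64_t kOffsetMask = (uint64_t{1} << kOffsetBits) - 1;

inline uint64_t encode(uint64_t offset, uint64_t version) {
  // Only offsets that fit in the reduced bit range are addressable.
  assert(offset <= kOffsetMask);
  return (version << kOffsetBits) | offset;
}

inline uint64_t offset_of(uint64_t value) { return value & kOffsetMask; }
inline uint64_t version_of(uint64_t value) { return value >> kOffsetBits; }

// Install new_offset with a bumped version. Fails if the head has been
// modified since `expected` was read, even if the offset is the same.
inline bool replace_head(std::atomic<uint64_t>& head,
                         uint64_t expected,
                         uint64_t new_offset) {
  const uint64_t desired = encode(new_offset, version_of(expected) + 1);
  return head.compare_exchange_strong(expected, desired);
}
```

The cost of this scheme is visible in `kOffsetMask`: with only part of the word available for the location, the structure can only refer to memory inside a bounded, contiguous range.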

Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses.

Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls.

While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what value to give this JVM option.

The `-XX:ZMarkStackSpaceLimit` JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator and hence the JVM option can be removed.

This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead.

The use of hazard pointers is a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use in the Threads list. The main idea is for a thread to publish the pointer it has read in a hazard pointer, so that concurrent threads know not to free memory that is still in use. Freeing of such concurrently accessed memory is deferred until it is safe, which solves the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures.

Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable.
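
As a rough illustration of the scheme described above, here is a minimal sketch of a lock-free stack whose pop path is protected by one hazard pointer per worker. It is not the code in the PR: `Node`, `Stack`, `g_hazard`, `kWorkers`, `new_node` and `free_unprotected` are hypothetical names, and the sketch assumes that popped nodes are never pushed again but are retired and freed once no worker protects them.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <vector>

struct Node {
  Node* next;
  int value;
};

// Hypothetical: one hazard pointer slot per GC worker.
constexpr int kWorkers = 4;
std::atomic<Node*> g_hazard[kWorkers] = {};

// Nodes come from plain malloc, so an address can only reappear
// after the node has actually been freed.
inline Node* new_node(int value) {
  Node* node = static_cast<Node*>(std::malloc(sizeof(Node)));
  assert(node != nullptr);
  node->next = nullptr;
  node->value = value;
  return node;
}

struct Stack {
  std::atomic<Node*> head{nullptr};

  // Pushing needs no hazard pointer: it never dereferences the old head.
  void push(Node* node) {
    Node* old_head = head.load();
    do {
      node->next = old_head;
    } while (!head.compare_exchange_weak(old_head, node));
  }

  // Only workers pop, so the hazard pointer lives in the worker's slot.
  Node* pop(int worker) {
    for (;;) {
      Node* node = head.load();
      if (node == nullptr) {
        g_hazard[worker].store(nullptr);
        return nullptr;
      }
      // Publish the pointer, then re-validate that it is still the head.
      g_hazard[worker].store(node);
      if (head.load() != node) {
        continue;
      }
      // Safe to dereference: node cannot be freed while it is published.
      Node* next = node->next;
      if (head.compare_exchange_strong(node, next)) {
        g_hazard[worker].store(nullptr);
        return node;
      }
    }
  }
};

inline bool is_protected(const Node* node) {
  for (const std::atomic<Node*>& slot : g_hazard) {
    if (slot.load() == node) {
      return true;
    }
  }
  return false;
}

// Free retired nodes that no worker protects; defer the rest.
inline void free_unprotected(std::vector<Node*>& retired) {
  std::vector<Node*> deferred;
  for (Node* node : retired) {
    if (is_protected(node)) {
      deferred.push_back(node);
    } else {
      std::free(node);
    }
  }
  retired.swap(deferred);
}
```

In this sketch, a node that a worker has published cannot be freed, so its address cannot be handed out again by malloc and reappear as the head while that worker is between its load and its compare-and-swap. That is what removes the need for a version in the pointer.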

I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and have not seen any interesting changes in behavior, other than more predictable and consistent native memory usage. The behavior we have today is slightly more temperamental, because the mark stack memory is eagerly handed back to the OS between GC cycles and then required again in the next cycle.

With this change, another JVM option bites the dust. I have already gotten the CSR to obsolete the `-XX:ZMarkStackSpaceLimit` JVM option approved (cf. https://bugs.openjdk.org/browse/JDK-8349204).

-------------

Commit messages:
 - 8347335: ZGC: Use limitless mark stack memory

Changes: https://git.openjdk.org/jdk/pull/23571/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23571&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347335
  Stats: 1189 lines in 26 files changed: 418 ins; 656 del; 115 mod
  Patch: https://git.openjdk.org/jdk/pull/23571.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23571/head:pull/23571

PR: https://git.openjdk.org/jdk/pull/23571