RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking

Wed Apr 23 08:03:41 UTC 2025

Please review this change to improve TLAB handling in ZGC.

**Summary**
In ZGC the maximum TLAB size is 256k, and in many cases we want the TLABs to be this big. But for threads that only allocate a fraction of this, TLABs of this size cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but a few things in ZGC have prevented this mechanism from working as expected.

The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). For the other GCs, capacity is more or less the size of Eden, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:

```
bool update_allocation_history = used > 0.5 * capacity;
```
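
To make the effect concrete, here is a small sketch that evaluates the same condition with two different capacity values. The helper name and the numbers are illustrative assumptions, not taken from the patch or from measurements.

```cpp
#include <cstdint>

// Same condition as quoted above, wrapped in a hypothetical helper.
constexpr bool should_update_allocation_history(std::uint64_t used,
                                                std::uint64_t capacity) {
  return used > 0.5 * capacity;
}

constexpr std::uint64_t M = 1024 * 1024;

// Assumed numbers: 512M used for TLABs on a 16G heap.
// Reporting the heap capacity means the condition is not met.
const bool with_heap_capacity =
    should_update_allocation_history(512 * M, 16384 * M);

// With a capacity close to what TLABs actually use (assumed 600M),
// the same usage passes the check and gets sampled.
const bool with_average_capacity =
    should_update_allocation_history(512 * M, 600 * M);
```

With these assumed numbers, TLAB usage would have to exceed 8G before a sample is taken when the heap capacity is reported.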

So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimate of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of the expected TLAB capacity, and the sizing heuristic will work as expected.
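
A minimal sketch of the averaging idea, assuming a plain arithmetic mean over a fixed window of 10 samples. The class name `TlabUsageHistory` and its methods are hypothetical. The class added by the patch may be structured and weighted differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not the class from the patch: remembers the last
// kWindow samples of TLAB usage and reports their mean as the capacity.
class TlabUsageHistory {
public:
  static constexpr std::size_t kWindow = 10;

  // Record one sample, overwriting the oldest one once the window is full.
  void record(std::size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % kWindow;
    if (_count < kWindow) {
      _count++;
    }
  }

  // Mean of the recorded samples, or 0 if nothing has been recorded yet.
  std::size_t capacity() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }

private:
  std::array<std::size_t, kWindow> _samples{};
  std::size_t _next = 0;   // slot that the next sample is written to
  std::size_t _count = 0;  // number of valid samples, at most kWindow
};
```

Because the window is bounded, a single unusually large or small cycle only influences the reported capacity for the next 10 samples.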

Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause.
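
The snapshot idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: the class and method names are made up, and relaxed atomics are used only because the snapshot is assumed to be taken inside a pause, which provides the necessary ordering.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: one running counter and one frozen snapshot.
class TlabUsageSnapshot {
public:
  // Called when memory is handed out for TLABs.
  void add(std::size_t bytes) {
    _current.fetch_add(bytes, std::memory_order_relaxed);
  }

  // Assumed to be called once in the young generation mark start pause:
  // freeze the value for this cycle and start counting from zero again.
  void snapshot_and_reset() {
    const std::size_t value = _current.exchange(0, std::memory_order_relaxed);
    _snapshot.store(value, std::memory_order_relaxed);
  }

  // Read by TLABs retired after the pause. The value stays the same until
  // the next snapshot, no matter how much is allocated in between.
  std::size_t used() const {
    return _snapshot.load(std::memory_order_relaxed);
  }

private:
  std::atomic<std::size_t> _current{0};
  std::atomic<std::size_t> _snapshot{0};
};
```

Every thread that retires its TLAB after the pause then sees the same used value, regardless of when the retirement happens during the concurrent phase.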

How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-CPU, and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and with many CPUs this actually affected the pause time. The new code tracks the Eden usage in the page allocator instead.

This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes. This will mostly help logging, since the actual sizing is still enforced correctly.
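
As a worked example of the unit difference, assuming 8-byte heap words (a 64-bit VM), the 256k maximum corresponds to 32768 words. The helper below is hypothetical and only illustrates the conversion.

```cpp
#include <cstddef>

// Hypothetical helper: convert a size in bytes to a size in words.
constexpr std::size_t bytes_to_words(std::size_t bytes, std::size_t word_size) {
  return bytes / word_size;
}

constexpr std::size_t max_tlab_bytes = 256 * 1024;  // 256k, from the summary
constexpr std::size_t assumed_word_size = 8;        // assumption: 64-bit heap words
constexpr std::size_t max_tlab_words =
    bytes_to_words(max_tlab_bytes, assumed_word_size);
```

Under that assumption, reporting the byte value where words are expected overstates the limit by a factor of 8, which shows up as a confusing value in logs.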

**Testing**
* Functional testing tier1-tier7
* Performance testing in Aurora is neutral
* Manual testing looking at TLAB waste shows a clear reduction; in some scenarios the waste could previously be above 2% and now it is below 1%
* Manual verification that the worst-case pauses are shorter due to the reduced work in the mark start pause

-------------

Commit messages:
 - Change memory order to relaxed
 - More TLABUsage fixes
 - Junba-space
 - Fixes for eden tracking
 - Fixes for TLABUsage
 - Track eden-usage in page allocator
 - Add class to keep track of TLAB usage
 - Max tlab size should be reported in words

Changes: https://git.openjdk.org/jdk/pull/24814/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24814&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353184
  Stats: 195 lines in 12 files changed: 145 ins; 41 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/24814.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814

PR: https://git.openjdk.org/jdk/pull/24814