RFR: 8353184: ZGC: Simplify and correct tlab_used() tracking [v4]
Stefan Karlsson
stefank at openjdk.org
Thu May 8 16:06:04 UTC 2025
On Thu, 8 May 2025 10:06:41 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
>> Please review this change to improve TLAB handling in ZGC.
>>
>> **Summary**
>> In ZGC the maximum TLAB size is 256k and in many cases we want the TLABs to be this big. But for threads only allocating a fraction of this, using TLABs of this size will cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but there have been a few things in ZGC which have prevented this mechanism from working as expected.
>>
>> The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (`tlab_used()`) and the capacity available for TLABs (`tlab_capacity()`). Capacity is more or less the size of Eden for the other GCs, but since ZGC does not have any generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use what is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:
>>
>> ```
>> bool update_allocation_history = used > 0.5 * capacity;
>> ```
>>
>> So we need to come up with a better value to return as capacity. We could use the amount of free memory, but this is also an overestimate of what will actually be used. The proposed approach is to use an average over the last 10 values of what was actually used for TLABs as the capacity. This will provide a good estimate of what the expected TLAB capacity is and the sizing heuristic will work as expected.
>>
>> Another problem in this area is that since ZGC does TLAB retiring concurrently, the used value returned has previously been reset before being used in the sizing heuristic. So to be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use those values for any TLAB retired after this pause.
>>
>> How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per-cpu and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages, but only used for Eden. These fields were cleared in the mark start pause, and when having many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page-allocator instead.
>>
>> This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes. This will mostly help logging, since the actual sizing is still enforced correctly.
>>
>> **Testing**
>> * Functional testin...
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
> Handle inc and dec in alloc/undo
I like this change. I've added a few comments below.
src/hotspot/share/gc/z/zTLABUsage.cpp line 32:
> 30: _used_history() {}
> 31:
> 32:
Suggestion:
src/hotspot/share/gc/z/zTLABUsage.cpp line 39:
> 37: void ZTLABUsage::decrease_used(size_t size) {
> 38: precond(size <= _used);
> 39: Atomic::sub(&_used, size, memory_order_relaxed);
Suggestion:
precond(size <= _used);
Atomic::sub(&_used, size, memory_order_relaxed);
src/hotspot/share/gc/z/zTLABUsage.cpp line 43:
> 41:
> 42: void ZTLABUsage::reset() {
> 43: const size_t current_used = Atomic::xchg(&_used, (size_t) 0);
Does this work instead?
Suggestion:
const size_t current_used = Atomic::xchg(&_used, 0u);
src/hotspot/share/gc/z/zTLABUsage.cpp line 51:
> 49:
> 50: // Save the old values for logging
> 51: const size_t old_used = used();
It's not immediately obvious what `_used` is compared to `used()`. Could one of these be renamed so that readers don't mistakenly assume that `used()` returns `_used`?
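To make the naming concern concrete, here is a minimal standalone sketch of the scheme as the PR description explains it: a live counter updated concurrently, a snapshot taken at reset, and a history of the last 10 snapshots averaged to produce a capacity. It uses `std::atomic` rather than HotSpot's `Atomic`, and the class name, the member names (`live_used()`, `snapshot_used()`, `average_capacity()`), the plain arithmetic mean, and the single-threaded `reset()` are all assumptions for illustration, not the code under review.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the tracker discussed above; NOT the HotSpot class.
class TlabUsageSketch {
  // "Last 10 values" comes from the PR description.
  static constexpr std::size_t HistoryLength = 10;

  std::atomic<std::size_t> _live_used{0};    // updated concurrently on alloc/undo
  std::size_t _snapshot_used = 0;            // value captured by the last reset()
  std::size_t _history[HistoryLength] = {};  // ring buffer of past snapshots
  std::size_t _history_count = 0;            // number of valid entries
  std::size_t _history_next = 0;             // next slot to overwrite

public:
  void increase_used(std::size_t size) {
    _live_used.fetch_add(size, std::memory_order_relaxed);
  }

  void decrease_used(std::size_t size) {
    // Mirrors the precondition quoted in the review.
    assert(size <= _live_used.load(std::memory_order_relaxed));
    _live_used.fetch_sub(size, std::memory_order_relaxed);
  }

  // Assumed to run on a single thread (for example inside a pause):
  // atomically take the live value, zero it, and record it in the history.
  void reset() {
    _snapshot_used = _live_used.exchange(0);
    _history[_history_next] = _snapshot_used;
    _history_next = (_history_next + 1) % HistoryLength;
    if (_history_count < HistoryLength) {
      _history_count++;
    }
  }

  // The live counter; changes between resets.
  std::size_t live_used() const {
    return _live_used.load(std::memory_order_relaxed);
  }

  // The stable value a sizing heuristic would read after the pause.
  std::size_t snapshot_used() const { return _snapshot_used; }

  // Plain mean of the recorded snapshots (an assumed averaging method).
  std::size_t average_capacity() const {
    if (_history_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _history_count; i++) {
      sum += _history[i];
    }
    return sum / _history_count;
  }
};
```

With distinct names for the live and snapshotted values, a reader cannot mistake one accessor for the other, which is the confusion the comment above is trying to avoid.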
-------------
PR Review: https://git.openjdk.org/jdk/pull/24814#pullrequestreview-2825630207
PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009139
PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080009572
PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080010741
PR Review Comment: https://git.openjdk.org/jdk/pull/24814#discussion_r2080017958