RFR: 8356372: JVMTI heap sampling not working properly with outside TLAB allocations [v3]

Serguei Spitsyn sspitsyn at openjdk.org
Fri May 16 18:45:55 UTC 2025


On Wed, 14 May 2025 12:47:08 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> While working on improving the TLAB sizing code for ZGC @kstefanj ran into an issue with the following tests failing:
>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorInterpreterObjectTest.java
>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
>> 
>> The reason for seeing the problems now is that with the new sizing code ZGC used smaller TLABs at first, before resizing them to a proper size (to lower the waste). In the HeapMonitor tests we don't allocate enough to trigger GCs that will resize the TLABs so most of the tests will now run with small TLABs. This should not be a problem, but it turns out the current sampling code is not working properly when you get a lot of outside TLAB allocations. You get those when trying to allocate a fairly large object (~1400B) that won't fit into the TLAB, but there are still quite a bit of room in the TLAB so we don't want to throw it away and take a new one.
>> 
>> The problem in the current code is that we keep track of when to sample with multiple variables and when getting out of TLAB allocations these get out of sync.
>> 
>> The proposed patch is the result of a restructuring and fixing of the the code that me and @kstefanj did to fix this issue.
>> 
>> The idea is to better account how much we have allocated in three different buckets:
>> * Outside of TLAB allocations
>> * Accounted TLAB allocations
>> * The last bit of TLAB allocations that hasn't been accounted yet
>> 
>> And then use the sum of that to compare against a *non-changing* threshold to see if it is time to take a sample.
>> 
>> There are a few things to think about when reading this code:
>> * The thread can allocate and retire multiple TLABs before we cross the sample threshold.
>> * The sampling can take multiple samples in a single TLAB
>> * Any overshooting of the sample threshold triggers only one sample and the extra bytes are ignored when checking for the next sample.
>> 
>> There are some restructuring in the patch to confine the ThreadHeapSampler variables and code. For example:
>> 
>> 1) Moved accounting variables out of TLAB and into the ThreadHeapSampler
>> 2) Moved thread allocation accounting and sampling code out of the TLAB
>> 3) Moved retiring of TLABs out of make_parseable (needed to support (2))
>> 
>> Some of that could be extracted into a separate PR if that's deemed worthwhile.
>> 
>> Tested with the HeapMonitor tests various TLAB sizes.
>> 
>> If there are anyone using these APIs it would be nice to get feedba...
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply suggestions from code review
>   
>   Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com>

src/hotspot/share/runtime/threadHeapSampler.cpp line 399:

> 397:   assert(result > 0 && result < static_cast<double>(SIZE_MAX), "Result is not in an acceptable range.");
> 398:   size_t interval = static_cast<size_t>(result);
> 399:   _sample_threshold = interval;

Nit: The line 399 is not needed as the value is reset at 400.

src/hotspot/share/runtime/threadHeapSampler.hpp line 89:

> 87: 
> 88:     // Call this after _rnd is initialized to initialize _bytes_until_sample.
> 89:     pick_next_sample();

The old identifier is used: _bytes_until_sample.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25114#discussion_r2093526227
PR Review Comment: https://git.openjdk.org/jdk/pull/25114#discussion_r2093523489


More information about the hotspot-dev mailing list