Priority redistribution in Old Object Sample event

Galder Zamarreno galder at redhat.com
Thu Jun 15 07:04:29 UTC 2023


Hi Erik,

Thanks for the quick response.

On Wed, Jun 14, 2023 at 10:50 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:

> The intent of the code is to remove the sample from both the time-based
> list and the priority queue as the object is no longer alive. The weight
> of the object, the allocation span, in the time-based list, should be
> divided to its neighbour(s). The rationale for this is to keep samples
> evenly distributed over time, or to be exact, the amount allocation has
> happened. It will be as if the object was never sampled (as if the TLAB
> size would be larger)
>

Hmmmm, I'm still not sure I understand the rationale. What is the problem
that you were trying to solve with this logic?


> The priority queue exists to find the sampled object with the lowest
> allocation span quickly when the list fills up. It needs to be replaced
> with a sampled object with higher weight/span.
>

I have no doubts about the priority queue. The code and logic makes sense,
though one could argue if the complexity of the priority queue really pays
off. If you have leaks, not sure it really matters if you have few big
leaks or you have many small ones. E.g. the async profiler's object sampler
does not keep the queue ordered:

```
        if (_lock.tryLock()) {
            u32 start = (((uintptr_t)object >> 4) * 31 + ((uintptr_t)jni >>
4) + trace) & (MAX_REFS - 1);
            u32 i = start;
            do {
                jweak w = _refs[i];
                if (w == NULL || collected(w)) {
                    if (w != NULL) jni->DeleteWeakGlobalRef(w);
                    _refs[i] = wobject;
                    _values[i].size = size;
                    _values[i].trace = trace;
                    _values[i].time = TSC::ticks();
                    _lock.unlock();
                    return;
                }
            } while ((i = (i + 1) & (MAX_REFS - 1)) != start);

            _full = true;
            _lock.unlock();
        }
```

This would be better discussed separately. I can start a separate thread if
people want to revisit this.


> This may, or may not, be what the code does. It was written many years
> ago, and I need to re-read the code to check if that is really the case,
> but before digging deeper, I want to check if we are on the same page when
> it comes to the intent
>
> Erik
>
>
> ------------------------------
> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of
> Galder Zamarreno <galder at redhat.com>
> *Sent:* Wednesday, June 14, 2023 6:30 AM
> *To:* hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
> *Subject:* Priority redistribution in Old Object Sample event
>
> Hi,
>
> I've been porting over JFR Old Object Sample event to GraalVM and I had
> the following doubt about the code in HotSpot:
>
> I'm trying to understand the code in [1]:
>
> ```
>   // push span onto previous
>   if (previous != NULL) {
>     _priority_queue->remove(previous);
>     previous->add_span(sample->span());
>     _priority_queue->push(previous);
>   }
> ```
>
> This code is related to old object sampling. When the queue is full,
> samples that have been GC'd are removed from the queue. To do so it takes
> its previous (from the list view) and adds the span of the removed dead
> object onto it. Basically, I'm wondering why this is done this way.
>
> My initial gut feeling when I first saw the code is that it does it so
> that the priority queue does not get too imbalanced (it uses a heap
> underneath) when removing objects, but that only makes sense to me if the
> previous is not the previous object sampled based on time, but the previous
> object sampled based on the order of the priority queue, so that's not what
> seems to happen.
>
> I'm wondering now if the fact that the list view is ordered by insertion
> time is relevant here. Is it maybe trying to give more importance (by
> increasing its span) to surviving objects that are near in time to those
> that have been garbage collected?
>
> Something that's probably related to this is how the span is initially
> calculated. This is done in [2], which you can find below:
>
> ```
> _total_allocated += allocated;
> const size_t span = _total_allocated - _priority_queue->total();
> ```
>
> Again when I first looked at it, I thought: well that's just equivalent to:
>
> ```
> const size_t span = allocated;
> ```
>
> But clearly things are more nuanced than that.
>
> The only documentation I've found so far is in Marcus' blog post [3]:
>
> > If sampled objects are garbage collected, they are removed and the
> priority redistributed to the neighbours. The priority is called span, and
> is currently the size of the allocation, giving more weight to larger
> (therefore more severe) leaks.
>
> Unfortunately what is a neighbour is not explained, nor why the priority
> gets redistributed.
>
> Thanks
> Galder
>
> [1]
> https://github.com/openjdk/jdk17u-dev/blob/master/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L264-L269
> [2]
> https://github.com/openjdk/jdk17u-dev/blob/ef86ea2842b1a204834291d9d6665bfcd7b75fbc/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L212-L213
> [3] https://hirt.se/blog/?p=1055
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-jfr-dev/attachments/20230615/96e85126/attachment-0001.htm>


More information about the hotspot-jfr-dev mailing list