Priority redistribution in Old Object Sample event
Galder Zamarreno
galder at redhat.com
Wed Jan 24 09:45:58 UTC 2024
Erik,
We never got an answer to Thomas' concerns that total_allocate only ever
increases. Is that right? See below:
On Wed, Jun 14, 2023 at 9:56 AM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:
> ...
>
> - I am looking at the code that calculates the span in the first place:
>
> https://github.com/tstuefe/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L229-L230
>
> The span of a new sample is calculated as "_total_allocated (as of now) -
> sum-of-all-spans-of-samples-in-prio-queue". _total_allocated increases
> monotonously, whereas the prio queue only contains *retained* samples. This
> means that the new sample's span is the "what had been allocated since the
> last sample was taken" + "all spans of samples that had been discarded". Is
> that intentional? Would this not mean that with prolonged sampling, the
> spans get ever larger?
>
> I may be reading this code wrong.
>
> Thank you!
>
> ..Thomas
>
>
>
>
>
> On Wed, Jun 14, 2023 at 9:37 AM Erik Gahlin <erik.gahlin at oracle.com>
> wrote:
>
>> The intent of the code is to remove the sample from both the time-based
>> list and the priority queue as the object is no longer alive. The weight
>> of the object, the allocation span, in the time-based list, should be
>> divided to its neighbour(s). The rationale for this is to keep samples
>> evenly distributed over time, or to be exact, the amount allocation has
>> happened. It will be as if the object was never sampled (as if the TLAB
>> size would be larger)
>>
>> The priority queue exists to find the sampled object with the lowest
>> allocation span quickly when the list fills up. It needs to be replaced
>> with a sampled object with higher weight/span.
>>
>> This may, or may not, be what the code does. It was written many years
>> ago, and I need to re-read the code to check if that is really the case,
>> but before digging deeper, I want to check if we are on the same page when
>> it comes to the intent
>>
>> Erik
>>
>>
>> ------------------------------
>> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org> on behalf of
>> Galder Zamarreno <galder at redhat.com>
>> *Sent:* Wednesday, June 14, 2023 6:30 AM
>> *To:* hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
>> *Subject:* Priority redistribution in Old Object Sample event
>>
>> Hi,
>>
>> I've been porting over JFR Old Object Sample event to GraalVM and I had
>> the following doubt about the code in HotSpot:
>>
>> I'm trying to understand the code in [1]:
>>
>> ```
>> // push span onto previous
>> if (previous != NULL) {
>> _priority_queue->remove(previous);
>> previous->add_span(sample->span());
>> _priority_queue->push(previous);
>> }
>> ```
>>
>> This code is related to old object sampling. When the queue is full,
>> samples that have been GC'd are removed from the queue. To do so it takes
>> its previous (from the list view) and adds the span of the removed dead
>> object onto it. Basically, I'm wondering why this is done this way.
>>
>> My initial gut feeling when I first saw the code is that it does it so
>> that the priority queue does not get too imbalanced (it uses a heap
>> underneath) when removing objects, but that only makes sense to me if the
>> previous is not the previous object sampled based on time, but the previous
>> object sampled based on the order of the priority queue, so that's not what
>> seems to happen.
>>
>> I'm wondering now if the fact that the list view is ordered by insertion
>> time is relevant here. Is it maybe trying to give more importance (by
>> increasing its span) to surviving objects that are near in time to those
>> that have been garbage collected?
>>
>> Something that's probably related to this is how the span is initially
>> calculated. This is done in [2], which you can find below:
>>
>> ```
>> _total_allocated += allocated;
>> const size_t span = _total_allocated - _priority_queue->total();
>> ```
>>
>> Again when I first looked at it, I thought: well that's just equivalent
>> to:
>>
>> ```
>> const size_t span = allocated;
>> ```
>>
>> But clearly things are more nuanced than that.
>>
>> The only documentation I've found so far is in Marcus' blog post [3]:
>>
>> > If sampled objects are garbage collected, they are removed and the
>> priority redistributed to the neighbours. The priority is called span, and
>> is currently the size of the allocation, giving more weight to larger
>> (therefore more severe) leaks.
>>
>> Unfortunately what is a neighbour is not explained, nor why the priority
>> gets redistributed.
>>
>> Thanks
>> Galder
>>
>> [1]
>> https://github.com/openjdk/jdk17u-dev/blob/master/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L264-L269
>> [2]
>> https://github.com/openjdk/jdk17u-dev/blob/ef86ea2842b1a204834291d9d6665bfcd7b75fbc/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L212-L213
>> [3] https://hirt.se/blog/?p=1055
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-jfr-dev/attachments/20240124/2e78bac2/attachment-0001.htm>
More information about the hotspot-jfr-dev
mailing list