[External] : Re: Priority redistribution in Old Object Sample event

Erik Gahlin erik.gahlin at oracle.com
Wed Jan 24 14:01:35 UTC 2024


When an entry in the list is removed due to GC it should distribute its span to neighbors (and the priority updated).

That way, the total in the queue should be total allocated since the beginning.

(I'm not reading the code, but describing the intent)

Erik

________________________________
From: Galder Zamarreno <galder at redhat.com>
Sent: Wednesday, January 24, 2024 10:45 AM
To: Thomas Stüfe <thomas.stuefe at gmail.com>
Cc: Erik Gahlin <erik.gahlin at oracle.com>; hotspot-jfr-dev at openjdk.org <hotspot-jfr-dev at openjdk.org>
Subject: [External] : Re: Priority redistribution in Old Object Sample event

Erik,

We never got an answer to Thomas' concerns that total_allocate only ever increases. Is that right? See below:

On Wed, Jun 14, 2023 at 9:56 AM Thomas Stüfe <thomas.stuefe at gmail.com<mailto:thomas.stuefe at gmail.com>> wrote:
...

- I am looking at the code that calculates the span in the first place:
https://github.com/tstuefe/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L229-L230<https://urldefense.com/v3/__https://github.com/tstuefe/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp*L229-L230__;Iw!!ACWV5N9M2RV99hQ!JgmmQAyJeNPRWOg8r8IUbg5HQlS3djiF8gehNTDM8FhGq4faiWXGnrmTJBOvt3ApJeSj8dDPVt7y-DwY$>

The span of a new sample is calculated as "_total_allocated (as of now) - sum-of-all-spans-of-samples-in-prio-queue". _total_allocated increases monotonously, whereas the prio queue only contains *retained* samples. This means that the new sample's span is the "what had been allocated since the last sample was taken" + "all spans of samples that had been discarded". Is that intentional? Would this not mean that with prolonged sampling, the spans get ever larger?

I may be reading this code wrong.

Thank you!

..Thomas





On Wed, Jun 14, 2023 at 9:37 AM Erik Gahlin <erik.gahlin at oracle.com<mailto:erik.gahlin at oracle.com>> wrote:
The intent of the code is to remove the sample from both the time-based list and the priority queue as the object is no longer alive. The weight of the object, the allocation span, in the time-based list, should be divided to its neighbour(s). The rationale for this is to keep samples evenly distributed over time, or to be exact, the amount allocation has happened. It will be as if the object was never sampled (as if the TLAB size would be larger)

The priority queue exists to find the sampled object with the lowest allocation span quickly when the list fills up. It needs to be replaced with a sampled object with higher weight/span.

This may, or may not, be what the code does. It was written many years ago, and I need to re-read the code to check if that is really the case, but before digging deeper, I want to check if we are on the same page when it comes to the intent

Erik


________________________________
From: hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.org<mailto:hotspot-jfr-dev-retn at openjdk.org>> on behalf of Galder Zamarreno <galder at redhat.com<mailto:galder at redhat.com>>
Sent: Wednesday, June 14, 2023 6:30 AM
To: hotspot-jfr-dev at openjdk.org<mailto:hotspot-jfr-dev at openjdk.org> <hotspot-jfr-dev at openjdk.org<mailto:hotspot-jfr-dev at openjdk.org>>
Subject: Priority redistribution in Old Object Sample event

Hi,

I've been porting over JFR Old Object Sample event to GraalVM and I had the following doubt about the code in HotSpot:

I'm trying to understand the code in [1]:

```
  // push span onto previous
  if (previous != NULL) {
    _priority_queue->remove(previous);
    previous->add_span(sample->span());
    _priority_queue->push(previous);
  }
```

This code is related to old object sampling. When the queue is full, samples that have been GC'd are removed from the queue. To do so it takes its previous (from the list view) and adds the span of the removed dead object onto it. Basically, I'm wondering why this is done this way.

My initial gut feeling when I first saw the code is that it does it so that the priority queue does not get too imbalanced (it uses a heap underneath) when removing objects, but that only makes sense to me if the previous is not the previous object sampled based on time, but the previous object sampled based on the order of the priority queue, so that's not what seems to happen.

I'm wondering now if the fact that the list view is ordered by insertion time is relevant here. Is it maybe trying to give more importance (by increasing its span) to surviving objects that are near in time to those that have been garbage collected?

Something that's probably related to this is how the span is initially calculated. This is done in [2], which you can find below:

```
_total_allocated += allocated;
const size_t span = _total_allocated - _priority_queue->total();
```

Again when I first looked at it, I thought: well that's just equivalent to:

```
const size_t span = allocated;
```

But clearly things are more nuanced than that.

The only documentation I've found so far is in Marcus' blog post [3]:

> If sampled objects are garbage collected, they are removed and the priority redistributed to the neighbours. The priority is called span, and is currently the size of the allocation, giving more weight to larger (therefore more severe) leaks.

Unfortunately what is a neighbour is not explained, nor why the priority gets redistributed.

Thanks
Galder

[1] https://github.com/openjdk/jdk17u-dev/blob/master/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L264-L269<https://urldefense.com/v3/__https://github.com/openjdk/jdk17u-dev/blob/master/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp*L264-L269__;Iw!!ACWV5N9M2RV99hQ!JgmmQAyJeNPRWOg8r8IUbg5HQlS3djiF8gehNTDM8FhGq4faiWXGnrmTJBOvt3ApJeSj8dDPVvH06sry$>
[2] https://github.com/openjdk/jdk17u-dev/blob/ef86ea2842b1a204834291d9d6665bfcd7b75fbc/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp#L212-L213<https://urldefense.com/v3/__https://github.com/openjdk/jdk17u-dev/blob/ef86ea2842b1a204834291d9d6665bfcd7b75fbc/src/hotspot/share/jfr/leakprofiler/sampling/objectSampler.cpp*L212-L213__;Iw!!ACWV5N9M2RV99hQ!JgmmQAyJeNPRWOg8r8IUbg5HQlS3djiF8gehNTDM8FhGq4faiWXGnrmTJBOvt3ApJeSj8dDPVmGdkFVr$>
[3] https://hirt.se/blog/?p=1055<https://urldefense.com/v3/__https://hirt.se/blog/?p=1055__;!!ACWV5N9M2RV99hQ!JgmmQAyJeNPRWOg8r8IUbg5HQlS3djiF8gehNTDM8FhGq4faiWXGnrmTJBOvt3ApJeSj8dDPVu1fE2fz$>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-jfr-dev/attachments/20240124/ad9251ab/attachment-0001.htm>


More information about the hotspot-jfr-dev mailing list