RFR: 8341427: JFR: Adjust object sampler span handling

Fri Nov 22 15:56:16 UTC 2024

On Fri, 22 Nov 2024 15:31:33 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>>> Just a question: when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't it be better to emphasize the older neighbor, since it has survived longer (and so is more likely to be a leak)?
>> 
>> My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size were larger, and the span belongs to the next sample in time. I have a vague memory that the span at some point was split into younger and older samples, but I didn't go with that solution.
>> 
>> Let me think about it.
>
>> My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size were larger, and the span belongs to the next sample in time 
> 
> Thanks for the explanation. OK, I see what you mean. I think splitting it evenly between both makes sense as well.

@roberttoyonaga 

Background thread describing how I understand this algorithm is intended to work: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006255.html

The goal is to get samples evenly spread over the entire allocation timeline. My understanding is that we want samples to account for the span "to their left" on the allocation timeline. A fresh sample will cover the span between itself and the previous sample. 
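To make the "span to the left" idea concrete, here is a toy model in Python. It's only a sketch under my own assumptions, not the HotSpot code: `Sample`, `pos` (a hypothetical byte offset on the allocation timeline where the sample was taken) and `add_sample` are made-up names, and samples are kept in a list ordered oldest to youngest.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    pos: int   # hypothetical byte offset on the allocation timeline
    span: int  # bytes of the timeline this sample accounts for, to its left


def add_sample(samples: list[Sample], name: str, pos: int) -> Sample:
    """Append a fresh sample covering the span back to the previous sample."""
    prev_pos = samples[-1].pos if samples else 0
    sample = Sample(name, pos, pos - prev_pos)
    samples.append(sample)
    return sample


samples: list[Sample] = []
for i, pos in enumerate([10, 20, 30], start=1):
    add_sample(samples, f"Sample {i}", pos)

for s in samples:
    print(f"{s.name} (span {s.pos - s.span}...{s.pos})")
# Sample 1 (span 0...10)
# Sample 2 (span 10...20)
# Sample 3 (span 20...30)
```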

By giving the span of a removed sample to the younger neighbor, the spans end up exactly as if the removed sample had never been taken.
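The "as if we never had the removed sample" property can be checked mechanically in the same kind of toy model (hypothetical names again, list ordered oldest to youngest). The sketch removes a few samples one at a time, handing each removed span to the younger neighbor, and compares the result with spans rebuilt from scratch over the surviving samples only. If the youngest sample is removed, the sketch assumes nothing needs adjusting, since the next fresh sample measures its span back to the new youngest sample.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pos: int   # hypothetical byte offset where the sample was taken
    span: int  # bytes accounted for, to the left of pos


def build(positions: list[int]) -> list[Sample]:
    """Create samples whose spans reach back to the previous sample (or 0)."""
    samples, prev = [], 0
    for pos in positions:
        samples.append(Sample(pos, pos - prev))
        prev = pos
    return samples


def remove_to_younger(samples: list[Sample], index: int) -> None:
    """Drop samples[index] and give its span to the next-younger sample."""
    removed = samples.pop(index)
    if index < len(samples):  # a younger neighbor exists
        samples[index].span += removed.span


samples = build([10, 20, 30, 45, 70, 100])
for index in (3, 1, 0):  # arbitrary removal order
    remove_to_younger(samples, index)

# Incremental adjustment matches a rebuild over the survivors only.
assert samples == build([s.pos for s in samples])
print([(s.pos - s.span, s.pos) for s in samples])
# [(0, 30), (30, 70), (70, 100)]
```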

That's not the case if we give the span to the older neighbor or split the span between the two neighbors. 

A small example:

Sample 1 (span 0...10)
Sample 2 (span 10...20)
Sample 3 (span 20...30)

If we add another sample at byte 40 on the timeline and drop sample 2, I think we'd like to get this:

Sample 1 (0...10)
Sample 3 (10...30)
Sample 4 (30...40)

The spans now accurately reflect which stretch of "time" on the allocation timeline each sample represents. In this case we'd very likely want to keep sample 3, because it covers a large span.
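Replaying the example above in a toy model gives the same spans. This is a sketch with hypothetical `add`/`drop` helpers and positions taken from the example, not the actual sampler code:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    pos: int   # hypothetical byte offset where the sample was taken
    span: int  # bytes accounted for, to the left of pos


def add(samples: list[Sample], name: str, pos: int) -> None:
    """Append a fresh sample whose span reaches back to the youngest sample."""
    prev = samples[-1].pos if samples else 0
    samples.append(Sample(name, pos, pos - prev))


def drop(samples: list[Sample], name: str) -> None:
    """Remove the named sample and give its span to its younger neighbor."""
    index = next(i for i, s in enumerate(samples) if s.name == name)
    removed = samples.pop(index)
    if index < len(samples):
        samples[index].span += removed.span


samples: list[Sample] = []
for i, pos in enumerate([10, 20, 30], start=1):
    add(samples, f"Sample {i}", pos)

add(samples, "Sample 4", 40)
drop(samples, "Sample 2")

for s in samples:
    print(f"{s.name} ({s.pos - s.span}...{s.pos})")
# Sample 1 (0...10)
# Sample 3 (10...30)
# Sample 4 (30...40)
```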

If we split the span instead, we'd get:

Sample 1 (0...15)
Sample 3 (15...30)
Sample 4 (30...40)

Sample 1 now claims to represent 0...15 on the timeline, even though the sample was actually taken at byte 10, before the end of that interval. I think this could give older samples an advantage in being kept, skewing retention toward older samples and making the distribution of samples over the timeline uneven.
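For comparison, a sketch of the split policy in the same toy model. Here the claimed intervals are reconstructed by laying the spans end to end from byte 0; `drop_split` and `claimed_intervals` are hypothetical helpers, and the sketch assumes the removed sample has neighbors on both sides. Sample 1 ends up claiming bytes allocated after it was taken:

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Sample:
    name: str
    pos: int   # hypothetical byte offset where the sample was taken
    span: int  # bytes accounted for


def claimed_intervals(samples: list[Sample]) -> list[tuple[str, int, int]]:
    """Lay the spans end to end from byte 0, oldest sample first."""
    ends = list(accumulate(s.span for s in samples))
    return [(s.name, end - s.span, end) for s, end in zip(samples, ends)]


def drop_split(samples: list[Sample], name: str) -> None:
    """Remove the named sample and split its span between both neighbors.

    Assumes the removed sample has an older and a younger neighbor.
    """
    index = next(i for i, s in enumerate(samples) if s.name == name)
    removed = samples.pop(index)
    half = removed.span // 2
    samples[index - 1].span += half             # older neighbor
    samples[index].span += removed.span - half  # younger neighbor


samples = [Sample("Sample 1", 10, 10), Sample("Sample 2", 20, 10),
           Sample("Sample 3", 30, 10), Sample("Sample 4", 40, 10)]
drop_split(samples, "Sample 2")

for (name, start, end), s in zip(claimed_intervals(samples), samples):
    flag = "  <- claims bytes after it was taken" if end > s.pos else ""
    print(f"{name} ({start}...{end}){flag}")
# Sample 1 (0...15)  <- claims bytes after it was taken
# Sample 3 (15...30)
# Sample 4 (30...40)
```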

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2494079382