RFR: 8341427: JFR: Adjust object sampler span handling

Sun Nov 24 23:57:21 UTC 2024

On Fri, 22 Nov 2024 15:53:14 GMT, Stig Døssing <duke at openjdk.org> wrote:

>>> My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size were larger, and the span belongs to the next sample in time 
>> 
>> Thanks for the explanation. OK, I see what you mean. I think splitting it evenly between both makes sense as well.
>
> @roberttoyonaga 
> 
> Background thread describing how I understand this algorithm is intended to work: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006255.html
> 
> The goal is to get samples evenly spread over the entire allocation timeline. My understanding is that we want samples to account for the span "to their left" on the allocation timeline. A fresh sample will cover the span between itself and the previous sample. 
> 
> By giving the span of a removed sample to the younger neighbor, we get the spans adjusted as if we never had the removed sample. 
> 
> That's not the case if we give the span to the older neighbor or split the span between the two neighbors. 
> 
> A small example:
> 
> Sample 1 (span 0...10)
> Sample 2 (span 10...20)
> Sample 3 (span 20...30)
> 
> If we add another sample at byte 40 on the timeline and drop sample 2, I think we'd like to get this:
> 
> Sample 1 (0...10)
> Sample 3 (10...30)
> Sample 4 (30...40)
> 
> The spans of the samples are accurately representing which span of "time" on the allocation timeline the sample represents. In this case we'd be very likely to want to keep sample 3 because it covers a large span.
> 
> If we split the span instead, we'd get 
> 
> Sample 1 (0...15)
> Sample 3 (15...30)
> Sample 4 (30...40)
> 
> Sample 1 now claims to represent 0...15 on the timeline, even though the sample was actually created before the end of that interval. I think the effect this could have is to allow older samples an advantage in being kept, which might skew which samples we keep toward older samples, which causes the distribution of samples over the timeline to be uneven.
> 
> Edit:
> What I'm getting at is that if the goal is to keep evenly distributed samples over the allocation timeline, then sample 3 should be preferred over sample 1 when future samples arrive, and if we split the span, it won't be. 
> 
> When more samples arrive, I think it is better to keep a sample taken at byte 30 (sample 3) than to keep a sample taken at byte 10 (sample 1), if we're going for an even distribution of the samples.
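
For readers following along, the two policies compared in the example above can be sketched roughly as follows. This is only an illustration in Java, not the HotSpot object sampler code: the `SpanSketch` class, the `Sample` record, and both method names are hypothetical, and spans are modeled as simple (start, end) byte positions on the allocation timeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not the JFR implementation.
class SpanSketch {
    /** A sample taken at byte position `end`, accounting for the span start...end. */
    record Sample(int id, long start, long end) {
        long span() { return end - start; }
    }

    /** Drop the sample at index i and give its whole span to the younger (next) neighbor. */
    static List<Sample> removeGiveToYounger(List<Sample> samples, int i) {
        List<Sample> out = new ArrayList<>(samples);
        Sample removed = out.remove(i);
        if (i < out.size()) {
            Sample younger = out.get(i);
            // The younger sample now covers everything back to the removed sample's start.
            out.set(i, new Sample(younger.id(), removed.start(), younger.end()));
        }
        return out;
    }

    /** Drop the sample at index i and split its span between the older and younger neighbors. */
    static List<Sample> removeSplit(List<Sample> samples, int i) {
        List<Sample> out = new ArrayList<>(samples);
        Sample removed = out.remove(i);
        boolean hasOlder = i > 0;
        boolean hasYounger = i < out.size();
        // New boundary between the neighbors: the midpoint when both exist,
        // otherwise the whole span goes to the only neighbor.
        long boundary = !hasOlder ? removed.start()
                      : !hasYounger ? removed.end()
                      : removed.start() + removed.span() / 2;
        if (hasOlder) {
            Sample older = out.get(i - 1);
            out.set(i - 1, new Sample(older.id(), older.start(), boundary));
        }
        if (hasYounger) {
            Sample younger = out.get(i);
            out.set(i, new Sample(younger.id(), boundary, younger.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Samples 1 to 3 from the example, plus the new sample at byte 40.
        List<Sample> samples = List.of(
            new Sample(1, 0, 10), new Sample(2, 10, 20),
            new Sample(3, 20, 30), new Sample(4, 30, 40));
        // Sample 2 sits at index 1.
        System.out.println(removeGiveToYounger(samples, 1)); // sample 3 covers 10...30
        System.out.println(removeSplit(samples, 1));         // sample 1 covers 0...15, sample 3 covers 15...30
    }
}
```

With the first policy, sample 3 ends up with the largest span (20 bytes) and so would be the least likely to be dropped by a span-based eviction rule; with the split policy, samples 1 and 3 tie at 15 bytes each.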

@srdo Thanks for your contribution!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2496406805