RFR: 8341427: JFR: Adjust object sampler span handling

Fri Nov 22 13:11:25 UTC 2024

On Tue, 21 May 2024 18:37:35 GMT, Stig Døssing <duke at openjdk.org> wrote:

> The span stored in each sample is not the calculated span; it is just the object's byte size (`allocated`). As a result, as soon as any object falls out of the queue, the spans in the queue no longer sum to cover the allocation timeline. Every later candidate sample is then unduly prioritized for addition to the queue, because it is given an artificially high span. In effect, later samples are weighted as if they covered both the interval between themselves and their older neighbor and all the "missing spans" of nodes discarded since the program started.
> 
> Changed object samples to store the calculated span rather than the bytes allocated for the sampled object.
> 
> When a sample is removed from the queue because a sample with a larger span is being added, the span of the removed node is not handed to the younger neighbor; that handover only happens when a sample is removed due to GC. The span therefore goes to the next sample added to the queue. When the removed sample is the youngest one, this is fine, but when it has a younger neighbor, the span should probably go to that neighbor rather than to the newcomer. Handing it to the newcomer gives the new sample a high weight it doesn't deserve: it ends up covering not just the span to its older neighbor but also the span of the removed node, which is not what we want.
> 
> When replacing a sample in the queue, give the span of the removed sample to the younger neighbor. If there is no such neighbor, because the youngest sample is being replaced, give the span to the node being added instead, as that will become the new youngest sample.
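
To make the replacement rule above concrete, here is a minimal toy model in Python. It is not the HotSpot code: the names `Sample`, `SamplerModel`, `covered` and `add` are hypothetical, and the model assumes that a candidate's span is the total bytes allocated so far minus the sum of the spans already in the queue, which is my reading of "the spans sum to cover the allocation timeline".

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    span: int  # bytes of the allocation timeline this sample stands for


class SamplerModel:
    """Toy model of a fixed-size sample queue (hypothetical, not HotSpot)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.total_allocated = 0
        self.samples: list[Sample] = []  # ordered oldest -> youngest

    def covered(self) -> int:
        # Sum of the spans currently held in the queue.
        return sum(s.span for s in self.samples)

    def add(self, name: str, allocated: int) -> bool:
        self.total_allocated += allocated
        # Assumption: the candidate covers everything not yet covered.
        span = self.total_allocated - self.covered()
        if len(self.samples) == self.capacity:
            # Pick the sample with the smallest span (oldest wins ties).
            idx = min(range(len(self.samples)),
                      key=lambda i: self.samples[i].span)
            victim = self.samples[idx]
            if victim.span > span:
                return False  # candidate does not outweigh anything queued
            del self.samples[idx]
            if idx < len(self.samples):
                # A younger neighbor exists: it inherits the removed span.
                self.samples[idx].span += victim.span
            else:
                # The youngest sample was replaced: the newcomer inherits it.
                span += victim.span
        # Store the calculated span, not the object's byte size.
        self.samples.append(Sample(name, span))
        return True
```

In this model the spans in the queue sum to `total_allocated` after every accepted `add`, so a later candidate is only credited with the bytes allocated since the last accepted sample.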

cc @egahlin, as per this thread https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006264.html

Master currently does not compile, so I checked that these changes at least build when applied to the jdk-23+23 tag. Could you give me a hint about which tests to run? The full tier-1 set takes a long time and showed a few failures for me even with no change to the code.

Employer: Crowdstrike

Is there anything I can do to speed up the OCA check? The agreement for my employer seems to have been approved a while ago, as it is listed on https://oca.opensource.oracle.com/?ojr=contrib-list.

Yes, done

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2123227089
PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2383436078
PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2414676572
PR Comment: https://git.openjdk.org/jdk/pull/19334#issuecomment-2493609410