RFR: 8258431: Provide a JFR event with live set size estimate [v12]

Tue Mar 16 12:26:14 UTC 2021

On Tue, 16 Mar 2021 11:51:45 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> So one of the actual purposes seems to be some kind of leak detection: there is this JFR leak detector (I only know the feature name, not completely how it works and what its overhead is) for this purpose, wouldn't that work?

Yes. But enabling that comes with extra overhead, so it is more of a focused tool than something you could use for continuous monitoring/signal evaluation.

> Also, for this purpose, why would used() not be a good substitute for liveness? If e.g. used() average grows over time you can deduce the same I would assume (particularly used() after mixed gc phase in g1).

The major problem is that, e.g. for G1 with a large enough heap, the used value can keep growing for quite a long time, possibly generating a false signal of a potential memory leak.

If the live estimate is set to `used()` after the mixed GC phase in G1, I think it will still be a good estimate.
The only thing I am opposed to is having the `live()` call return the current `used()` value, which, IMO, might become rather confusing.

> Do you have any numbers on what the impact of using used() vs. this live() would be in such a use case?

Nope. Do you mean perf impact?

> What I'm afraid of is that mixing values taken at different times - used and capacity are taken at the time of the event, and the liveness estimate updated at other, irregular intervals - may cause a significant amount of confusion in interpreting this value. It might be obvious to you, but there will be other users.

IDK. If the event field description explicitly states that this is the **last known live size estimate**, that should set the expectations right.

>
> One option could be detaching the liveness estimate from used()/capacity() (I see a value in having some heap usage summary at regular intervals) and send the liveness estimate event just when they are generated? Then the various collectors could send this liveness value at times when they think they are fairly accurate, not when the collectors must and particularly not in conjunction with samples taken at completely different times.

The problem is the irregularity: when the live size is reported only when it is calculated, there might be long periods in the recording with no live size data at all. For this information to be useful, it should be reported at least at the beginning and end of a JFR chunk.
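
To make the idea concrete, here is a rough Java-level sketch using the public `jdk.jfr` API. It is only an analogy: the event in this PR is a native JVM event, and every name below (`LiveSetEstimateSketch`, `sketch.LiveSetEstimate`, the field names) is made up for illustration. The collector-side `update()` is called at irregular times, while the periodic hook re-emits the last known value, together with the time it was computed, at chunk boundaries.

```java
import java.util.concurrent.atomic.AtomicReference;
import jdk.jfr.DataAmount;
import jdk.jfr.Event;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Period;
import jdk.jfr.Timestamp;

// Hypothetical sketch, not the implementation in the PR.
final class LiveSetEstimateSketch {
    /** Immutable pair: the estimate and the time at which it was computed. */
    static final class Estimate {
        final long bytes;
        final long computedAtMillis;

        Estimate(long bytes, long computedAtMillis) {
            this.bytes = bytes;
            this.computedAtMillis = computedAtMillis;
        }
    }

    // Last known estimate; null until the first update.
    private static final AtomicReference<Estimate> LAST = new AtomicReference<>();

    /** Called at irregular times, whenever a fairly accurate figure is available. */
    static void update(long bytes, long nowMillis) {
        LAST.set(new Estimate(bytes, nowMillis));
    }

    static Estimate lastKnown() {
        return LAST.get();
    }

    @Name("sketch.LiveSetEstimate")
    @Label("Live Set Estimate (sketch)")
    @Period("everyChunk") // emitted at the beginning and end of each chunk
    static class LiveSetEstimateEvent extends Event {
        @Label("Last Known Live Size Estimate")
        @DataAmount
        long liveSize;

        @Label("Estimate Computed At")
        @Timestamp(Timestamp.MILLISECONDS_SINCE_EPOCH)
        long computedAt;
    }

    /** Periodic hook: reports the last known value, or nothing if none exists yet. */
    static void emit() {
        Estimate e = LAST.get();
        if (e == null) {
            return;
        }
        LiveSetEstimateEvent event = new LiveSetEstimateEvent();
        event.liveSize = e.bytes;
        event.computedAt = e.computedAtMillis;
        event.commit();
    }

    /** Registers the hook with JFR; call once at startup. */
    static void register() {
        FlightRecorder.addPeriodicEvent(LiveSetEstimateEvent.class, LiveSetEstimateSketch::emit);
    }
}
```

Carrying the computation time in the event is one way to keep the "last known" semantics explicit for consumers who would otherwise assume the value was sampled together with used and capacity.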

> Independent of whether used/capacity and liveness are sent, the receiver needs to do statistics (trend lines) on those anyway.

Yes. It's just that with the live size estimate one wouldn't get the false positives one would get with the used heap trend.
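
A small illustration of the false-positive point, with entirely synthetic numbers and a hypothetical helper class: a least-squares trend line over used heap samples rises steeply while the heap is merely filling up before a mixed collection, whereas the trend over the last known live estimate stays close to flat.

```java
// Illustrative only: synthetic data, hypothetical class and method names.
final class HeapTrendSketch {
    /** Least-squares slope of y over sample index 0..n-1 (units of y per sample). */
    static double slope(double[] y) {
        int n = y.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;

        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (y[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Used heap (MB) at regular intervals; no mixed collection has been needed yet.
        double[] used = {200, 260, 330, 390, 450, 520, 580, 640};
        // Last known live set estimate (MB) reported alongside each sample.
        double[] live = {150, 150, 150, 152, 152, 152, 151, 151};

        System.out.printf("used slope: %.2f MB/sample%n", slope(used));
        System.out.printf("live slope: %.2f MB/sample%n", slope(live));
    }
}
```

A monitoring rule that flags a leak when the slope exceeds some threshold would fire on the used series here but not on the live series.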

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579