Proposal: Always-on Statistical History

Simon Roberts simon at dancingcloudservices.com
Thu Nov 15 17:07:52 UTC 2018


I don't begin to claim to know the politics, legalities, boundaries of JFR
license conditions, and so forth, but:

"Java Flight Recorder requires a commercial license for use in production."

Whereas this, as I understand it, is the *open* JDK list. So I, for one,
would feel hard done by if your view prevailed and only the paying clients
got access to a valuable feature.


On Thu, Nov 15, 2018 at 9:40 AM Roger Riggs <Roger.Riggs at oracle.com> wrote:

> Hi,
>
> This looks like it has significant overlap with JFR.
> I don't think we want to start building in multiple mechanisms to keep
> tabs on a running VM.
>
> $.02, Roger
>
>
> On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> > Hi Bernd,
> >
> > On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <ecki at zusammenkunft.net> wrote:
> >> Looks good Thomas,
> > thanks!
> >
> >> what would be the typical memory usage with the Default Settings?
> > ~80 KB. It's very small.
> >
> >> Does the downsampling support min/max style rollups?
> > Not sure what you mean. Do you mean does it preserve peaks? Not yet,
> > such a feature would have to be added.
> >
> > Right now, downsampling is very primitive for performance reasons. For
> > snapshot values like heap size etc., we just throw away the samples, so
> > you lose temporary peaks. Counter-like values over time (e.g. number of
> > pages swapped in) simply refer to a larger time span afterwards.
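The downsampling behaviour described above can be sketched roughly as follows. This is only an illustration, not code from the patch: the function names `downsample` and `deltas`, the sample data, and the assumption that counters are stored as cumulative readings are all hypothetical.

```python
from typing import List


def downsample(samples: List[int], factor: int) -> List[int]:
    """Keep every `factor`-th sample and discard the rest (no min/max rollup)."""
    return samples[factor - 1::factor]


def deltas(cumulative: List[int]) -> List[int]:
    """Differences between consecutive cumulative counter readings."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Snapshot value (hypothetical heap size in MB): the temporary peak of 900
# falls on a discarded sample and is lost.
heap_mb = [100, 900, 120, 110, 130, 125]
heap_low = downsample(heap_mb, 3)

# Counter-like value (hypothetical cumulative "pages swapped in"): nothing is
# lost, each delta between retained samples just covers a longer time span.
pages_in = [5, 5, 20, 20, 21, 30]
pages_low = downsample(pages_in, 3)
span_delta = deltas(pages_low)
```

Preserving peaks, as asked about above, would require storing a min/max pair per retained sample, which this sketch deliberately does not do.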
> >
> > Best Regards, Thomas
> >
> >>
> >>
> >> --
> >> http://bernd.eckenfels.net
> >>
> >>
> >>
> >> From: Thomas Stüfe
> >> Sent: Wednesday, 14 November 2018 16:29
> >> To: serviceability-dev at openjdk.java.net
> >> Subject: Proposal: Always-on Statistical History
> >>
> >>
> >>
> >> Hi all,
> >>
> >> We have a feature in our port that we would like to contribute,
> >> and I would like to gauge opinions.
> >>
> >> First off, I am not sure which list is correct. This is more of a
> >> serviceability issue, but implementation-wise it fits hs-runtime
> >> better. I'll start with serviceability, but feel free to crosspost if
> >> needed.
> >>
> >> Second, I am aware that this may require a JEP. If necessary and the
> >> feedback is positive, I will draft one.
> >>
> >> ----
> >>
> >> In our port we have something called "Statistics History". Basically
> >> this is a rolling history, spanning up to 10 days, of a number of key
> >> values. Key values range from JVM specifics like heap size, metaspace
> >> size, number of threads etc., to platform specifics like memory
> >> footprint, CPU load, I/O and swapping activity etc.
> >>
> >> A periodic task collects those values at, by default, 15-second
> >> intervals. They are then fed into a FIFO that spans 10 days. To save
> >> memory, that FIFO is downsampled in two steps, so we have the last n
> >> hours in high resolution and the last n days in low resolution (of
> >> course all these parameters are configurable).
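As a rough illustration of the tiered FIFO described above, here is a minimal sketch assuming a cascade of bounded buffers in which each tier keeps every `ratio`-th sample that reached the tier before it. The class name `TieredHistory`, the `(capacity, ratio)` configuration, and the tiny numbers used are hypothetical and are not taken from the patch.

```python
from collections import deque
from typing import Deque, List, Tuple


class TieredHistory:
    """Cascade of bounded FIFOs with progressively coarser resolution."""

    def __init__(self, tiers: List[Tuple[int, int]]) -> None:
        # tiers: (capacity, ratio) pairs, finest tier first; a ratio of 1
        # means "keep every sample that reaches this tier".
        self.ratios = [ratio for _, ratio in tiers]
        self.buffers: List[Deque[float]] = [deque(maxlen=cap) for cap, _ in tiers]
        self.seen = [0] * len(tiers)

    def add(self, sample: float) -> None:
        for i, buf in enumerate(self.buffers):
            self.seen[i] += 1
            if self.seen[i] % self.ratios[i] != 0:
                return  # not passed on to this tier or any coarser one
            buf.append(sample)  # a full deque silently drops its oldest entry


# Hypothetical configuration: three tiers, i.e. two downsampling steps.
history = TieredHistory([(4, 1), (3, 2), (2, 3)])
for value in range(1, 13):
    history.add(value)
```

Because every buffer has a fixed capacity, the memory footprint is bounded no matter how long the process runs, which is what makes an always-on history of this kind plausible.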
> >>
> >> The history report can be triggered via jcmd, and could also be
> >> printed in the hs.err file (open for debate).
> >>
> >> ---
> >>
> >> Here are some examples of what the whole thing looks like:
> >>
> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt
> >>
> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt
> >>
> >> ---
> >>
> >> This feature has been really popular with our support folk over the
> >> years. Whether the VM is starved for resources by the OS or we
> >> have some slow- or fast-developing leak situation etc., these values
> >> are a quick and easy way to get a first stab at a situation, before we
> >> start more expensive analysis.
> >>
> >> The explicit design goal of this history was to be very cheap - cheap
> >> enough to be *always on* and forgotten about. It is, in our port,
> >> enabled by default. That way, if a problem occurs at a customer site,
> >> we immediately see developments spanning the last 10 days, without
> >> having to reproduce the issue.
> >>
> >> It is also robust enough to be usable during error reporting without
> >> endangering the error reporting process or falsifying the picture.
> >>
> >> I am aware that this crosses over into JFR territory. But this feature
> >> does not attempt to replace JFR; it is intended instead as a cheap,
> >> always-on, first-stop historical overview.
> >>
> >> --
> >>
> >> I have a patch which can be applied atop jdk12:
> >>
> >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch
> >>
> >> It works, passes our nightlies, and shows no regressions in DaCapo
> >> benchmarks.
> >>
> >> Please tell me what you think. Given enough interest, I will attempt
> >> to contribute (drafting a JEP if necessary).
> >>
> >> Thanks and Kind Regards,
> >>
> >> Thomas
> >>
>
>

-- 
Simon Roberts
(303) 249 3613