Proposal: Always-on Statistical History
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Nov 14 21:32:54 UTC 2018
Hi Simon,
Thank you. Yes, I combined vmstat/pidstat-like features with
internal JVM statistics. Note that part of that table is
platform-specific, so it looks slightly different on
BSD/Windows/Solaris etc. The JVM values are always the same.
Best Regards, Thomas
On Wed, Nov 14, 2018 at 7:29 PM Simon Roberts
<simon at dancingcloudservices.com> wrote:
>
> I would say this could be pretty useful. It's almost like a platform-independent, process-specific vmstat, with JVM extras. Given the existence of jps, this seems to fit in that ecosystem well. I find myself having to work with Windows just rarely enough that I'd have to look up how to get this info on that host every time.
> $0.02
>
>
> On Wed, Nov 14, 2018 at 7:57 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>>
>> Hi all,
>>
>> We have a feature in our port that we would like to contribute,
>> and I would like to gauge opinions.
>>
>> First off, I am not sure which list is correct. This is more of a
>> serviceability issue, but implementation-wise it fits hs-runtime
>> better. I'll start with serviceability, but feel free to crosspost
>> if needed.
>>
>> Second, I am aware that this may require a JEP. If necessary and the
>> feedback is positive, I will draft one.
>>
>> ----
>>
>> In our port we have something called "Statistics History". Basically
>> this is a rolling history, spanning up to 10 days, of a number of key
>> values. Key values range from JVM specifics like heap size, metaspace
>> size, number of threads etc., to platform specifics like memory
>> footprint, CPU load, I/O and swapping activity etc.
>>
>> A periodic task collects those values, in - by default - 15 second
>> intervals. They are then fed into a FIFO. The FIFO spans 10 days. To
>> save memory, that FIFO is downsampled in two steps, so we have the
>> last n hours in high resolution and the last n days in low resolution
>> (of course all these parameters are configurable).
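
Inline note: the sampling scheme in the quoted paragraph above can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code from the patch: the class name RollingHistory, the Sample fields and the three constructor parameters are all hypothetical, and for brevity the sketch uses a single decimation step between two tiers, whereas the mail describes downsampling in two steps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-tier rolling history; an illustration, not the patch code.
final class RollingHistory {

    // One sample; the fields are illustrative stand-ins for the real key values.
    static final class Sample {
        final long timestampSeconds;
        final long heapBytes;
        final int threadCount;

        Sample(long timestampSeconds, long heapBytes, int threadCount) {
            this.timestampSeconds = timestampSeconds;
            this.heapBytes = heapBytes;
            this.threadCount = threadCount;
        }
    }

    private final ArrayDeque<Sample> highRes = new ArrayDeque<>();
    private final ArrayDeque<Sample> lowRes = new ArrayDeque<>();
    private final int highResCapacity;  // number of recent samples kept at full rate
    private final int lowResCapacity;   // number of decimated samples kept
    private final int downsampleRatio;  // every n-th sample also goes to the low-res tier
    private long samplesSeen = 0;

    RollingHistory(int highResCapacity, int lowResCapacity, int downsampleRatio) {
        if (highResCapacity <= 0 || lowResCapacity <= 0 || downsampleRatio <= 0) {
            throw new IllegalArgumentException("all parameters must be positive");
        }
        this.highResCapacity = highResCapacity;
        this.lowResCapacity = lowResCapacity;
        this.downsampleRatio = downsampleRatio;
    }

    // Called by the periodic sampler (assumed here to be a single thread).
    void add(Sample s) {
        // High-resolution FIFO: keep every sample, drop the oldest when full.
        highRes.addLast(s);
        if (highRes.size() > highResCapacity) {
            highRes.removeFirst();
        }
        // Low-resolution FIFO: keep only every downsampleRatio-th sample.
        if (samplesSeen % downsampleRatio == 0) {
            lowRes.addLast(s);
            if (lowRes.size() > lowResCapacity) {
                lowRes.removeFirst();
            }
        }
        samplesSeen++;
    }

    // Snapshots, oldest first, so a report can iterate without touching the FIFOs.
    List<Sample> highResolution() {
        return new ArrayList<>(highRes);
    }

    List<Sample> lowResolution() {
        return new ArrayList<>(lowRes);
    }
}
```

As an example with made-up numbers: at the default 15 second interval, new RollingHistory(240, 1440, 40) would keep the last hour at full resolution and one sample every 10 minutes for the last 10 days, which is 1680 samples in total instead of the 57600 a flat 10-day FIFO would need.
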
>>
>> The history report can be triggered via jcmd, and also could get
>> printed in the hs.err file (open for debate).
>>
>> ---
>>
>> Here are some examples of what the whole thing looks like:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt
>>
>> ---
>>
>> This feature has been really popular with our support folk over the
>> years. Whether the VM is starved for resources by the OS, or we have
>> a slow- or fast-developing leak: these values are an easy way to get
>> a first stab at a situation, before we start more expensive
>> analysis.
>>
>> The explicit design goal of this history was to be very cheap - cheap
>> enough to be *always on* and then forgotten about. It is, in our
>> port, enabled by default. That way, if a problem occurs at a customer
>> site, we immediately see developments spanning the last 10 days,
>> without having to reproduce the issue.
>>
>> It is also robust enough to be usable during error reporting without
>> endangering the error reporting process or falsifying the picture.
>>
>> I am aware that this crosses over into JFR territory. But this feature
>> does not attempt to replace JFR; it is intended instead as a cheap,
>> always-on, first-stop historical overview.
>>
>> --
>>
>> I have a patch which can be applied on top of jdk12:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch
>>
>> It works, passes our nightlies, and shows no regressions in the
>> DaCapo benchmarks.
>>
>> Please tell me what you think. Given enough interest, I will attempt
>> to contribute (drafting a JEP if necessary).
>>
>> Thanks and Kind Regards,
>>
>> Thomas
>
>
>
> --
> Simon Roberts
> (303) 249 3613
>
More information about the serviceability-dev
mailing list