Proposal: Always-on Statistical History

Roger Riggs Roger.Riggs at oracle.com
Thu Nov 15 16:40:17 UTC 2018


Hi,

This looks like it has significant overlap with JFR.
I don't think we want to start building in multiple mechanisms to keep 
tabs on a running VM.

$.02, Roger


On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
> Hi Bernd,
>
> On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <ecki at zusammenkunft.net> wrote:
>> Looks good, Thomas,
> thanks!
>
>> What would be the typical memory usage with the default settings?
> About 80 KB. It's very small.
>
>> Does the downsampling support min/max-style rollups?
> Not sure what you mean. Do you mean: does it preserve peaks? Not yet;
> such a feature would have to be added.
>
> Right now, downsampling is very primitive for performance reasons. For
> snapshot values like heap size, we just throw away samples, so you
> lose temporary peaks. Counter-like values over time (e.g. the number
> of pages swapped in) simply end up referring to a larger time span.
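A minimal Python sketch of the two downsampling behaviors described above, assuming snapshot values are stored as (timestamp, value) pairs and counter values as (start, end, delta) triples. The names `downsample_snapshots` and `downsample_counters` are hypothetical, and keeping the newest reading of each group is an assumption; the patch may choose differently.

```python
from typing import List, Tuple

# Hypothetical layouts; the actual patch may store samples differently.
Snapshot = Tuple[int, int]    # (timestamp_s, value), e.g. heap size at one instant
Delta = Tuple[int, int, int]  # (start_s, end_s, increase), e.g. pages swapped in


def downsample_snapshots(samples: List[Snapshot], factor: int) -> List[Snapshot]:
    """Keep one reading per group of `factor` readings and discard the rest.

    Assumption: the newest reading of each group is the one kept. A short-lived
    peak in a discarded reading disappears from the history.
    """
    return [samples[i:i + factor][-1] for i in range(0, len(samples), factor)]


def downsample_counters(samples: List[Delta], factor: int) -> List[Delta]:
    """Merge `factor` consecutive deltas into one delta over a longer interval.

    The total is preserved; only the time resolution becomes coarser.
    """
    merged = []
    for i in range(0, len(samples), factor):
        group = samples[i:i + factor]
        merged.append((group[0][0], group[-1][1], sum(d for _, _, d in group)))
    return merged
```

With hypothetical 15-second heap readings of 100, 120, 400 and 130 and a factor of 2, `downsample_snapshots` returns `[(15, 120), (45, 130)]`: the 400 peak is gone. Counter deltas of 3, 0, 7 and 2 become 3 and 9, each spanning 30 seconds, so the total is unchanged. Preserving peaks (the min/max rollups asked about above) would require storing extra per-group values such as the maximum.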
>
> Best Regards, Thomas
>
>>
>> --
>> http://bernd.eckenfels.net
>>
>> From: Thomas Stüfe
>> Sent: Wednesday, 14 November 2018 16:29
>> To: serviceability-dev at openjdk.java.net
>> Subject: Proposal: Always-on Statistical History
>>
>> Hi all,
>>
>> We have a feature in our port which we would like to contribute,
>> and I would like to gauge opinions.
>>
>> First off, I am not sure which list is correct. This is more of a
>> serviceability issue, but implementation-wise it fits hs-runtime
>> better. I'll start with serviceability, but feel free to cross-post if
>> needed.
>>
>> Second, I am aware that this may require a JEP. If necessary and the
>> feedback is positive, I will draft one.
>>
>> ----
>>
>> In our port we have something called "Statistics History". Basically,
>> this is a rolling history, spanning up to 10 days, of a number of key
>> values. These range from JVM specifics like heap size, metaspace size
>> and number of threads, to platform specifics like memory footprint,
>> CPU load, and I/O and swapping activity.
>>
>> A periodic task collects those values, by default at 15-second
>> intervals. They are then fed into a FIFO that spans 10 days. To save
>> memory, that FIFO is downsampled in two steps, so we have the last n
>> hours in high resolution and the last n days in low resolution (of
>> course, all these parameters are configurable).
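The rolling FIFO with two downsampling steps can be modeled as a chain of bounded queues, one per resolution. This is a hypothetical Python sketch, not the patch's data structure: `StatHistory`, `capacities` and `factors` are illustrative names, only a single snapshot metric is tracked, and on overflow a tier's oldest `factor` samples are reduced to the newest of them (discard-style downsampling) before moving to the next, coarser tier.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical sample layout: (timestamp_s, value) for one snapshot metric.
Sample = Tuple[int, int]


class StatHistory:
    """Rolling history with two downsampling steps (three resolutions).

    Hypothetical model: tier i holds at most capacities[i] samples. When it
    overflows, its oldest factors[i] samples are reduced to one sample that
    moves into tier i + 1. The coarsest tier drops its oldest sample instead.
    """

    def __init__(self, capacities: List[int], factors: List[int]) -> None:
        if len(factors) != len(capacities) - 1:
            raise ValueError("need one factor per downsampling step")
        if any(f < 1 or f > c for f, c in zip(factors, capacities)):
            raise ValueError("each factor must be between 1 and its tier capacity")
        self.capacities = capacities
        self.factors = factors
        self.tiers: List[Deque[Sample]] = [deque() for _ in capacities]

    def add(self, sample: Sample) -> None:
        """Feed one periodic sample (e.g. every 15 s) into the finest tier."""
        self._push(0, sample)

    def _push(self, level: int, sample: Sample) -> None:
        fifo = self.tiers[level]
        fifo.append(sample)
        if len(fifo) <= self.capacities[level]:
            return
        if level == len(self.tiers) - 1:
            fifo.popleft()  # the oldest data falls off the end of the history
            return
        group = [fifo.popleft() for _ in range(self.factors[level])]
        # Assumption: snapshot-style downsampling keeps only the newest
        # sample of the group and throws the others away.
        self._push(level + 1, group[-1])

    def dump(self) -> List[List[Sample]]:
        """Return all tiers, finest first, oldest sample first in each."""
        return [list(fifo) for fifo in self.tiers]
```

With 15-second samples, for example, a factor of 4 would give the middle tier a 1-minute resolution; the capacities then decide how many hours and days each tier covers. The actual defaults are not given in this thread.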
>>
>> The history report can be triggered via jcmd, and could also be
>> printed in the hs_err file (open for debate).
>>
>> ---
>>
>> Here are some examples of what the whole thing looks like:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt
>>
>> ---
>>
>> This feature has been really popular with our support folks over the
>> years. Whether the VM is starved for resources by the OS or we have a
>> slow- or fast-developing leak, these values offer a quick and easy
>> first look at the situation before we start more expensive analysis.
>>
>> The explicit design goal of this history was to be very cheap: cheap
>> enough to be *always on* and forgotten about. In our port, it is
>> enabled by default. That way, if a problem occurs at a customer site,
>> we immediately see developments spanning the last 10 days, without
>> having to reproduce the issue.
>>
>> It is also robust enough to be used during error reporting without
>> endangering the error reporting process or falsifying the picture.
>>
>> I am aware that this crosses over into JFR territory. But this feature
>> does not attempt to replace JFR; it is instead intended as a cheap,
>> always-on, first-stop historical overview.
>>
>> --
>>
>> I have a patch which can be applied on top of jdk12:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch
>>
>> It works, passes our nightlies, and shows no regressions in the DaCapo
>> benchmarks.
>>
>> Please tell me what you think. Given enough interest, I will attempt
>> to contribute it (drafting a JEP if necessary).
>>
>> Thanks and Kind Regards,
>>
>> Thomas



More information about the serviceability-dev mailing list