RFR 8247471: Enhance CPU load events with the actual elapsed CPU time

Jaroslav Bachorík jaroslav.bachorik at datadoghq.com
Mon Jun 22 08:17:30 UTC 2020


On Sun, Jun 21, 2020 at 2:11 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:
>
> Hi Jaroslav,
>
> Sorry for the late reply.
>
> > On 18 Jun 2020, at 19:41, Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com> wrote:
> >
> > Hi Erik,
> >
> >
> > On Mon, Jun 15, 2020 at 10:20 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:
> >>
> >> Hi Jaroslav,
> >>
> >> I wonder if we should remove CPU from the field names and labels in metadata.xml
> >>
> >> <Field type="long" contentType="nanos" name="jvmUserCpuTime" label="Elapsed JVM UserTime"/>
> >>
> >> becomes
> >>
> >> <Field type="long" contentType="nanos" name="jvmUserTime" label="Elapsed JVM User Time"/>
> >>
> >> CPU is implicit from the event name and we don’t have it for the other fields.
> >
> > Changed: http://cr.openjdk.java.net/~jbachorik/8247471/webrev.01/
> >
> >>
> >> When does machineTotalCpuTime start?
> >>
> >> From when the machine booted or when the JVM process started?
> >>
> >> Maybe we want to make that clear in a description.
> >
> > I re-checked the descriptions and all attributes are defined as
> > 'elapsed' time which kind of evokes the delta against the previous
> > state. I agree that it might not be immediately 100% clear but if I am
> > going to enhance the description I would need guidance about how much
> > text is not too much - eg. would something like 'JVM System CPU Time
> > Elapsed Since Previous CPU Load Event' (and similar) be acceptable?
> >
>
> We have tried to avoid using the word ‘event’ (and ‘this’ in descriptions). Mostly because descriptions and labels may end up in places where the (event) context is lost, for example in a GUI tooltip or in a chart legend.
>
> description = “Time the JVM spent executing in user mode since last sample”
> description = “Time the JVM spent executing in kernel mode since last sample”
> description = “Total time spent executing since last sample”
>
> description = “Time the thread spent executing in user mode since last sample”
> description = “Time the thread spent executing in kernel mode since last sample”
>
> What bothers me, when I spell it out like this, is that we don’t know when the last sample occurred. It’s fine when we have a percentage, but it doesn’t work that well when we have the actual time. It would be possible to set a duration for the event, but it may be confusing since we have not had it before for periodic events. We could add another field, i.e. “Wall clock time since last sample”, but we have not done that before for other events.

I think 'since last sample' is already an implicit assumption for the
load metric as it is reported today. Other definitions, such as a
moving average of the load since JVM startup, would also be valid, but
they are not used for these events.
Setting a duration on the event sounds reasonable - it would be
useful even for the CPU load based calculations. E.g. currently, if I
want to convert the load per sample into something more tangible, I
need to use min(period, time since last sample) to convert the load
back to the CPU time measured for that particular event. That is only
an approximation, and it breaks for the first or the last event in the
recording :(
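
For illustration only, the conversion described above could be sketched
roughly as follows. The class and method names are hypothetical and are
not part of the JDK or the webrev. Multiplying by the core count assumes
that the reported load is a 0..1 fraction normalized across all cores;
pass 1 if that does not hold.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of the JDK or the webrev.
final class CpuTimeEstimate {
    private CpuTimeEstimate() {}

    /**
     * Approximates the CPU time (in nanoseconds) behind one load sample.
     *
     * @param load            load fraction reported by the event (assumed 0..1)
     * @param period          configured period of the periodic event
     * @param sinceLastSample wall-clock time since the previous sample
     * @param cores           core count the load is assumed to be normalized by
     */
    static long estimateCpuNanos(double load, Duration period,
                                 Duration sinceLastSample, int cores) {
        // min(period, time since last sample) is the assumed sampling window.
        Duration window = period.compareTo(sinceLastSample) <= 0
                ? period
                : sinceLastSample;
        return Math.round(load * window.toNanos() * cores);
    }

    public static void main(String[] args) {
        long nanos = estimateCpuNanos(
                0.5, Duration.ofSeconds(1), Duration.ofSeconds(2), 4);
        System.out.println(nanos + " ns");
    }
}
```

The result is only as good as the window estimate: for the first event
in a recording there is no previous sample, so the caller has nothing
better than the configured period to pass in.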

>
> I’m sorry for the delay, but could I get a few more days to think this over?

Don't worry - we are not in a terrible hurry here. I am all for
getting it right rather than getting it done quickly.

-JB-

>
> I think it is really important to get the event definition correct.
>
> Once released, changing the event definition becomes a mess. The implementation on the other hand can always be updated. Working on the consumer side of JFR for several years, I know how frustrating it can be to not have proper definitions or data that you can’t work with.
>
> Otherwise, the implementation looks good.
>
> Erik
>
> > Thanks,
> >
> > -JB-
> >
> >>
> >> Erik
> >>
> >>> On 12 Jun 2020, at 12:08, Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Here is my attempt at adding the elapsed CPU time to
> >>> ThreadCPULoadEvent and CPULoadEvent JFR events.
> >>>
> >>> I have tested the implementation on Linux, MacOS and Windows and it is
> >>> working as expected.
> >>> However, I am not an expert on all of those platforms so if there is
> >>> an easier way to get the elapsed CPU time than what I am doing now I
> >>> am open to suggestions.
> >>>
> >>> JIRA: https://bugs.openjdk.java.net/browse/JDK-8247471
> >>> Webrev: http://cr.openjdk.java.net/~jbachorik/8247471
> >>>
> >>> Thanks!
> >>>
> >>> -JB-
> >>
>

