RFR 8247471: Enhance CPU load events with the actual elapsed CPU time

Erik Gahlin erik.gahlin at oracle.com
Sun Jun 21 00:10:48 UTC 2020


Hi Jaroslav, 

Sorry for the late reply. 

> On 18 Jun 2020, at 19:41, Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com> wrote:
> 
> Hi Erik,
> 
> 
> On Mon, Jun 15, 2020 at 10:20 AM Erik Gahlin <erik.gahlin at oracle.com> wrote:
>> 
>> Hi Jaroslav,
>> 
>> I wonder if we should remove CPU from the field names and labels in metadata.xml
>> 
>> <Field type="long" contentType="nanos" name="jvmUserCpuTime" label="Elapsed JVM UserTime"/>
>> 
>> becomes
>> 
>> <Field type="long" contentType="nanos" name="jvmUserTime" label="Elapsed JVM User Time"/>
>> 
>> CPU is implicit from the event name and we don’t have it for the other fields.
> 
> Changed: http://cr.openjdk.java.net/~jbachorik/8247471/webrev.01/
> 
>> 
>> When does machineTotalCpuTime start?
>> 
>> From when the machine booted or when the JVM process started?
>> 
>> Maybe we want to make that clear in a description.
> 
> I re-checked the descriptions and all attributes are defined as
> 'elapsed' time which kind of evokes the delta against the previous
> state. I agree that it might not be immediately 100% clear but if I am
> going to enhance the description I would need guidance about how much
> text is not too much - eg. would something like 'JVM System CPU Time
> Elapsed Since Previous CPU Load Event' (and similar) be acceptable?
> 

We have tried to avoid using the words ‘event’ and ‘this’ in descriptions, mostly because descriptions and labels may end up in places where the (event) context is lost, for example in a GUI tooltip or in a chart legend.

description = “Time the JVM spent executing in user mode since last sample”
description = “Time the JVM spent executing in kernel mode since last sample”
description = “Total time spent executing since last sample”

description = “Time the thread spent executing in user mode since last sample”
description = “Time the thread spent executing in kernel mode since last sample”

What bothers me, when I spell it out like this, is that we don’t know when the last sample occurred. That is fine when we have a percentage, but it doesn’t work as well when we have the actual time. It would be possible to set a duration for the event, but that may be confusing since we have not had it before for periodic events. We could add another field, e.g. “Wall clock time since last sample”, but we have not done that before for other events.
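
To illustrate the concern, here is a minimal sketch of what a consumer would have to do with an elapsed CPU time. It is not part of the webrev; the class, method and parameter names are hypothetical, and it assumes the consumer somehow knows the wall clock interval and the number of cores. The same elapsed time gives very different loads depending on the interval, which is why the time alone is hard to interpret.

```java
public class CpuLoadFromElapsed {
    /**
     * Fraction of the available CPU capacity used during one sample interval.
     * All names here are hypothetical; they are not fields of the proposed event.
     *
     * @param cpuNanos  CPU time consumed since the last sample
     * @param wallNanos wall clock time since the last sample
     * @param cores     number of cores available during the interval
     */
    static double load(long cpuNanos, long wallNanos, int cores) {
        if (wallNanos <= 0 || cores <= 0) {
            throw new IllegalArgumentException("interval and core count must be positive");
        }
        return (double) cpuNanos / ((double) wallNanos * cores);
    }

    public static void main(String[] args) {
        long userNanos = 1_500_000_000L; // 1.5 s of user time since last sample

        // Same elapsed CPU time, two different (assumed) sample intervals.
        System.out.println(load(userNanos, 1_000_000_000L, 4));  // 1 s interval
        System.out.println(load(userNanos, 10_000_000_000L, 4)); // 10 s interval
    }
}
```

Without the interval (or a timestamp for the previous sample), only the numerator is available and the ratio cannot be computed.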

I’m sorry for the delay, but could I get a few more days to think this over?

I think it is really important to get the event definition correct.

Once released, changing the event definition becomes a mess. The implementation, on the other hand, can always be updated. Having worked on the consumer side of JFR for several years, I know how frustrating it can be to not have proper definitions, or to have data that you can’t work with.

Otherwise, the implementation looks good.

Erik

> Thanks,
> 
> -JB-
> 
>> 
>> Erik
>> 
>>> On 12 Jun 2020, at 12:08, Jaroslav Bachorík <jaroslav.bachorik at datadoghq.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> Here is my attempt at adding the elapsed CPU time to
>>> ThreadCPULoadEvent and CPULoadEvent JFR events.
>>> 
>>> I have tested the implementation on Linux, MacOS and Windows and it is
>>> working as expected.
>>> However, I am not an expert on all of those platforms so if there is
>>> an easier way to get the elapsed CPU time than what I am doing now I
>>> am open to suggestions.
>>> 
>>> JIRA: https://bugs.openjdk.java.net/browse/JDK-8247471
>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8247471
>>> 
>>> Thanks!
>>> 
>>> -JB-
>> 



More information about the hotspot-jfr-dev mailing list