RFR: 8221535: add steal tick related information to hs_error file [linux]

Baesken, Matthias matthias.baesken at sap.com
Fri Mar 29 10:01:58 UTC 2019



  *   I would completely remove this section:
  *   +#ifdef DEBUG_LINUX_PROC_STAT
  *   +  vm_fprintf(stderr, "[stat] read "
  *

Hi Thomas,
There are no  vm_printf   any more in

http://cr.openjdk.java.net/~mbaesken/webrevs/8221535.2/src/hotspot/os/linux/os_linux.cpp.frames.html

they have been removed  already.


  *   Every item past the "idle" column in /proc/stat has a (admittedly old) kernel version dependency.
  *   Why do you think it is important to communicate the existence of "steal",

In case  we  are sure that we are always  on  2.6.11+   kernels, then indeed I can remove the  special handling.
I was sure  that  we are on 2.6+  but only 95%  sure that we are on  2.6.11+   .



>BTW, this has nothing to do with your change, but this whole desig seems inefficient:
>In os_perf_linux.cpp - which is used e.g. from JFR - we call "os::Linux::get_tick_information" for every expected cpu, and so we parse /proc/stat n times for one reading.

Sure this could/should  be addressed ;  but in another change .



Regards, Matthias



From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Freitag, 29. März 2019 09:44
To: Baesken, Matthias <matthias.baesken at sap.com>
Cc: David Holmes <david.holmes at oracle.com>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8221535: add steal tick related information to hs_error file [linux]

Hi Matthias,

as usual, thanks for your patience and perseverance. Unfortunately you touched a piece of coding which was not very pretty (IMHO) in the first place, so there are a number of further remarks:

- Return type: please use a simple bool value, not OSReturn. Keeps this in line with all other OS functions. Plus I do not see us returning anything else than OS_OK and OS_ERR anyway, and callers seem not to care either.

- Please use lowercase-underscore naming of the return struct, not camel case. Again, like all other os functions around. I suggest a simple "cpu_ticks".

- Please change parameter order to get_tick_information and make the default of "which_logical_cpu" -1. And add a small comment that -1 means "all".

// which_logical_cpu=-1 returns accumulated ticks for all cpus.
bool os::Linux::get_tick_information(cpu_ticks* out, int which_logical_cpu = -1) {

- Please initialize the output structure to zero at the begin of the function. Just a simple memset is sufficient.

- I would rename "expected_tickinfo_count" to "required_tickinfo_count". That clearly states that we have a number of values which are mandatory but are prepared to handle optional values too.

- I would completely remove this section:

+#ifdef DEBUG_LINUX_PROC_STAT
+  vm_fprintf(stderr, "[stat] read "
+          UINT64_FORMAT " " UINT64_FORMAT " " UINT64_FORMAT " " UINT64_FORMAT " "
+          UINT64_FORMAT " " UINT64_FORMAT " " UINT64_FORMAT " " UINT64_FORMAT " " UINT64_FORMAT " \n",
+          userTicks, niceTicks, systemTicks, idleTicks,
+          iowTicks, irqTicks, sirqTicks, stealTicks, guestNiceTicks);
+#endif

vm_fprintf is nowhere defined and will not build when activated. Seems to be a relict. We call it only from here and from os_perf_aix.cpp. Please remove it from os_aix.cpp too.

- Backward compatibility:

I do not see the value in "has_steal_ticks". See manpage for /proc: http://man7.org/linux/man-pages/man5/proc.5.html

Every item past the "idle" column in /proc/stat has a (admittedly old) kernel version dependency. Why do you think it is important to communicate the existence of "steal", but not of the items before (introduced before Linux 2.6.11)? Seems arbitrary.

I mean, lets do this either right or not at all.

If done right, I would prefer my suggested solution which somehow maps an INVALID value onto the output value. If you dislike my prior proposal (value_t) you could just define a special value to mean INVALID. E.g. #define INVALID_VALUE ((uint64_t)(0xFFFFFFFFFFFFFFFFUL)) or similar.

Alternatively, we could decide to just not care. For values we do not find we could report zero (code does that now already). You may report "0" steals and not know if thats true or if the kernel is just too old to tell you, but again, that is the case for other values too.

------

BTW, this has nothing to do with your change, but this whole desig seems inefficient:

In os_perf_linux.cpp - which is used e.g. from JFR - we call "os::Linux::get_tick_information" for every expected cpu, and so we parse /proc/stat n times for one reading.

We could instead just have a function like os::Linux::get_tick_information just return all information in one go. Would be way easier.

Cheers, Thomas



On Fri, Mar 29, 2019 at 8:56 AM Baesken, Matthias <matthias.baesken at sap.com<mailto:matthias.baesken at sap.com>> wrote:
Hi David ,   thanks  for  your comments .

New webrev :

http://cr.openjdk.java.net/~mbaesken/webrevs/8221535.2/

moved  os::CPUPerfTicks    to   os::Linux::CPUPerfTicks
removed DEBUG_LINUX_PROC_STAT  from os_linux.cpp
addressed the minor  comments


Best regards, Matthias


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>>
> Sent: Freitag, 29. März 2019 06:07
> To: Baesken, Matthias <matthias.baesken at sap.com<mailto:matthias.baesken at sap.com>>; Thomas Stüfe
> <thomas.stuefe at gmail.com<mailto:thomas.stuefe at gmail.com>>
> Cc: hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>
> Subject: Re: RFR: 8221535: add steal tick related information to hs_error file
> [linux]
>
> Hi Matthias,
>
> This generally seems okay.
>
> On 28/03/2019 2:50 am, Baesken, Matthias wrote:
> > Hello, here is a second webrev :
> >
> > http://cr.openjdk.java.net/~mbaesken/webrevs/8221535.1/
> >
> > I  followed  Davids idea  to  use  the  os_perf coding .
> > For reusing  I  moved  os::Linux::get_tick_information    (former name
> get_total_ticks  , but it is now not only total  but also steal  )   into  os_linux .
> > The CPUPerfTicks   struct has been  enhanced   by    steal and  a boolean flag
> has_steal_ticks  .
> >
> >    28 struct CPUPerfTicks {
> >    29   uint64_t used;
> >    30   uint64_t usedKernel;
> >    31   uint64_t total;
> >    32   uint64_t steal;
> >    33   bool     has_steal_ticks;
> >    34 };
>
> Okay, but one minor nit: why put struct CPUPerfTicks in the os::
> namespace with the code that uses it in the os::Linux namespace?
>
> A few other minor comments:
>
> src/hotspot/os/linux/os_linux.cpp
>
> +   if (-1 == which_logical_cpu) {
>
> hotspot style puts the variable first in a comparison against a constant.
>
> + #ifdef DEBUG_LINUX_PROC_STAT
>
> This should either just be DEBUG code or else deleted. The vm_printfs
> would ideally be logging statements if that is possible.
>
> +   if (n > expected_tickinfo_count+3) {
>
> Spaces around + please.
>
> +       st->print_cr("Steal ticks from vm start: %llu", (long long
> unsigned) steal_ticks_difference);
>
> %llu should be UINT64_FORMAT.
> The cast should not be needed.
>
> Thanks,
> David
> -----
>
> >
> > I found  the  "big API"   idea  a bit too much  for  just 2  use cases  .
> >
> >
> > Best regards , Matthias
> >


More information about the hotspot-dev mailing list