RFR: 8221535: add steal tick related information to hs_error file [linux]

Thomas Stüfe thomas.stuefe at gmail.com
Wed Mar 27 10:46:59 UTC 2019


On Wed, Mar 27, 2019 at 11:22 AM David Holmes <david.holmes at oracle.com>
wrote:

> On 27/03/2019 8:04 pm, Baesken, Matthias wrote:
> > Hi Thomas, interesting idea . David – would you be fine with  what
> > Thomas proposes ?
>
> Isn't adding the missing fields to the existing os_perf code basically
> the same thing? I don't see any point in duplicating existing code like
> this.
>

We are on one page here.

But os_perf_linux.cpp, get_total_ticks() (which I assume you mean by
pre-existing code) has a number of issues I can see:

- It does not allow for values not found since the kernel is too old. See
http://man7.org/linux/man-pages/man5/proc.5.html
  All values in the /proc/stat cpu line beyond "idle" (the fourth) are
depending on kernel version. Matthias wants to read steal ticks, which have
been introduced with 2.6.11.
  A clean API would recognize this and return all it can return instead of
nothing (which is what I believe now happens).
  Of course, the API should be able to signal "is_valid" to the user. For a
way to do this, see my proposal, resp. what we did in SapMachine:
https://github.com/SAP/SapMachine/blob/sapmachine/src/hotspot/os/linux/stathist_linux.cpp#L106

- I believe also this means the calculated "totals" can be wrong as it is
now:
  pticks->total      = userTicks + niceTicks + systemTicks + idleTicks +
                       iowTicks + irqTicks + sirqTicks;
  Pretty sure that on newer kernels, where you have numbers for "steal" and
"guest", those ticks are substracted by the kernel from "user" or "system"
and must be added manually to a total.

I also would like to get all values, not just the two or three ticks, and
for that, a structure would be better as sketched out.

It also should then be removed from os_perf_linux.cpp to the "os::Linux"
namespace and os_linux.cpp (since parsing /proc/stat is a useful thing and
it would be good to consolidate that coding at a central place).

Bottomline, I would have a central, low-level, clean and capable and
platform-specific solution which can be reused without much fuss in
Matthias code as well as in os_perf (I think the JFR uses this too). I
really do not want code duplication, at least not long term.

Cheers, Thomas


You

>
> David
> -----
>
> > Best regards, Matthias
> >
> > *From:*Thomas Stüfe <thomas.stuefe at gmail.com>
> > *Sent:* Mittwoch, 27. März 2019 10:28
> > *To:* Baesken, Matthias <matthias.baesken at sap.com>
> > *Cc:* hotspot-dev at openjdk.java.net
> > *Subject:* Re: RFR: 8221535: add steal tick related information to
> > hs_error file [linux]
> >
> > Hi Matthias,
> >
> > this is a lot of coding for just one counter...
> >
> > Also os_perf_linux.cpp does something very similar. However, that code
> > does not get the steal ticks you want, does not seem to care about
> > kernel differences and could also be improved.
> >
> > Proposal: Why do you not tweak your new API to be a general purpose API
> > to read all ticks from /proc/stat. That would be way more useful. Later,
> > if we feel like it, could change os_perf_linux.cpp to use your function.
> >
> > We did something similar for the statistical history in the SapMachine,
> see
> >
> >
> https://github.com/SAP/SapMachine/blob/sapmachine/src/hotspot/os/linux/stathist_linux.cpp
> >
> > . So I would suggest to have a function reading proc stat and returning
> > all values in a struct. That struct should have provisions for "value
> > not found" since which values you get depends in kernel version.
> >
> > --------
> >
> > Example:
> >
> > struct value_t { bool valid; uint64_t v; }
> >
> > struct cpu_values_t {
> >   value_t user;
> >   value_t nice;
> >   value_t system;
> >   value_t idle;
> >   value_t iowait;
> >   value_t steal;
> >   value_t guest;
> >   value_t guest_nice;
> > };
> >
> > void parse_proc_stat_cpu_line(const char* line, cpu_values_t* out) {
> >    // Note: existence of some of these values depends on kernel version
> >    out->user.valid = out->nice.valid = out->system.valid =
> > out->idle.valid = out->iowait.valid = out->steal.valid =
> > out->guest.valid = out->guest_nice.valid = false;
> >    int user, nice, system, idle, iowait, irq, softirq, steal, guest,
> > guest_nice;
> >    int num = ::sscanf(line, "cpu %d %d %d %d %d %d %d %d %d %d",
> >                                &user, &nice, &system, &idle, &iowait,
> > &irq, &softirq, &steal, &guest, &guest_nice);
> >    if (num >= 4) {
> >      out->user.v = user; out->user.valid = true;
> >      out->nice.v = nice; out->nice.valid = true;
> >      out->system.v = system; out->system.valid = true;
> >      out->idle.v = idle; out->idle.valid = true;
> >      if (num >= 5) { // iowait (5) (since Linux 2.5.41)
> >        out->iowait = iowait; out->iowait.valid = true;
> >        if (num >= 8) { // steal (8) (since Linux 2.6.11)
> >          out->steal = steal; out->steal.valid = true;
> >          if (num >= 9) { // guest (9) (since Linux 2.6.24)
> >            out->guest = guest; out->guest.valid = true;
> >            if (num >= 10) { // guest (9) (since Linux 2.6.33)
> >              out->guest_nice = guest_nice; out->guest_nice.valid = true;
> >            }
> >        }
> >     }
> > }
> >
> > Cheers Thomas
> >
> > On Wed, Mar 27, 2019 at 9:42 AM Baesken, Matthias
> > <matthias.baesken at sap.com <mailto:matthias.baesken at sap.com>> wrote:
> >
> >     Hello, please review  the  following change .
> >
> >     Recent Linux kernels provide in the /proc filesystem   so called
> >     "steal" information, see the the proc man page :
> >
> >        ..
> >     "there is an eighth column, steal - stolen time, which is the time
> >     spent in other operating systems when running in a virtualized
> >     environment"
> >
> >     The info was useful  in our  internal VM,  when looking into
> >     performance issues  (especially in virtualized environments), so it
> >     should be added to the hs_err file .
> >
> >     Bug/webrev :
> >
> >     https://bugs.openjdk.java.net/browse/JDK-8221535
> >
> >     http://cr.openjdk.java.net/~mbaesken/webrevs/8221535.0/
> >
> >
> >     Best Regards, Matthias
> >
>


More information about the hotspot-dev mailing list