RFR: 8221535: add steal tick related information to hs_error file [linux]

Baesken, Matthias matthias.baesken at sap.com
Wed Mar 27 15:10:52 UTC 2019


  *   All values in the /proc/stat cpu line beyond "idle" (the fourth) are depending on kernel version.

Hello,  I think we have to worry only about  “steal”  (kernel 2.6.11)  and  maybe (if we want to add this )  “guest” (kernel 2.6.24).
The 3 others beyond idle  came with 2.6 or even earlier :

“In Linux 2.6 this line includes three additional columns: iowait - time waiting for I/O to complete (since 2.5.41);
irq - time  servicing interrupts (since 2.6.0-test4); softirq - time servicing softirqs (since 2.6.0-test4).”

Even the   SUSE Linux 9 and  Redhat 5  machines I checked  have such “new”  (2.6+) kernels . Most  likely we cannot run for some time  on such old systems .



Best regards, Matthias



From: Thomas Stüfe <thomas.stuefe at gmail.com>
Sent: Mittwoch, 27. März 2019 11:47
To: David Holmes <david.holmes at oracle.com>
Cc: Baesken, Matthias <matthias.baesken at sap.com>; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8221535: add steal tick related information to hs_error file [linux]



On Wed, Mar 27, 2019 at 11:22 AM David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
On 27/03/2019 8:04 pm, Baesken, Matthias wrote:
> Hi Thomas, interesting idea . David – would you be fine with  what
> Thomas proposes ?

Isn't adding the missing fields to the existing os_perf code basically
the same thing? I don't see any point in duplicating existing code like
this.

We are on one page here.

But os_perf_linux.cpp, get_total_ticks() (which I assume you mean by pre-existing code) has a number of issues I can see:

- It does not allow for values not found since the kernel is too old. See http://man7.org/linux/man-pages/man5/proc.5.html
  All values in the /proc/stat cpu line beyond "idle" (the fourth) are depending on kernel version. Matthias wants to read steal ticks, which have been introduced with 2.6.11.
  A clean API would recognize this and return all it can return instead of nothing (which is what I believe now happens).
  Of course, the API should be able to signal "is_valid" to the user. For a way to do this, see my proposal, resp. what we did in SapMachine:  https://github.com/SAP/SapMachine/blob/sapmachine/src/hotspot/os/linux/stathist_linux.cpp#L106

- I believe also this means the calculated "totals" can be wrong as it is now:
  pticks->total      = userTicks + niceTicks + systemTicks + idleTicks +
                       iowTicks + irqTicks + sirqTicks;
  Pretty sure that on newer kernels, where you have numbers for "steal" and "guest", those ticks are substracted by the kernel from "user" or "system" and must be added manually to a total.

I also would like to get all values, not just the two or three ticks, and for that, a structure would be better as sketched out.

It also should then be removed from os_perf_linux.cpp to the "os::Linux" namespace and os_linux.cpp (since parsing /proc/stat is a useful thing and it would be good to consolidate that coding at a central place).

Bottomline, I would have a central, low-level, clean and capable and platform-specific solution which can be reused without much fuss in Matthias code as well as in os_perf (I think the JFR uses this too). I really do not want code duplication, at least not long term.

Cheers, Thomas


You

David
-----

> Best regards, Matthias
>
> *From:*Thomas Stüfe <thomas.stuefe at gmail.com<mailto:thomas.stuefe at gmail.com>>
> *Sent:* Mittwoch, 27. März 2019 10:28
> *To:* Baesken, Matthias <matthias.baesken at sap.com<mailto:matthias.baesken at sap.com>>
> *Cc:* hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>
> *Subject:* Re: RFR: 8221535: add steal tick related information to
> hs_error file [linux]
>
> Hi Matthias,
>
> this is a lot of coding for just one counter...
>
> Also os_perf_linux.cpp does something very similar. However, that code
> does not get the steal ticks you want, does not seem to care about
> kernel differences and could also be improved.
>
> Proposal: Why do you not tweak your new API to be a general purpose API
> to read all ticks from /proc/stat. That would be way more useful. Later,
> if we feel like it, could change os_perf_linux.cpp to use your function.
>
> We did something similar for the statistical history in the SapMachine, see
>
> https://github.com/SAP/SapMachine/blob/sapmachine/src/hotspot/os/linux/stathist_linux.cpp
>
> . So I would suggest to have a function reading proc stat and returning
> all values in a struct. That struct should have provisions for "value
> not found" since which values you get depends in kernel version.
>
> --------
>
> Example:
>
> struct value_t { bool valid; uint64_t v; }
>
> struct cpu_values_t {
>   value_t user;
>   value_t nice;
>   value_t system;
>   value_t idle;
>   value_t iowait;
>   value_t steal;
>   value_t guest;
>   value_t guest_nice;
> };
>
> void parse_proc_stat_cpu_line(const char* line, cpu_values_t* out) {
>    // Note: existence of some of these values depends on kernel version
>    out->user.valid = out->nice.valid = out->system.valid =
> out->idle.valid = out->iowait.valid = out->steal.valid =
> out->guest.valid = out->guest_nice.valid = false;
>    int user, nice, system, idle, iowait, irq, softirq, steal, guest,
> guest_nice;
>    int num = ::sscanf(line, "cpu %d %d %d %d %d %d %d %d %d %d",
>                                &user, &nice, &system, &idle, &iowait,
> &irq, &softirq, &steal, &guest, &guest_nice);
>    if (num >= 4) {
>      out->user.v = user; out->user.valid = true;
>      out->nice.v = nice; out->nice.valid = true;
>      out->system.v = system; out->system.valid = true;
>      out->idle.v = idle; out->idle.valid = true;
>      if (num >= 5) { // iowait (5) (since Linux 2.5.41)
>        out->iowait = iowait; out->iowait.valid = true;
>        if (num >= 8) { // steal (8) (since Linux 2.6.11)
>          out->steal = steal; out->steal.valid = true;
>          if (num >= 9) { // guest (9) (since Linux 2.6.24)
>            out->guest = guest; out->guest.valid = true;
>            if (num >= 10) { // guest (9) (since Linux 2.6.33)
>              out->guest_nice = guest_nice; out->guest_nice.valid = true;
>            }
>        }
>     }
> }
>
> Cheers Thomas
>
> On Wed, Mar 27, 2019 at 9:42 AM Baesken, Matthias
> <matthias.baesken at sap.com<mailto:matthias.baesken at sap.com> <mailto:matthias.baesken at sap.com<mailto:matthias.baesken at sap.com>>> wrote:
>
>     Hello, please review  the  following change .
>
>     Recent Linux kernels provide in the /proc filesystem   so called
>     "steal" information, see the the proc man page :
>
>        ..
>     "there is an eighth column, steal - stolen time, which is the time
>     spent in other operating systems when running in a virtualized
>     environment"
>
>     The info was useful  in our  internal VM,  when looking into
>     performance issues  (especially in virtualized environments), so it
>     should be added to the hs_err file .
>
>     Bug/webrev :
>
>     https://bugs.openjdk.java.net/browse/JDK-8221535
>
>     http://cr.openjdk.java.net/~mbaesken/webrevs/8221535.0/
>
>
>     Best Regards, Matthias
>


More information about the hotspot-dev mailing list