Review request JMC-6512: for Competing CPU load rule should take overall load into account

Marcus Hirt marcus.hirt at datadoghq.com
Mon Jul 8 15:25:30 UTC 2019


Jira ticket opened here:
https://bugs.openjdk.java.net/browse/JMC-6523

/M

On Mon, Jul 8, 2019 at 10:59 AM Jie Kang <jkang at redhat.com> wrote:

> On Wed, Jul 3, 2019 at 11:52 AM Florian David <florian.david at gmail.com>
> wrote:
> >
> >
> >
> > Le jeu. 27 juin 2019 à 23:38, Jie Kang <jkang at redhat.com> a écrit :
> >>
> >> Hi Florian,
> >>
> > Hi Jie,
> >
> > Thanks for the review. Please find the new one taking into account your
> feedback here:
> > http://cr.openjdk.java.net/~hirt/fdavid/JMC-6512/webrev.1/
> >
> >>
> >> Changes look mostly good.
> >>
> >> # I have some suggestions and comments below:
> >>
> >> * The Messages also contain:
> >>
> >> CompareCpuRule_INFO_LIMIT=Competing CPU usage limit
> >> CompareCpuRule_INFO_LIMIT_LONG=The amount of CPU used by other
> >> processes needed to trigger an info notice
> >> CompareCpuRule_WARNING_LIMIT=Competing CPU usage warning limit
> >> CompareCpuRule_WARNING_LIMIT_LONG=The amount of CPU used by other
> >> processes needed to trigger a warning
> >>
> >> I think this patch should also alter these descriptions to reflect the
> >> new calculation behind the rule.  To be honest I'm not sure how to
> >> describe the new meaning succinctly in words, otherwise I'd provide a
> >> suggestion, sorry;
> >>
> > I have updated these messages:
> >
> > CompareCpuRule_TEXT_INFO_LONG=The application performance can be
> affected when the machine is under heavy load and there are other processes
> that use CPU or other resources on the same computer. To profile
> representatively or get higher throughput, shut down other resource
> intensive processes running on the machine.
> > ...
> > CompareCpuRule_TEXT_MESSAGE=An average CPU load of {2} was caused by
> other processes during {0} starting at {1}.
> > CompareCpuRule_INFO_LIMIT=Competing CPU usage info limit
> > CompareCpuRule_INFO_LIMIT_LONG=The amount of CPU used by other processes
> times the overall CPU usage needed to trigger an info notice
> > CompareCpuRule_WARNING_LIMIT=Competing CPU usage warning limit
> >
> > I tried to keep these messages simple and not overwhelming the user with
> lot of details.
> > However, if you think it should be even more detailed, I can be more
> precise.
>
> Hey Florian,
>
> These messages are fine with me. Unfortunately some of the tests now fail:
>
> [ERROR] Failures:
> [ERROR]   TestRulesWithJfr.verifyAllResults:132->verifyRuleResults:159
>
> I am guessing the JfrRuleBaseline.xml needs to have some more updates
> for the new changes.
>
> I think the rest of the patch looks fine!
>
> >
> >>
> >> # The following comments could be addressed in different Jira issues,
> if at all:
> >>
> >> * The SpanToolkit is used twice on the items now, once for the max and
> >> once for the ratio. Could the underlying iterations over the items be
> >> combined? I understand they calculate different spans, one of which is
> >> used in the message while the other is used for the score. Could they
> >> be aligned?
> >>
> >> * It's curious that the SpanToolkit is given the warning limit whereas
> >> the long message is generated if the score is above the info limit. As
> >> well, the warning limit is used to determine if spans get combined
> >> while the end result used for the score process is the longest span,
> >> regardless of it's value's relation to the warning limit. Should it be
> >> the longest span that's above the warning limit, if possible,
> >> otherwise the longest span? I guess it depends on if the initial spans
> >> vary in length or not. I'm not sure how to describe it but something
> >> feels strange here.
> >>
> > I agree, these comments are relevant. However I'm not yet a contributor
> > thus I don't have rights to create Jira tickets :( How can we address
> that ?
>
> I think you can ask Marcus to create a Jira ticket for you.
>
>
> Regards,
>
>
>
> >
> > For the second point, I also believe there is something fishy in this
> code
> > about using the warning limit instead of the info limit is incorrect.
> I'm not
> > mastering spans already but passing warning limit to getMaxSpanLimit
> > should exclude span below this limit, thus only triggering the rule if
> score
> > is above warning limit only and never of it's between info and warning
> limits.
> > Something is wrong in this workflow.
> >
> >> # The following is just a question for the intended design:
> >>
> >> * After an initial read of these changes, I would expect that if I set
> >> a warning limit of say 0.4, then a warning would be triggered if there
> >> is a longest span where the CPU usage of other processes was 40% and
> >> the total CPU usage was 100% I.e. the range is like [1]? Is that
> >> right?
> >>
> >> [1]
> >> x*y<LIMIT, 0<=x<=1, 0<=y<=1, x<y
> >> x: CPU Usage of Other Processes over the longest span of time
> >> y: CPU Usage of All Processes over longest span of time
> >>
> > Yes that's exactly the computation being done. The core idea is that when
> > machine CPU is low, it's meaningless to trigger the rule since there is
> still enough
> > CPU to run other processes. The multiplication by 'y' decreases the score
> > in this case.
> >
> > Regards,
> > Florian
> >
> >>
> >>
> >> Regards,
> >>
> >>
> >> On Thu, Jun 27, 2019 at 5:34 AM Florian David <florian.david at gmail.com>
> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Please review this fix for the CPU load rule.
> >> >
> >> > JIRA: https://bugs.openjdk.java.net/browse/JMC-6512
> >> > Webrev: http://cr.openjdk.java.net/~hirt/fdavid/JMC-6512/webrev.0/
> >> >
> >> > Regards,
> >> > Florian
>


More information about the jmc-dev mailing list