Review request JMC-6512: for Competing CPU load rule should take overall load into account

Marcus Hirt marcus.hirt at datadoghq.com
Thu Jul 11 21:14:45 UTC 2019


Looks good to me too! I or Henrik will sponsor this one.

Kind regards,
Marcus

On Thu, Jul 11, 2019 at 2:05 PM Jie Kang <jkang at redhat.com> wrote:
>
> On Wed, Jul 10, 2019 at 12:03 PM Florian David <florian.david at gmail.com> wrote:
> >
> > I updated the patch and tests are passing. I also corrected 2 more messages (CompareCpuRule_INFO_LIMIT and CompareCpuRule_WARNING_LIMIT).
> > I have also rebased the patch so that it can be applied to the last revision (659b7d0350d6).
> >
> > Link: http://cr.openjdk.java.net/~hirt/fdavid/JMC-6512/webrev.2/
>
> Hey Florian,
>
> Thanks for the update. It looks good to me.
>
>
> Regards,
>
>
> >
> > Le lun. 8 juil. 2019 à 11:25, Marcus Hirt <marcus.hirt at datadoghq.com> a écrit :
> >>
> >> Jira ticket opened here:
> >> https://bugs.openjdk.java.net/browse/JMC-6523
> >>
> >> /M
> >>
> >> On Mon, Jul 8, 2019 at 10:59 AM Jie Kang <jkang at redhat.com> wrote:
> >>>
> >>> On Wed, Jul 3, 2019 at 11:52 AM Florian David <florian.david at gmail.com> wrote:
> >>> >
> >>> >
> >>> >
> >>> > Le jeu. 27 juin 2019 à 23:38, Jie Kang <jkang at redhat.com> a écrit :
> >>> >>
> >>> >> Hi Florian,
> >>> >>
> >>> > Hi Jie,
> >>> >
> >>> > Thanks for the review. Please find the new one taking into account your feedback here:
> >>> > http://cr.openjdk.java.net/~hirt/fdavid/JMC-6512/webrev.1/
> >>> >
> >>> >>
> >>> >> Changes look mostly good.
> >>> >>
> >>> >> # I have some suggestions and comments below:
> >>> >>
> >>> >> * The Messages also contain:
> >>> >>
> >>> >> CompareCpuRule_INFO_LIMIT=Competing CPU usage limit
> >>> >> CompareCpuRule_INFO_LIMIT_LONG=The amount of CPU used by other
> >>> >> processes needed to trigger an info notice
> >>> >> CompareCpuRule_WARNING_LIMIT=Competing CPU usage warning limit
> >>> >> CompareCpuRule_WARNING_LIMIT_LONG=The amount of CPU used by other
> >>> >> processes needed to trigger a warning
> >>> >>
> >>> >> I think this patch should also alter these descriptions to reflect the
> >>> >> new calculation behind the rule.  To be honest I'm not sure how to
> >>> >> describe the new meaning succinctly in words, otherwise I'd provide a
> >>> >> suggestion, sorry;
> >>> >>
> >>> > I have updated these messages:
> >>> >
> >>> > CompareCpuRule_TEXT_INFO_LONG=The application performance can be affected when the machine is under heavy load and there are other processes that use CPU or other resources on the same computer. To profile representatively or get higher throughput, shut down other resource intensive processes running on the machine.
> >>> > ...
> >>> > CompareCpuRule_TEXT_MESSAGE=An average CPU load of {2} was caused by other processes during {0} starting at {1}.
> >>> > CompareCpuRule_INFO_LIMIT=Competing CPU usage info limit
> >>> > CompareCpuRule_INFO_LIMIT_LONG=The amount of CPU used by other processes times the overall CPU usage needed to trigger an info notice
> >>> > CompareCpuRule_WARNING_LIMIT=Competing CPU usage warning limit
> >>> >
> >>> > I tried to keep these messages simple and not overwhelming the user with lot of details.
> >>> > However, if you think it should be even more detailed, I can be more precise.
> >>>
> >>> Hey Florian,
> >>>
> >>> These messages are fine with me. Unfortunately some of the tests now fail:
> >>>
> >>> [ERROR] Failures:
> >>> [ERROR]   TestRulesWithJfr.verifyAllResults:132->verifyRuleResults:159
> >>>
> >>> I am guessing the JfrRuleBaseline.xml needs to have some more updates
> >>> for the new changes.
> >>>
> >>> I think the rest of the patch looks fine!
> >>>
> >>> >
> >>> >>
> >>> >> # The following comments could be addressed in different Jira issues, if at all:
> >>> >>
> >>> >> * The SpanToolkit is used twice on the items now, once for the max and
> >>> >> once for the ratio. Could the underlying iterations over the items be
> >>> >> combined? I understand they calculate different spans, one of which is
> >>> >> used in the message while the other is used for the score. Could they
> >>> >> be aligned?
> >>> >>
> >>> >> * It's curious that the SpanToolkit is given the warning limit whereas
> >>> >> the long message is generated if the score is above the info limit. As
> >>> >> well, the warning limit is used to determine if spans get combined
> >>> >> while the end result used for the score process is the longest span,
> >>> >> regardless of it's value's relation to the warning limit. Should it be
> >>> >> the longest span that's above the warning limit, if possible,
> >>> >> otherwise the longest span? I guess it depends on if the initial spans
> >>> >> vary in length or not. I'm not sure how to describe it but something
> >>> >> feels strange here.
> >>> >>
> >>> > I agree, these comments are relevant. However I'm not yet a contributor
> >>> > thus I don't have rights to create Jira tickets :( How can we address that ?
> >>>
> >>> I think you can ask Marcus to create a Jira ticket for you.
> >>>
> >>>
> >>> Regards,
> >>>
> >>>
> >>>
> >>> >
> >>> > For the second point, I also believe there is something fishy in this code
> >>> > about using the warning limit instead of the info limit is incorrect. I'm not
> >>> > mastering spans already but passing warning limit to getMaxSpanLimit
> >>> > should exclude span below this limit, thus only triggering the rule if score
> >>> > is above warning limit only and never of it's between info and warning limits.
> >>> > Something is wrong in this workflow.
> >>> >
> >>> >> # The following is just a question for the intended design:
> >>> >>
> >>> >> * After an initial read of these changes, I would expect that if I set
> >>> >> a warning limit of say 0.4, then a warning would be triggered if there
> >>> >> is a longest span where the CPU usage of other processes was 40% and
> >>> >> the total CPU usage was 100% I.e. the range is like [1]? Is that
> >>> >> right?
> >>> >>
> >>> >> [1]
> >>> >> x*y<LIMIT, 0<=x<=1, 0<=y<=1, x<y
> >>> >> x: CPU Usage of Other Processes over the longest span of time
> >>> >> y: CPU Usage of All Processes over longest span of time
> >>> >>
> >>> > Yes that's exactly the computation being done. The core idea is that when
> >>> > machine CPU is low, it's meaningless to trigger the rule since there is still enough
> >>> > CPU to run other processes. The multiplication by 'y' decreases the score
> >>> > in this case.
> >>> >
> >>> > Regards,
> >>> > Florian
> >>> >
> >>> >>
> >>> >>
> >>> >> Regards,
> >>> >>
> >>> >>
> >>> >> On Thu, Jun 27, 2019 at 5:34 AM Florian David <florian.david at gmail.com> wrote:
> >>> >> >
> >>> >> > Hi all,
> >>> >> >
> >>> >> > Please review this fix for the CPU load rule.
> >>> >> >
> >>> >> > JIRA: https://bugs.openjdk.java.net/browse/JMC-6512
> >>> >> > Webrev: http://cr.openjdk.java.net/~hirt/fdavid/JMC-6512/webrev.0/
> >>> >> >
> >>> >> > Regards,
> >>> >> > Florian


More information about the jmc-dev mailing list