RFR 8171119: Low-Overhead Heap Profiling

JC Beyler jcbeyler at google.com
Wed May 9 23:13:56 UTC 2018


Hi Jerry,

First off: Thanks for looking the tests over! I agree that it is not easy
to test easily but we have been working really hard on getting it very
stable.

As is explained in the test plan item: JDK-8202570
<https://bugs.openjdk.java.net/browse/JDK-8202570> (sub task of JDK-8202566
<https://bugs.openjdk.java.net/browse/JDK-8202566>), I've run these tests
now more than 1000 times in a row (I've actually done it multiple times)
and Serguei has been running the mach5 tests. I'll let him give an update
on it but basically, most tests have actually been stable.

There definitely is variability due to the way the sampling rate is
calculated and the two systems that I put up to mitigate it has been to
accept a certain error rate and to, in certain cases, if a first run
failed, run it a second time.

Serguei and I have been iterating on the mach5 testing results to ensure
stability for the tests. Some of the tests have had a few changes to solve
warming up the TLABs, total memory limits, and test timing limits but
nothing really provoked a massive change in the testing framework in
general. I'll let him chime in if I have forgotten anything.

Thanks, let me know if this answers your questions and if there is anything
else I could do :),
Jc

On Wed, May 9, 2018 at 3:20 PM Gerald Thornbrugh <
gerald.thornbrugh at oracle.com> wrote:

> Hi Jc and Serguei,
>
> I have looked at the test fixes and overall they look good.  I am a little
> concerned with some of the comments
> I read that included “Flakiness” and “unlucky” like this comment
> in HeapMonitorArrayAllSampledTest.java:
>
>       // 10% error ensures a sanity test without becoming flaky.
>       // Flakiness is due to the fact that this test is dependent on the sampling rate, which is a
>       // statistical geometric variable around the sampling rate. This means that the test could be
>       // unlucky and not achieve the mean average fast enough for the test case.
>
>
> So it seems there is some dependencies or variabilities that could cause
> the test to fail if it got unlucky.
> I understand that it is difficult to test features like this in a
> reproducible and reliable way.
> My fear is that under certain system loads or configurations these tests
> may get “flaky” or “unlucky” and start
> causing intermittent failures during test runs.
>
> What kind of testing has been performed for these changes and have you
> seen and failures due to “flakiness”
> or “unluckiness”?
>
> Thanks,
>
> Jerry
>
> On May 7, 2018, at 9:49 PM, serguei.spitsyn at oracle.com wrote:
>
> Hi Jc,
>
> I'll make one more pass through the JVMTI and test fixes.
> However, it would be good if at least one more pair of eyes
> looked at the tests as they are a big part of the fix.
>
> Thanks,
> Serguei
>
>
> On 5/7/18 19:28, JC Beyler wrote:
>
> Hi Vladimir,
>
> Good catch, I believe it was used before but no longer since we put the
> heap sampler information directly in the thread structure. I removed it for
> the next webrev.
>
> Could anyone do a review on the JVMTI parts and tests?
>
> Thanks a lot for your help!
> Jc
>
> On Mon, May 7, 2018 at 6:31 PM Vladimir Kozlov <vladimir.kozlov at oracle.com
> >
> wrote:
>
> I did not look on JVMTI part and tests. It looks good to me.
>
> Where _thread field is used?
>
> Thanks,
> Vladimir
>
> On 5/7/18 6:10 PM, JC Beyler wrote:
>
> Hi all,
>
> With the awesome help of Serguei Spitsyn, we have moved forward on the
> implementation for JEP-331 and have the following webrev for review:
>
> Webrev: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.18/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8171119
>
> It is based on jdk/jdk so should patch well with a recent tip.
>
> Could we please have some reviews for the webrev? It would be greatly
> appreciated!
>
> Thanks for all your help!
> Jc
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20180509/6c70c150/attachment.html>


More information about the serviceability-dev mailing list