RFR: JDK-8206013: vmTestbase/nsk tests should have /timeout configured

Gary Adams gary.adams at oracle.com
Mon Jul 16 14:49:16 UTC 2018


I agree that timeouts should be very rare and that a shorter timeout
helps during test development. These tests were written a long time
ago and the test developers are no longer available for ongoing
adjustments.

These tests were not designed to be performance regression tests.
They typically have a single feature that is being tested. For instance,
the recent investigation of exclude001 revealed that many more methods
are being processed now than when the test was originally written.
Increasing waittime from 5 to 7 minutes allowed it to run to completion
on the slower solaris-sparcv9-debug build.

I believe we are looking for ways to keep the continuous integration
systems building and testing automatically. Intermittent timeout
failures should be avoided where possible. I believe historically
we have seen both vmTestbase/nsk waittime timeouts and
jtreg timeouts in this collection of tests. Increasing the jtreg timeout
should allow the internal waittime timeout to have first shot at
reporting a timeout.
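For illustration, the shape of the change under discussion can be sketched as a jtreg tag block like the following. The test name is hypothetical (this is not a test from the actual webrev), but `/timeout` on the action and `-waittime` are the real knobs being discussed:

```java
/*
 * @test
 * (Hypothetical example; not a test from the actual changeset.)
 *
 * The jtreg action gets a 300 second budget (scaled by -timeoutFactor),
 * so the test's internal 5 minute waittime can expire and report first
 * when an operation hangs, instead of jtreg killing the test at the
 * default 2 minutes.
 *
 * @run main/othervm/timeout=300 nsk.jdb.SomeTest -waittime:5
 */
```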
> Hi Gary,
>
> I wasn't suggesting a shorter waittime to speed up the tests. It's just
> another (of many) timeout related parameters used to detect (what should
> be very uncommon) timeout failures sooner. I guess in that case it does
> make the test faster in cases where it does timeout.
Agreed.
>
> So one question is how much do we care about timeout performance? If not
> at all (we think the timeout is very rare, if ever), we'd just do
> something like a 1h timeout and forget about it. However, historically
> that is not the approach we have taken. jtreg is given a fairly short
> timeout of 2m, multiplied to account for platform performance.
When/if these tests are rewritten to be more jtreg-centric, the timeout and
time factor arguments should be updated.

I believe performance specific tests should catch regressions.
These functional tests should be given adequate time to complete
their tasks.
>
> So while I understand it doesn't make sense to have the waittime be
> longer than the (adjusted) jtreg timeout (we'd always hit the jtreg
> timeout first), I don't think that implies we should make the jtreg
> timeout longer. Maybe we should make the waittime shorter. In any case,
> with the current timeoutFactor in place, it's already the case that the
> jtreg timeout is longer than waittime. So I'm not sure why you feel the
> need to make the jtreg timeout longer, unless the test is hitting the
> jtreg timeout already.
The current waittime setting is what the original test developer
designated. It corresponds most closely to the jtreg timeout setting.
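As background for how waittime is typically applied: the thread describes it as a bound on individual operations rather than on the whole test. The sketch below is illustrative only (it is not the actual nsk framework code, and the names are made up); it just shows the general pattern of a per-operation, waittime-bounded wait:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch: a waittime-bounded wait for a single operation,
// in the style the thread describes. Not the real nsk framework API.
public class WaitTimeSketch {

    // Poll until cond holds or waittimeMinutes elapse; returning false
    // means a waittime timeout that the test would report as a failure.
    static boolean waitFor(BooleanSupplier cond, long waittimeMinutes)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + waittimeMinutes * 60_000L;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true;
            }
            Thread.sleep(100); // polling interval
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The "operation" completes after ~300 ms, well inside the bound.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 300, 1);
        System.out.println(ok ? "completed" : "timed out");
    }
}
```

Note that a test may perform several such waits in sequence, which is why (as Chris points out below) the sum of the waits can exceed a single jtreg timeout value.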

>
> And another thought that just came to me. Timeouts can also serve the
> purpose of detecting bugs. If the test author decides the test should
> finish in 1m, and someone bumps the timeout to 10m, that might make a
> performance bug introduced in the future go unnoticed. In general I
> don't think we should increase the timeout for tests that are not
> currently timing out. For ones that are, first see if there is a
> performance related issue.
I believe the focus should be on setting timeouts that allow these
tests to reliably complete on the currently supported platforms
and build variants.
>
> Chris
>
> On 7/13/18 2:36 PM, gary.adams at oracle.com wrote:
> > We know that the default jtreg timeout is 2 minutes and typically
> > runs with a timeoutfactor of 4 or 10. So the harness "safety net"
> > is 8 to 20 minutes from jtreg.
> >
> > It does appear that most of the vmTestbase tests use a 5 minute
> > waittime. I have seen waittime used in different ways. The one we
> > saw most recently was waiting for a specific reply that was taking
> > upwards of 7 minutes handling method exclude filtering, e.g.
> > 600K methods on solaris-sparcv9-debug.
> >
> > I've seen other tests using waittime as a total test timeout.
> >
> > The jtreg timeout factor has not been applied to the vmTestbase waittime.
> > The tests have been quickly ported so they can run under the jtreg
> > harness, but have not been converted to use all the jtreg features.
> >
> > The purpose of this specific fix is to prevent jtreg from an early
> > termination at 2 minutes or 8 minutes, when the original waittime
> > allows for 5 minutes.
> >
> > Reducing waittime will not speed up the tests. It would probably
> > introduce more intermittent timeout reports.
> >
> > On 7/13/18 4:21 PM, Chris Plummer wrote:
> >> Hi Gary,
> >>
> >> It looks like you have properly added timeout=300 wherever we use
> >> -waittime:5. However, I'm not 100% convinced this is always the right
> >> approach. In the bug description you said that -waittime is used as a
> >> timeout for individual operations. However, there could be multiple
> >> of those operations, and they could in sum exceed the 300 second
> >> jtreg timeout you added.
> >>
> >> What is the default for -waittime? I'm also guessing that the initial
> >> application of -waittime was never really tuned to the specific tests
> >> and just cloned across most of them. It seems every test either needs
> >> 5m or the default, which doesn't really make much sense. If 5m was
> >> really needed, we should have seen a lot of failures when ported to
> >> jtreg, but as far as I know the only reason this issue got on your
> >> radar was due to exclude001 needing 7m. Maybe rather than adding
> >> timeout=300 you should change -waittime to 2m, since other than
> >> exclude001, none of the tests seem to need more than 2m.
> >>
> >> Lastly, does timeoutFactor impact -waittime? It seems it should be
> >> applied to it also. I'm not sure if it is.
> >>
> >> thanks,
> >>
> >> Chris
> >>
> >> On 7/13/18 4:29 AM, Gary Adams wrote:
> >>> This is a simple update to set the jtreg timeout to match the
> >>> internal waittime already being used by these vmTestbase/nsk/jdb tests.
> >>>
> >>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8206013
> >>>    Webrev: http://cr.openjdk.java.net/~gadams/8206013/webrev.00/
>
