Proposed new policies for JDK 9 regression tests: tiered testing, intermittent failures, and randomness
joe darcy
joe.darcy at oracle.com
Thu Apr 30 00:12:31 UTC 2015
A follow-up,
The tiered testing definitions have been added to the jdk repository
(JDK-8075544). Two new jtreg keywords were defined and added to
appropriate jdk regression tests, "intermittent" (JDK-8075565) and
"randomness" (JDK-8078334).
Tests that are observed to fail intermittently should be tagged with the
"intermittent" keyword until such time as the flakiness of the test's
behavior is resolved.
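A jtreg test declares keywords with an @key tag in its header comment. The class below is a hypothetical illustration; its name, summary, and body are not taken from an actual JDK test. Its fixed-timeout join is a common pattern behind intermittent failures on a heavily loaded test machine.

```java
/*
 * @test
 * @key intermittent
 * @summary Hypothetical example: a timing-sensitive check that may fail
 *          intermittently on a heavily loaded test machine
 * @run main IntermittentExampleTest
 */
public class IntermittentExampleTest {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10); // stands in for a short piece of real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        // A fixed timeout is a typical source of flakiness: under heavy load
        // the worker may not finish in time even though nothing is wrong.
        worker.join(5_000);
        if (worker.isAlive()) {
            throw new RuntimeException("worker did not finish within timeout");
        }
        System.out.println("Test passed");
    }
}
```

The tag only documents the known problem and lets the test be selected by keyword. The "intermittent" keyword is meant to be removed once the flakiness is resolved.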
Going forward, when a new regression test is written or an existing test
updated, the presence or absence of the "randomness" keyword should be
kept up-to-date with the behavior of the test. As explained in the
TEST.ROOT file,
> # The "randomness" keyword marks tests using randomness with test
> # cases differing from run to run. (A test using a fixed random seed
> # would not count as "randomness" by this definition.) Extra care
> # should be taken to handle test failures of intermittent or
> # randomness tests.
If a "randomness" test fails, and especially if it fails intermittently,
the seed value in force during the failing run should be included in a
bug report about the failure. One part of investigating the failure is
to see if the intermittent failure becomes reproducible if the seed is
set to the value observed during a failing run.
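Recording the seed helps because java.util.Random is deterministic for a given seed. Two generators constructed with the same seed return the same sequence of values, so a test's random inputs can be regenerated. In the minimal sketch below, the class name and seed value are arbitrary and only illustrative.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedReplay {
    // Draw n ints from a generator created with the given seed.
    static int[] draw(long seed, int n) {
        Random r = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = r.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        long seedFromFailingRun = 123456789L; // hypothetical value copied from a bug report
        int[] original = draw(seedFromFailingRun, 5);
        int[] replay = draw(seedFromFailingRun, 5);
        // Same seed, same sequence: the failing run's inputs can be regenerated.
        System.out.println(Arrays.equals(original, replay)); // true
    }
}
```

Replaying the seed reproduces the inputs only if the test consumes random values in the same order. A test whose threads share one generator, for example, may still not reproduce deterministically.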
There is now a random number utility library for regression testing in
the jdk repository (JDK-8078672); the facility is located at
test/lib/testlibrary/jdk/testlibrary/RandomFactory.java
and can be accessed via the jtreg @library facility with jtreg tags like
* ...
* @library /lib/testlibrary
* @build jdk.testlibrary.*
* ...
Calls to
new Random()
in regression tests can be replaced with
RandomFactory.getRandom()
In brief, when a test requests a random number generator from the
factory, the factory prints the seed it is using; a test run can be made
to use a particular seed by passing a -Dseed=X option to the jtreg test run.
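The seed handling described above can be approximated with a small stand-in. HypotheticalRandomFactory is illustrative only and is not the jdk.testlibrary code, whose details may differ. The sketch assumes the seed arrives as the system property "seed", falls back to a fresh seed otherwise, and prints the seed so the log of a failing run records it.

```java
import java.util.Random;

// Hypothetical stand-in illustrating the RandomFactory idea; not the actual
// jdk.testlibrary.RandomFactory implementation.
public final class HypotheticalRandomFactory {
    private HypotheticalRandomFactory() { }

    /** Returns the seed from -Dseed=X if present, otherwise a fresh seed. */
    static long chooseSeed() {
        String prop = System.getProperty("seed");
        if (prop != null) {
            return Long.parseLong(prop);
        }
        return new Random().nextLong();
    }

    /** Returns a Random seeded per chooseSeed(), printing the seed first. */
    public static Random getRandom() {
        long seed = chooseSeed();
        System.out.println("Seed = " + seed);
        return new Random(seed);
    }

    public static void main(String[] args) {
        Random r = getRandom();
        System.out.println("First value: " + r.nextInt(100));
    }
}
```

Running this class with -Dseed=42 makes getRandom() return a generator with the same sequence on every run. That is how a seed taken from a failure report can be replayed.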
Cheers,
-Joe
On 3/18/2015 5:43 PM, joe darcy wrote:
> Hello,
>
> Over the last several years, there has been a background thread of
> activity to improve the robustness and reliability of JDK regression
> testing. In an effort to further improve testing of JDK 9, I have a few
> proposals to put forward for discussion:
>
> * Tiered testing
> * Marking tests which fail intermittently
> * Marking tests which use randomness
>
> Some portions of the JDK regression tests are more stable than others;
> that is, some portions are less prone to intermittent test failures.
> Some aspects of the JDK are also more fundamental than others. For
> example, javac and the core libraries are necessary for any kind of
> Java development work while jaxp is not.
>
> With those considerations in mind and taking into account the graph of
> integration forests for JDK 9 [1], I propose an initial definition of
> two tiers of tests:
>
> * Tier 1: stable tests for fundamental parts of the platform. Failures
> in tier 1 tests are treated as urgent issues to resolve, on par with a
> build failure.
>
> * Tier 2: tests which may be less stable or which cover less fundamental
> parts of the platform. Resolving failures in tier 2 tests is important,
> but not as urgent as resolving a tier 1 failure.
>
> The initial proposed population of tier 1 and tier 2 regression tests is:
>
> Tier 1 tests:
> jdk/test: jdk_lang, jdk_util, jdk_math
> langtools/test
>
> Tier 2 tests:
> jdk/test: jdk_io, jdk_nio, jdk_net, jdk_rmi, jdk_time,
> jdk_security, jdk_text, core_tools, jdk_other, jdk_svc
> nashorn/test
> jaxp/test:jaxp_all
>
> The regression tests for client areas are not run as commonly as other
> regression tests; client tests could be added as a third tier or
> incorporated into tier 2 over time. Given how HotSpot integrates into
> jdk9/dev after going through its own set of integration forests, the
> current definition of tiered testing is aimed at langtools and
> libraries work.
>
> Some of the areas included in tier 2 above are very fundamental, such
> as jdk_io, but still have some testing issues. Once those issues are
> resolved, a test set like jdk_io could be promoted from tier 2 to tier 1.
>
> These definitions of tiered tests can be implemented as entries in the
> TEST.group files used by jtreg in the various Hg component
> repositories: jdk, langtools, jaxp, and nashorn.
>
> One goal of this explicit tiered testing policy is that all the tier 1
> tests would always pass on the master. In other words, in the
> steady-state situation, integrations from dev into master should not
> introduce tier 1 test failures on mainline platforms.
>
> Resolving a new persistent test failure could be accomplished in
> multiple ways. If there is a flaw in the new code or the new test, the
> new code or test could be fixed. If developing a full fix would take a
> while, the test could be @exclude-d or put on the problem list while
> the full fix is being tracked in another bug. Finally, if the testing
> situation is sufficiently bad, the changeset which introduced the
> problem can be anti-delta-ed out.
>
> Currently it is difficult to know what set of JDK regression tests
> intermittently fail. To make this determination easier, I propose
> defining for use in the JDK repositories a new jtreg keyword, say
> "intermittent-failure", that would be added to tests known or
> suspected to fail intermittently. The jtreg harness supports defining
> a set of keywords for a set of tests in the TEST.ROOT file. The
> affected (or afflicted) tests would get a
>
> @key intermittent-failure
>
> line as one of their jtreg tags. Besides documenting the problems of
> the test in the test itself, a command like
>
> jtreg -keywords:intermittent-failure ...
>
> could be used to run the intermittently failing tests as a group, such
> as in a dedicated attempt to gather more failure information about the
> tests.
>
> Some tests want to explore a large space of inputs, often a space of
> inputs so large it is impractical or undesirable to exhaustively
> explore the space in routine testing. One way to get better test
> coverage in this kind of situation over time is for a test of the area
> to use randomness to explore a different subset of the input space on
> different runs. This use of randomness can be a valid testing
> technique, but it must be used with care.
>
> If such a random-using test fails intermittently, it may be because it
> is encountering a real flaw in the product, but a flaw that is only
> visited rarely. If such a failure is observed, it should be added to
> the bug database along with the seed value that can be used to
> reproduce the situation. As a corollary, all such random-using tests
> should print out the seed value they are using and provide a way to set
> the seed value on a given run.
>
> To aid those analyzing the failure of a random-using test, a new jtreg
> keyword like "uses-randomness" should be added.
>
> Thanks to Alan Bateman, Stuart Marks, and Stefan Särne for many
> conversations leading up to these proposals.
>
> Comments?
>
> -Joe
>
> [1] "Proposal to revise forest graph and integration practices for JDK
> 9,"
> http://mail.openjdk.java.net/pipermail/jdk9-dev/2013-November/000000.html