Proposed new policies for JDK 9 regression tests: tiered testing, intermittent failures, and randomness
joe darcy
joe.darcy at oracle.com
Thu Mar 19 00:43:43 UTC 2015
Hello,
Over the last several years, there has been a background thread of
activity to improve the robustness and reliability of JDK regression
testing. In an effort to further improve testing of JDK 9, I have a few
proposals to put forward for discussion:
* Tiered testing
* Marking tests which fail intermittently
* Marking tests which use randomness
Some portions of the JDK regression tests are more stable than others;
that is, some portions are less prone to intermittent test failures.
Some aspects of the JDK are also more fundamental than others. For
example, javac and the core libraries are necessary for any kind of Java
development work while jaxp is not.
With those considerations in mind and taking into account the graph of
integration forests for JDK 9 [1], I propose an initial definition of
two tiers of tests:
* Tier 1: stable tests for fundamental parts of the platform. Failures
in tier 1 tests are treated as urgent issues to resolve, on par with a
build failure.
* Tier 2: tests which may be less stable or which cover less fundamental
parts of the platform. Resolving failures in tier 2 tests is important,
but not as urgent as resolving a tier 1 failure.
The initial proposed population of tier 1 and tier 2 regression tests is:
Tier 1 tests:
    jdk/test: jdk_lang, jdk_util, jdk_math
    langtools/test
Tier 2 tests:
    jdk/test: jdk_io, jdk_nio, jdk_net, jdk_rmi, jdk_time,
              jdk_security, jdk_text, core_tools, jdk_other, jdk_svc
    nashorn/test
    jaxp/test: jaxp_all
The regression tests for client areas are not run as commonly as other
regression tests; client tests could be added as a third tier or
incorporated into tier 2 over time. Given how HotSpot integrates in
jdk9/dev after going through its own set of integration forests, the
current definition of tiered testing is aimed at langtools and
libraries work.
Some of the areas included in tier 2 above are very fundamental, such as
jdk_io, but still have some testing issues. Once those issues are
resolved, a test set like jdk_io could be promoted from tier 2 to tier 1.
These definitions of tiered tests can be implemented as entries in the
TEST.groups files used by jtreg in the various Hg component
repositories: jdk, langtools, jaxp, and nashorn.
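
As a rough sketch of what such entries might look like for the jdk
repository, assuming the group names "tier1" and "tier2" (these names
are illustrative and not fixed by this proposal) and assuming the
existing groups listed above are referenced with jtreg's ":name" group
syntax:

```properties
# Hypothetical tier definitions for the jdk repository's jtreg group file.
# The names tier1 and tier2 are assumptions made for illustration only.

tier1 = \
    :jdk_lang \
    :jdk_util \
    :jdk_math

tier2 = \
    :jdk_io \
    :jdk_nio \
    :jdk_net \
    :jdk_rmi \
    :jdk_time \
    :jdk_security \
    :jdk_text \
    :core_tools \
    :jdk_other \
    :jdk_svc
```

Defining the tiers in terms of existing groups would mean that promoting
a test set such as jdk_io from tier 2 to tier 1 is a one-line move.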
One goal of this explicit tiered testing policy is that all the tier 1
tests would always pass on the master. In other words, in the
steady-state situation, integrations from dev into master should not
introduce tier 1 test failures on mainline platforms.
Resolving a new persistent test failure could be accomplished in
multiple ways. If there is a flaw in the new code or the new test, the
new code or test could be fixed. If developing a full fix would take a
while, the test could be @ignore-d or put on the problem list while the
full fix is being tracked in another bug. Finally, if the testing
situation is sufficiently bad, the changeset which introduced the problem
can be anti-delta-ed out.
Currently it is difficult to know what set of JDK regression tests
intermittently fail. To make this determination easier, I propose
defining for use in the JDK repositories a new jtreg keyword, say
"intermittent-failure", that would be added to tests known or suspected
to fail intermittently. The jtreg harness supports defining a set of
keywords for a set of tests in the TEST.ROOT file. The affected (or
afflicted) tests would get a
@key intermittent-failure
line as one of their jtreg tags. Besides documenting the problems of the
test in the test itself, a command like
jtreg -keywords:intermittent-failure ...
could be used to run the intermittently failing tests as a group, such
as in a dedicated attempt to gather more failure information about the
tests.
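
For illustration, a tagged test might look roughly like the sketch
below. The class name, the time limit, and the stand-in workload are all
hypothetical; the sketch also assumes the "intermittent-failure" keyword
has been declared in TEST.ROOT so that jtreg accepts it.

```java
/*
 * @test
 * @summary Hypothetical timing-sensitive test, tagged as one that is
 *          known or suspected to fail intermittently
 * @key intermittent-failure
 * @run main TimingSensitiveTest
 */
class TimingSensitiveTest {
    // Hypothetical limit. A fixed wall-clock bound like this is a typical
    // reason for a test to fail only occasionally on a loaded machine.
    static final long LIMIT_MILLIS = 2_000;

    /** Runs a stand-in workload and returns the elapsed time in ms. */
    static long timeWorkMillis() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i; // stand-in for the operation under test
        }
        if (sum < 0) {
            throw new AssertionError("unexpected overflow");
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeWorkMillis();
        System.out.println("Elapsed: " + elapsed + " ms");
        if (elapsed > LIMIT_MILLIS) {
            throw new RuntimeException("Took " + elapsed
                    + " ms, limit is " + LIMIT_MILLIS + " ms");
        }
    }
}
```

Since jtreg keyword expressions can be negated, the same tag should also
allow such tests to be left out of a routine run with something like
-keywords:'!intermittent-failure'.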
Some tests want to explore a large space of inputs, often a space of
inputs so large it is impractical or undesirable to exhaustively explore
the space in routine testing. One way to get better test coverage in
this kind of situation over time is for a test of the area to use
randomness to explore a different subset of the input space on different
runs. This use of randomness can be a valid testing technique, but it
must be used with care.
If such a random-using test fails intermittently, it may be because it
is encountering a real flaw in the product, but a flaw that is only
visited rarely. If such a failure is observed, it should be added to the
bug database along with the seed value that can be used to reproduce the
situation. As a corollary, all such random-using tests should print out
the seed value they are using and provide a way to set the seed value on a
given run.
To aid those analyzing the failure of a random-using test, a new jtreg
keyword like "uses-randomness" should be added.
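
A minimal sketch of a test following these conventions is shown below.
The class name, the "test.seed" system property, and the property being
checked are all hypothetical, chosen only to illustrate printing the
seed and accepting one for a rerun; the "uses-randomness" keyword is
assumed to have been declared for the test suite.

```java
import java.util.Random;

/*
 * @test
 * @summary Hypothetical randomized test that reports and accepts a seed
 * @key uses-randomness
 * @run main RandomizedRoundTripTest
 */
class RandomizedRoundTripTest {
    // Hypothetical property name; the proposal does not prescribe one.
    static final String SEED_PROPERTY = "test.seed";

    /** Uses the seed given via the system property, else picks a new one. */
    static long chooseSeed() {
        String value = System.getProperty(SEED_PROPERTY);
        return (value != null) ? Long.parseLong(value)
                               : new Random().nextLong();
    }

    /** Checks a sample of random inputs; returns the number of failures. */
    static int run(long seed, int samples) {
        Random rnd = new Random(seed);
        int failures = 0;
        for (int i = 0; i < samples; i++) {
            long x = rnd.nextLong();
            // Property under test: the decimal string of x parses back to x.
            if (Long.parseLong(Long.toString(x)) != x) {
                failures++;
                System.err.println("Round trip failed for " + x);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        long seed = chooseSeed();
        // Print the seed before testing so it is in the log on any failure.
        System.out.println("Seed: " + seed
                + " (rerun with -D" + SEED_PROPERTY + "=" + seed + ")");
        int failures = run(seed, 1000);
        if (failures != 0) {
            throw new RuntimeException(failures
                    + " failure(s) with seed " + seed);
        }
    }
}
```

With this shape, a bug report only needs to quote the "Seed:" line from
the log, and the failing run can be repeated by passing that value back
to the test VM as the system property.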
Thanks to Alan Bateman, Stuart Marks, and Stefan Särne for many
conversations leading up to these proposals.
Comments?
-Joe
[1] "Proposal to revise forest graph and integration practices for JDK 9,"
http://mail.openjdk.java.net/pipermail/jdk9-dev/2013-November/000000.html