Draft JEP: JDK Core Libraries Test Stabilization

Volker Simonis volker.simonis at gmail.com
Mon May 5 12:57:31 UTC 2014


Hi Stuart,

great proposal. You can count on me when it comes to testing on exotic
platforms such as AIX. :)

Regards,
Volker


On Thu, May 1, 2014 at 2:08 AM, Stuart Marks <stuart.marks at oracle.com> wrote:
> Hi all,
>
> Here's a draft JEP for stabilizing the core libraries regression test suite,
> that is, fixing up the spuriously failing tests. Please review and comment.
>
> Thanks!
>
> s'marks
>
>
>
>
> Title: JDK Core Libraries Test Stabilization
> Author: Stuart Marks
> Organization: Oracle
> Discussion: core-libs-dev at openjdk.java.net
> [...other metadata elided...]
>
> Summary
> -------
>
> The JDK Regression Test Suite has several thousand fully automated
> tests. These tests are valuable and effective in that they serve to
> prevent bugs from entering the code base. However, they suffer from
> many intermittent failures. Many of these failures are "spurious" in
> that they are not caused by bugs in the product. Spurious failures add
> considerable noise to test reports; they make it impossible for
> developers to ascertain whether a particular change has introduced a
> bug; and they obscure actual failures.
>
> The reliability of the regression suite has improved considerably over
> the past few years. However, there are perhaps still 100-200 tests
> that fail intermittently, and most of these failures are spurious.
> This project aims to reduce the number and frequency of spuriously
> failing tests to a level where they are no longer an impediment to
> development.
>
> This project targets tests from the regression suite that cover the
> JDK Core Libraries, including the base packages (java.lang, io, nio,
> util), I18N, Networking, RMI, Security, and Serviceability. JAXP and
> CORBA are also included, although they have relatively few regression
> tests at present.
>
>
> Non-Goals
> ---------
>
> Regression tests for other areas, including Hotspot, Langtools, and
> Client areas, are not included in this project.
>
> This project does not address operational issues that might cause
> builds or test runs to fail, or that might prevent reports from being
> delivered in a timely fashion.
>
> This project is not focused on product bugs that cause test
> failures. Such test failures are "good" in that the test suite is
> providing valid information about the product.
>
> Test runs on embedded platforms are not covered by this project.
>
>
> Success Metrics
> ---------------
>
> The probability that a complete test run passes (100% of tests)
> currently stands at approximately 0.5%. The goal is to improve this
> success rate to 98%, exclusive of true failures (i.e., those caused
> by bugs in the product). At a 98% success rate, a continuous build
> system that runs ten jobs per day, five days a week, would see one or
> fewer spurious failures per week.
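>
> (To make the arithmetic explicit: ten jobs per day for five days is
> 50 runs per week; at a 2% chance of spurious failure per run, the
> expected number of spuriously failing runs is 50 x 0.02 = 1 per
> week.)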
>
>
> Motivation
> ----------
>
> Developers are continually hampered by the unreliability of the
> regression test suite. Intermittently failing tests add significant
> noise to the results of every test run. The consequence is that
> developers cannot tell whether test failures were caused by bugs
> introduced by a recent change or whether they are spurious
> failures. In addition, the intermittent failures mask actual failures
> in the product, slowing development and reducing quality. Developers
> should be able to rely on the test suite for accurate information: a
> test failure should indicate the introduction of a bug into the
> system, and the absence of test failures should be usable as evidence
> that changes are correct.
>
>
> Description
> -----------
>
> Spurious test failures fall into two broad categories:
>
>  - test bugs
>  - environmental issues
>
> Our working assumption for most intermittent test failures is that
> they are spurious, and further, that they are caused by bugs in the
> test itself. While it is possible for a product bug to cause an
> intermittent failure, this is relatively rare. The majority of
> intermittent failures encountered so far have indeed proven to be test
> bugs.
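>
> As an illustration, a typical test bug of this kind is a
> fixed-duration sleep standing in for real synchronization. (The
> sketch below is hypothetical, not a test from the suite; the class
> and variable names are invented.)
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.atomic.AtomicBoolean;
>
>     public class SleepRace {
>         public static void main(String[] args)
>                 throws InterruptedException {
>             // Fragile: assume the worker finishes within 100 ms. On
>             // a slow or heavily loaded machine the assertion can
>             // fire even though the product code is correct (a
>             // spurious failure).
>             final AtomicBoolean finished = new AtomicBoolean();
>             new Thread(() -> finished.set(true)).start();
>             Thread.sleep(100);   // not a real synchronization point
>             if (!finished.get())
>                 throw new AssertionError("worker did not finish");
>
>             // Robust: wait on the event itself.
>             final CountDownLatch done = new CountDownLatch(1);
>             new Thread(done::countDown).start();
>             done.await();   // blocks exactly until the worker signals
>         }
>     }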
>
> "Environmental" issues, such as misconfigured test machines, temporary
> dysfunction on the machine running the test job (e.g., filesystem
> full), or transient network failures, also contribute to spurious
> failures. Test should be made more robust, if possible. Environment
> issues should be fed back to the infrastructure team for resolution
> and future infrastructure improvements.
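>
> One recurring environmental interaction, for example, is a test that
> binds a hardcoded TCP port and collides with another process on a
> shared test machine. Binding port 0 lets the OS choose a free
> ephemeral port (a minimal sketch; the fixed port number shown is
> illustrative):
>
>     import java.net.ServerSocket;
>
>     public class EphemeralPort {
>         public static void main(String[] args) throws Exception {
>             // Fragile: a fixed port such as 5000 may already be in
>             // use by another test running concurrently on the same
>             // machine:
>             //     ServerSocket ss = new ServerSocket(5000);
>
>             // Robust: let the OS pick a free ephemeral port, then
>             // ask the socket which port it was actually given.
>             ServerSocket ss = new ServerSocket(0);
>             System.out.println("port " + ss.getLocalPort());
>             ss.close();
>         }
>     }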
>
> A variety of techniques will be employed to diagnose, track, and help
> develop fixes for intermittently failing tests:
>
>  - track all test failures in JBS
>  - repeat test runs against the same build
>  - gather statistics about failure rates and the number of tests with
>    bugs, and track them continuously
>  - investigate pathologies behind common test failure modes
>  - develop techniques for fixing common test bugs
>  - develop test library code to improve commonality across tests and
>    to avoid typical failure modes (see the sketch after this list)
>  - add instrumentation to tests (and to the test suite) to improve
>    diagnosability
>  - exclude tests judiciously, preferably only as a last resort
>  - conduct change reviews
>  - perform code inspections
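>
> The test library item is illustrated by the following sketch. The
> class and method names here are invented for this example (they do
> not refer to an existing jtreg library); the idea is to replace fixed
> sleeps with polling against a generous deadline:
>
>     import java.util.concurrent.Callable;
>
>     public class TestUtils {
>         // Polls a condition until it holds or the deadline expires.
>         // The deadline matters only in the failing case, so a
>         // generous value adds no cost to a passing run, unlike a
>         // fixed sleep.
>         public static void waitForCondition(Callable<Boolean> cond,
>                                             long timeoutMillis)
>                 throws Exception {
>             long deadline = System.currentTimeMillis()
>                             + timeoutMillis;
>             while (!cond.call()) {
>                 if (System.currentTimeMillis() > deadline)
>                     throw new AssertionError(
>                         "condition not met within "
>                         + timeoutMillis + " ms");
>                 Thread.sleep(10);   // short poll interval
>             }
>         }
>     }
>
> A test would then write, for example,
> TestUtils.waitForCondition(() -> server.isStarted(), 30000L), where
> server is a hypothetical object under test; the 30-second deadline is
> reached only when something is actually wrong.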
>
>
> Alternatives
> ------------
>
> The most likely alternative to diagnosing and fixing intermittent
> failures is to aggressively exclude intermittently failing tests from
> the test suite. This trades code coverage for test reliability,
> adding the risk that newly introduced bugs go undetected.
>
>
> Testing
> -------
>
> The subject of the project is the test suite itself. The main
> "testing" of the test suite is running it repeatedly in a variety of
> environments, including continuous build-and-test systems, as well as
> recurring "same-binary" test runs on promoted builds. This will help
> flush out intermittent failures and detect newly introduced failures.
>
>
> Risks and Assumptions
> ---------------------
>
> We are working through a long tail of intermittent failures, which
> may become increasingly frustrating to fix as time goes on, putting
> the project at risk of stalling.
>
> New intermittent failures may be introduced or discovered more quickly
> than they can be resolved.
>
> The main work of fixing up the tests will be spread across several
> development groups. This requires good cross-group coordination and
> focus.
>
> The culture in the development group has (mostly) been to ignore test
> failures, or to find ways to cope with them. As intermittent failures
> are removed, we hope to decrease the group's tolerance of test failures.
>
>
> Dependences
> -----------
>
> No dependences on other JEPs or components.
>
>
> Impact
> ------
>
> No impact on specific parts of the platform or product, except for
> developer time and effort being spent on it, across various component
> teams.
>
> ==========


