RFR: 8274083: Update testing docs to mention tiered testing

Wed Sep 22 02:09:02 UTC 2021

On Tue, 21 Sep 2021 16:06:28 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> doc/testing.html line 77:
>> 
>>> 75: <li><p><code>tier1</code>: This test group is the first line of defense against bugs. Multiple developers run these tests every day. Normally, at least this tier is run before integration. Because of this widespread use, the tests in <code>tier1</code> are carefully selected and optimized to run fast, and to run in the most stable manner. The test failures in <code>tier1</code> are usually followed up on quickly, either with fixes, or by adding relevant tests to the problem list. <a href="../.github/workflows/">GitHub Actions workflows</a>, if enabled, run <code>tier1</code> tests.</p></li>
>>> 76: <li><p><code>tier2</code>: This test group covers even more ground. These contain, among other things, tests that either run for too long to be at <code>tier1</code>, test unstable and/or experimental features, or test less essential JDK components.</p></li>
>>> 77: <li><p><code>tier3</code>: This test group covers more stressful tests, the tests for corner cases not covered by previous tiers, plus the tests that require GUIs. As such, this suite should either be run with low concurrency (<code>TEST_JOBS=1</code>), or without headful tests (<code>JTREG_KEYWORDS=\!headful</code>), or both.</p></li>
>> 
>> It's not clear to me that these descriptions are useful. Instead of "First line of defense against bugs", it might be better to say something about the hotspot and core library tests that are typically run in tier1. Folks contributing to some areas of the core, I/O, and networking libraries will need to learn about tier2, as that is where many of the tests for these APIs and implementations are. So I wouldn't use words like "unstable", "experimental", "less essential", but maybe say that it includes tests that may require some configuration, like changing firewall rules to allow multicast or other traffic.
>
> I'll take another stab at it tomorrow.
> 
> The whole tiered testing thing encapsulates that some tests are more important to run (at least, first or regularly) than others. So we need to somehow capture that. Discussing what tests are actually running is not relevant, because `TEST.groups` is the source of truth there.
> 
> For a contributor, these descriptions should tell why, when and how often you would use a particular tier. In other words, good docs capture some thinking about how contributors are supposed to use the existing tests. There seems to be a broad consensus that we run `tier1` before non-trivial integrations. It seems to be a rule of thumb that we care a lot about `tier1` stability and speed, and we push out less stable, slower or specially-configured tests to higher tiers. As I inspect the test groups, I think `tier2` is for regular but non-`tier1`-grade tests, `tier3` is mostly for stress tests, and `tier4` is (somewhat by construction) "everything else".
> 
> From here, this is what I'd like to tell contributors: run `tier1` early and often, since the bugs there are likely to be real product bugs and should be looked at as soon as possible; run `tier1` + `tier2` to execute all regular tests, but suspect test bugs more; run `tier3` to add even more stress testing, and suspect environmental problems when those tests fail; run `tier4` to run everything that previous tiers missed, and hope for the best. Maybe I should just put that in the docs :)

Ideal world: run every test on every platform before every integration.
Real world: run as many tests as is practical, considering the time testing takes, the cost of testing resources, and the scope of the change.

It is assumed/expected that you will have run all tests pertaining to the area of the code change that can reasonably be expected to be run.

The tiered testing is intended to give broader coverage, to ensure a change has no unintended consequences. At a minimum, run all tier1 tests for all non-trivial integrations. If you have time and resources, and think a change may have broader impact than what has been explicitly tested, then run further tiers.
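
As a rough illustration of "tier1 at a minimum, further tiers if you can", here is a minimal sketch. It assumes the `make test TEST=...` entry point described in doc/testing.md, with a space-separated list of test groups. The `tier_cmd` helper is hypothetical (it is not part of the JDK build), and it only prints the command line rather than running it.

```shell
# Hypothetical helper, not part of the JDK build: print the `make test`
# command line that would run tier1 through tierN, without executing it.
tier_cmd() {
    max_tier="$1"
    tiers=""
    i=1
    while [ "$i" -le "$max_tier" ]; do
        # Append "tierI", separated by a space once the list is non-empty.
        tiers="${tiers:+$tiers }tier$i"
        i=$((i + 1))
    done
    printf 'make test TEST="%s"\n' "$tiers"
}

tier_cmd 1   # the minimum for any non-trivial integration
tier_cmd 2   # broader coverage, if time and resources allow
```

Printing instead of running keeps the sketch safe to paste anywhere; in a real JDK build tree you would run the printed command yourself.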

-------------

PR: https://git.openjdk.java.net/jdk/pull/5615