RFR: 8287366: Improve test failure reporting in GHA

Magnus Ihse Bursie ihse at openjdk.java.net
Mon Jun 6 21:11:26 UTC 2022


On Mon, 6 Jun 2022 12:57:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> It is currently both tricky and tedious to figure out what went wrong when a jtreg test fails in GHA.
>> 
>> We should utilize the full potential of GitHub Action summaries and error annotations to make finding failures easier and more discoverable.
>> 
>> With this PR, the overview of failures is presented on the "Summary" page for the action (the top-most line to the left, with the outline house icon). Below the `submit.yml` dependency graph, you'll find the annotations, which will look like this:
>> 
>> 
>> Linux x86 (jdk/tier1 part 1)
>> Test run reported 34 test failure(s) and 0 error(s). See summary for details.
>> 
>> 
>> Below the annotations follow the summaries. Go have a look at the runs for this PR to see what it looks like! In short, there is a separate summary per test job. The first part lists the names of the failed tests. This will always be included. Below this (with links from the summary list) is detailed information for each failed test. This includes the jtreg output and the `hs_err` file(s), if present. The latter part is subject to a 1 MB limit imposed by GitHub. If this limit is exceeded, no detailed information at all is presented (sorry 'bout that; GitHub's rules).
>> 
>> This PR is deliberately based on a commit prior to the fix for JDK-8287137 (Problemlist failing x86_32 tests after Loom integration), so you can see for yourself how the GHA runs look in case of a "train wreck" testing situation, like on x86 after Loom. As you can see, most of the output parts of the summaries grew larger than the 1 MB limit, which means they were not shown. Only the summary for `Linux x86 (hs/tier1 runtime)` displays as intended. OTOH, this shows that the system has a "graceful degradation" mode even for a large number of failures like this. And, since I don't see a Loom v2.0 coming anytime soon, I believe this number of failed tests is unlikely to be a realistic scenario.
>> 
>> Finally: the duplication in submit.yml is really, really annoying. :-( I have copied the same code block to three places. The fourth place, for Windows, does not get any support at this time. Concurrently with this change, I have started a separate branch where I split up submit.yml into reusable parts, using "callable workflows" and "custom actions". As part of this effort, I will also change the Windows jobs to use Cygwin bash instead of PowerShell. Until then, I could not be bothered to even think about implementing this functionality in PS. When that change is integrated, Windows will get this functionality for free, too.
>
>> With this PR, the overview of failures is presented on the "Summary" page for the action (the top-most line to the left, with the outline house icon).
> 
> @magicus, thank you. This is really useful. I didn't even know that this "Summary" page existed. I now checked this page on one of my PRs (which includes this commit) and it does indeed make it much simpler to analyze these failures.

@jaikiran Thanks for the kind words. I think I should perhaps do some tweaking to the Skara bots that link to the GHA runs, so it is easier to go to the summary page.
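
For readers who have not used these GitHub features, the mechanism described in the quoted text can be sketched roughly as follows. This is not the code from the PR. The function name `report_failures`, its arguments, and the input files are hypothetical, and the annotation message is only modeled on the example quoted above. The sketch relies on two documented GitHub Actions facilities: the `::error` workflow command and the file named by `$GITHUB_STEP_SUMMARY`. It assumes the size limit applies to the summary file as a whole.

```shell
#!/usr/bin/env bash
# Sketch only: report_failures and its arguments are hypothetical names,
# not taken from the actual submit.yml.

# Assumed size cap for one step's summary (1 MiB).
SUMMARY_LIMIT=$((1024 * 1024))

# Usage: report_failures <job name> <file: one failed test per line> <file: detailed output>
report_failures() {
  local job="$1" failed_list="$2" details="$3"
  # Fall back to a local file when run outside GitHub Actions.
  local summary="${GITHUB_STEP_SUMMARY:-summary.md}"

  # Nothing to report if the list is missing or empty.
  [ -s "$failed_list" ] || return 0

  local count
  count=$(grep -c . "$failed_list" || true)
  [ "$count" -gt 0 ] || return 0

  # Workflow command: becomes an error annotation on the run's Summary page.
  echo "::error title=${job}::Test run reported ${count} test failure(s). See summary for details."

  # The list of failed test names is always written.
  {
    echo "### ${job}: failed tests"
    echo
    sed 's/^/- /' "$failed_list"
    echo
  } >> "$summary"

  # Append the detailed output only if the summary stays within the cap;
  # otherwise degrade to the bare list plus a short note.
  local used
  used=$(wc -c < "$summary")
  if [ -f "$details" ] && [ $((used + $(wc -c < "$details"))) -le "$SUMMARY_LIMIT" ]; then
    cat "$details" >> "$summary"
  else
    echo "_Detailed output omitted: it would exceed the summary size limit._" >> "$summary"
  fi
}
```

A test step could then call something like `report_failures "Linux x86 (jdk/tier1 part 1)" failed.txt details.md` after the test run, where both file names are placeholders for whatever the job collects.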

-------------

PR: https://git.openjdk.java.net/jdk/pull/8901
