RFR: 1659: Fix polling and retries in testinfo bot [v2]
erikj at openjdk.org
Fri Nov 4 14:28:13 UTC 2022
> The testinfo bot monitors "checks" in the source repository of PRs and copies over the results to the PR itself. This makes it easier for PR reviewers to see results of e.g. GitHub actions.
> Since this bot needs to react to things that aren't part of the PR, it needs some kind of periodic polling of PRs where new check updates are likely. This is currently handled by the TestInfoBotWorkItem calling back with an "expiration time" for a retry, depending on the conclusion of the current run. The idea seems to be that a new WorkItem should not be scheduled until after this expiration time. This isn't exactly how it ends up working however. The big issue here is that if no callback is made, the bot will happily schedule another WorkItem without delay in the next call to getPeriodicItems. There is also a race between the execution of getPeriodicItems and the execution of a WorkItem. If the WorkItem isn't calling back with an expiration in time for the next getPeriodicItems, then a new WorkItem will also be scheduled without delay.
> This patch attempts to solve this by using the `PullRequestPoller` for getting updated PRs. For any PR that hasn't been updated, the only way to get scheduled is if a WorkItem is calling back with a "recheckAt" time. The poller has this functionality built in already with the retry with a date feature. This will change the bot to recheck some PRs more often (every time they are touched by a user or another bot), but it will also stop rechecking certain PRs constantly.
> I'm also changing the how and when the `TestInfoBotWorkItem` calls back with a recheckAt. I'm making sure each category has a built in termination condition, to avoid ending up with an endless loop of rechecking every time interval. This is either if a PR is closed, or if checks are still running, for max of 24h since the PR was touched.
> The test fix was a perplexing problem to solve. The old test setup created "checks" on a draft PR instead of a repository, which confused me. The TestInfoBot copies checks from a repository to a PR so that is what we are supposed to test. The reason for this was that `TestHostedRepository` didn't have support for having checks itself, instead the `checks` method just returned all checks from all PRs associated with the TestHost. As far as I can tell, this was just an unnecessary simplification of the test classes that made implementing a test for the `TestInfoBot` more complicated than necessary. To make things even weirder, by having `TestHostedRepository::checks` return all checks from all PRs, depending on which order the "draft" PR and the real PR were processed by TestInfoBot, they would get each others checks copied repeatedly. The old test happened to "work" because the right PR was processed first, but with my new implementation, the processing order happened to be reversed
. My fix was to add a checks field on `TestHostedRepository` and put the checks there in the test setups instead.
Erik Joelsson has updated the pull request incrementally with one additional commit since the last revision:
- all: https://git.openjdk.org/skara/pull/1411/files
- new: https://git.openjdk.org/skara/pull/1411/files/4cc4ab09..8485b360
- full: https://webrevs.openjdk.org/?repo=skara&pr=1411&range=01
- incr: https://webrevs.openjdk.org/?repo=skara&pr=1411&range=00-01
Stats: 12 lines in 2 files changed: 1 ins; 0 del; 11 mod
Fetch: git fetch https://git.openjdk.org/skara pull/1411/head:pull/1411
More information about the skara-dev