RFR: 1680: NotifyBot::getPeriodicItems takes too long
Erik Joelsson
erikj at openjdk.org
Thu Nov 17 22:17:25 UTC 2022
When starting up the notify bot, the first round of `getPeriodicItems` was observed to make >2500 calls to github for repository branches (taking a total of 18 minutes). This call originates from `NotifyBot::isOfInterest`, which looks for pre-integration branches matching the PR. There is one call for every PR being considered, plus pagination. The JDK repository has so many open PRs at the moment that this call needs 8 pages. ~250 PRs x 8 pages is 2000 calls just for that repo. On top of this, there is also another check in `isOfInterest` that looks for a "ready comment". This check needs to parse all comments of a PR, which there can also be quite a few of, making it deceptively expensive.
The method `Bot::getPeriodicItems` is expected to a be a quick method that gets called at a set interval to spawn new `WorkItem` instances. The `BotRunner` calls all the `getPeriodicItems` serially, so if they start taking significant time, it adds up pretty quickly. This is especially true in the very first round after a bot restart, where a lot of PRs need to be evaluated. The intent of having this "quick" discarding of PRs in `getPeriodicItems` was to avoid the overhead of spawning WorkItems for PRs that could easily be identified as not needing it, but in this case, it seems to have backfired.
To fix this, I'm moving the `isOfInterest` method away from `NotifyBot` and into `PullRequestWorkItem::run` and making it the first thing to be called. By doing this, we add the overhead of always spawning a `WorkItem`, but we move the heavy processing into the WorkItems, who execute with high concurrency, away from the serially executed `getPeriodicItem`. Looking at metrics data, it's pretty clear that the initial `getPeriodicItems` isn't able to saturate the thread pool with enough WorkItems today due to all these expensive checks, so this change alone should improve latency quite significantly for the first round.
To further reduce unnecessary work, I'm rearranging `isOfInterest` a bit, and only checking branches if the labels aren't found on the PR, and the prbranch notifier is actually configured on the repo. I'm also changing how we check for the existence of the pre integration branch. Instead of fetching all branches (which may result in 8+ pages), I'm rewriting the `HostedRepository::branchHash` method to return an `Optional<Hash>` that will be empty on 404. That makes it usable for checking if a branch exists.
-------------
Commit messages:
- SKARA-1680
Changes: https://git.openjdk.org/skara/pull/1422/files
Webrev: https://webrevs.openjdk.org/?repo=skara&pr=1422&range=00
Issue: https://bugs.openjdk.org/browse/SKARA-1680
Stats: 116 lines in 12 files changed: 56 ins; 37 del; 23 mod
Patch: https://git.openjdk.org/skara/pull/1422.diff
Fetch: git fetch https://git.openjdk.org/skara pull/1422/head:pull/1422
PR: https://git.openjdk.org/skara/pull/1422
More information about the skara-dev
mailing list