RFR: 1680: NotifyBot::getPeriodicItems takes too long

Magnus Ihse Bursie ihse at openjdk.org
Wed Nov 23 13:56:05 UTC 2022


On Thu, 17 Nov 2022 22:12:21 GMT, Erik Joelsson <erikj at openjdk.org> wrote:

> When starting up the notify bot, the first round of `getPeriodicItems` was observed to make >2500 calls to github for repository branches (taking a total of 18 minutes). This call originates from `NotifyBot::isOfInterest`, which looks for pre-integration branches matching the PR. There is one call for every PR being considered, plus pagination. The JDK repository has so many open PRs at the moment that this call needs 8 pages. ~250 PRs x 8 pages is 2000 calls just for that repo. On top of this, there is also another check in `isOfInterest` that looks for a "ready comment". This check needs to parse all comments of a PR, which there can also be quite a few of, making it deceptively expensive.
> 
> The method `Bot::getPeriodicItems` is expected to a be a quick method that gets called at a set interval to spawn new `WorkItem` instances. The `BotRunner` calls all the `getPeriodicItems` serially, so if they start taking significant time, it adds up pretty quickly. This is especially true in the very first round after a bot restart, where a lot of PRs need to be evaluated. The intent of having this "quick" discarding of PRs in `getPeriodicItems` was to avoid the overhead of spawning WorkItems for PRs that could easily be identified as not needing it, but in this case, it seems to have backfired.
> 
> To fix this, I'm moving the `isOfInterest` method away from `NotifyBot` and into `PullRequestWorkItem::run` and making it the first thing to be called. By doing this, we add the overhead of always spawning a `WorkItem`, but we move the heavy processing into the WorkItems, who execute with high concurrency, away from the serially executed `getPeriodicItem`. Looking at metrics data, it's pretty clear that the initial `getPeriodicItems` isn't able to saturate the thread pool with enough WorkItems today due to all these expensive checks, so this change alone should improve latency quite significantly for the first round.
> 
> To further reduce unnecessary work, I'm rearranging `isOfInterest` a bit, and only checking branches if the labels aren't found on the PR, and the prbranch notifier is actually configured on the repo. I'm also changing how we check for the existence of the pre integration branch. Instead of fetching all branches (which may result in 8+ pages), I'm rewriting the `HostedRepository::branchHash` method to return an `Optional<Hash>` that will be empty on 404. That makes it usable for checking if a branch exists.

Have you been able to verify if this does any difference in startup speed for the bot?

-------------

PR: https://git.openjdk.org/skara/pull/1422


More information about the skara-dev mailing list