RFR: 1532: CSRBot is too inefficient

Mon Aug 15 17:11:51 UTC 2022

On Mon, 15 Aug 2022 13:42:05 GMT, Guoxiong Li <gli at openjdk.org> wrote:

>> This patch is a pretty major redesign of how the CSRBot polls for work. The motivation for this is described in the bug description. It's quite big, so I will try to break down the major changes. 
>> 
>> 1. The `CSRBot` has been split into `CSRIssueBot` and `CSRPullRequestBot`. The issue bot polls the issue tracker (JBS) for work, while the pull request bot polls the forge (github). The WorkItem part of the old CSRBot has been moved to the new `PullRequestWorkItem`, and except for now only handling one single PR per WorkItem instance (instead of everything in a repo), has been left pretty much intact. All actual updates on PRs are (still) handled by this WorkItem. The new `IssueWorkItem` is created when a change to a CSR Issue has been detected (in JBS). It walks the JBS issue and PR links to identify all PRs that could possibly be related to that CSR, and creates `PullRequestWorkItem`s for them.
>> 
>> 2. To be able to find PRs from JBS issues, the CSRBot needs to know how to parse the PR comment links. To better share this knowledge, I moved the creation and deletion logic from the NotifyBot to PullRequestUtils (in the forge module), which is available to all bots that need it. I also added a method for parsing the URL from such a comment.
>> 
>> 3. To efficiently query for updated CSR issues from Jira, I added two new query methods on IssueProject. A longer explanation of the new polling mechanism for issues can be found in the first bug comment. I've also tried to reasonably explain things in comments in the code.
>> 
>> 4. In `JiraProject`, there was a pretty serious bug in all query methods for Issues. The `JiraIssue` constructor takes a `RestRequest` as a parameter. This request object is supposed to be used by `JiraIssue` for generating all the subqueries for the Jira REST API. The problem was that the query methods in `JiraProject` just sent in whatever `RestRequest` they used to find issues, which broke all methods that tried to make further calls through the returned `JiraIssue` to Jira. I fixed this by always supplying a `RestRequest` object with the correct URL when creating a JiraIssue.
>> 
>> 5. To be able to call Jira with absolute timestamps in queries, we need to know the timezone of the user we are calling as. Luckily this information was available with a simple call, so I added this to the `JiraHost` class.
>> 
>> 6. Similar to how care needs to be taken when polling for updates to issues in Jira, I applied a similar solution when polling PRs. This required a new method `HostedRepository.openPullRequestsAfter`. Unfortunately, it doesn't help much on Github, where the API doesn't support such a parameter, but it works for Gitlab.
>> 
>> I have modified the tests that are affected by this change so that they still pass. This does verify that changes to issues in the IssueTracker are detected. Perhaps I should add some new tests to specifically verify the polling logic. I have verified the new queries and complete functionality manually by running the bot against the playground repo and bugs-stage.
>
> bots/csr/src/main/java/org/openjdk/skara/bots/csr/CSRIssueBot.java line 56:
> 
>> 54:                 issueUpdatedAt.put(issue.id(), issue.updatedAt());
>> 55:                 log.fine("Setting lastUpdatedAt from last updated issue " + issue.id() + " updated at " + lastUpdatedAt);
>> 56:             } else {
> 
> When setting the `lastUpdateAt` to the date of the last updated issue firstly, the method `issueProject.csrIssues` can not get the previous csr issues at the beginning. Is it a intent?

Yes, this is the intent. When the bots start up, the CSRPullRequestBot will always go through every open PR and update the CSR state, so any updates in CSR issues will be found. The CSRIssueBot only needs to react to CSR issue updates going forward. I will try to clarify this in a comment.

> bots/csr/src/main/java/org/openjdk/skara/bots/csr/IssueWorkItem.java line 51:
> 
>> 49:                 .filter(l -> l.relationship().isPresent() && "csr of".equals(l.relationship().get())).findAny();
>> 50:         var issue = link.flatMap(Link::issue);
>> 51:         var mainIssue = issue.flatMap(Backports::findMainIssue);
> 
> These two `flatMap` methods seem can be put into the first stream statement because the variables `link` and `issue` never be used twice.

That is true. I believe I left them out to more easily set breakpoints to verify state

-------------

PR: https://git.openjdk.org/skara/pull/1357