RFR: 1606: PullRequestPoller always processes the last updated MR for GitLab [v4]
Erik Helin
ehelin at openjdk.org
Fri Oct 7 09:21:55 UTC 2022
On Tue, 4 Oct 2022 16:27:49 GMT, Erik Joelsson <erikj at openjdk.org> wrote:
>> Erik Joelsson has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Applying the same fixes to IssuePoller
>
> This change is currently blocking deployment of new changes to Skara bots, unless we back out [SKARA-1590](https://bugs.openjdk.org/browse/SKARA-1590). This is because the PullRequestPoller isn't behaving well enough to actually be used yet. It would certainly be better to get this fix in however, so we can move forward with more efficient polling.
Hey @erikj79, nice work 🎉
First of all, I think your changes are fine, the reasoning and the code checks out. I have a bunch of suggestions, but those you can choose whether you want to pursue or not, they do not have to be made within the scope of this PR. Smaller things to consider:
- in `GitLabMergeRequest`, since you are already caching the `comparisonSnapshot`, should we instead cache the `notes` JSON result? There are several methods in `GitLabMergeRequest` that makes use of the `notes` JSON, then they would be able to utilise this cached result as well (saving a number of requests). This thought can of course be extended to other JSON objects that `GitLabMergeRequest` fetches as well.
- very minor nit, but I would have named `comparisonSnapshot` just `snapshot`, but that is up to you 😄
- more of a question than a comment: the snapshot functionality will mean that we won't even run the bots if a MR hasn't changed, but what is the main cost of running the bots on a MR that hasn't changed? My guess would be that the expensive part are all the requests that the bots have to make in order to figure out that they don't have to do anything, but if we cache more aggressively, would that cost go away? Anyways, just a thought
Now for a bigger experiment 🧑🔬 One annoying thing is that the GitLab REST API doesn't return the comments for a MR when fetching a single MR (unlike GitHub which does). So I played around a bit with the GitLab GraphQL API and think I managed to achieve this:
$ GRAPHQL_TOKEN=<a personal access token with read-api scope>
$ curl -v 'https://<gitlab-domain>/api/graphql' \
--header "Authorization: Bearer $GRAPHQL_TOKEN" \
--header "Content-Type: application/json"
--request POST \
--data '{"query": "{ project(fullPath: "<group-name>/<project-name>") { mergeRequests(state: opened) { nodes { title notes { nodes { id body } } } } } }" }'
GraphQL queries can a look a bit messy when typing the on the command-line for CURL, below is the query a bit nicer formatted:
{
"query":
"{ project(fullPath: "<group-name>/<project-name>") {
mergeRequests(state: opened) {
nodes {
title
notes {
nodes {
id
body
}
}
}
}
}
}"
}
The query would have to be expanded to get include all the fields we use from the two `GET` calls we do today fetch the main MR information and the MR "notes" (comments). I think that can be done and if it can be, then `GitLabRepository::pullRequests` could be updated to utilise this (a new constructor for `GitLabMergeRequest` would also be needed to store the notes, that ties into my previous suggestion of caching the notes).
With the above, and with your great work on the `PullRequestPoller` (and drilling down into why GitLab doesn't honor the "updatedAfter" query parameter) I _think_ we would get away with doing a single POST to GitLab's GraphQL endpoint when checking for updated MRs (since the notes would be included). So if nothing has changed, then all that is going to happen is a single HTTP request, and we can't optimise that away 😆
Feel free to ping me offline if you want to chat about GraphQL and these suggestions. Anyhow, this PR is good to go as it is, but there is also room to make further improvements 👍
-------------
PR: https://git.openjdk.org/skara/pull/1380
More information about the skara-dev
mailing list