RFR: 2491: GitLabRepository::recentCommitComments can miss commits

Zhao Song zsong at openjdk.org
Mon May 5 16:47:00 UTC 2025


On Fri, 2 May 2025 21:46:03 GMT, Erik Joelsson <erikj at openjdk.org> wrote:

> The mechanism for polling GitLab for commit comments has a flaw that may cause it to start missing relevant comments. The underlying issue is that we are trusting timestamps on commits which may originate from a client machine, where the system clock may be wrong.
> 
> If a user changes their system clock to a date in the future and commits, that commit ends up having a future date. If this commit is then pushed to their personal fork and included in a merge request, then this commit will end up in a pr/X branch in the main/parent repository.
> 
> The rather convoluted mechanism for polling for commit comments (for the purpose of finding commit commands) for GitLab involves maintaining a mapping from commit title message to commit hash. This map is needed because the only way to find comments globally for a repository is to query for "events", and the event object only has the commit title, not the hash. Then for all possibly relevant events, we look up the commits in the map.
> 
> This map is initially generated from a local repository using a git command like this: "git rev-list <branch list>". After the first initialization, it is then updated using API calls to the server, querying for commits after a timestamp. The timestamp is the latest found already in the map. This is where the flaw lies. There is no guarantee that the timestamps are ever increasing, so once a commit with a faulty future timestamp ends up in the map, no new commits will ever be added when attempting to update it.
> 
> When I wrote this code, I think I assumed the timestamps would be strictly increasing because all official commits are generated by Skara. That still holds true for commits in repos and branches where we have strict Skara controlled pull requests. The problem here is that the API query does not limit commits to the specified branches, it gets all commits after the timestamp, which includes commits in pr/X branches as well as unreferenced commits.
> 
> A quick solution here would be to change the query to use branches. That would require doing a call per branch. I think a better solution would be to stick to one source for commits to the map, the local repository. It is already updated before each time we check for new commit comments. The challenge then is to identify the new commits to complement the map without having to rebuild it from scratch each time. I can't think of an exact way to accomplish this, but think we can get away with a good enough approximation. We can count the numb...

Looks good overall!

forge/src/main/java/org/openjdk/skara/forge/gitlab/GitLabRepository.java line 647:

> 645:             Set<Branch> branchSet = Set.copyOf(branches);
> 646:             if (!branchSet.equals(commitMapBranchSet)) {
> 647:                 log.info("Invalidating commitTitleToCommits map for branch set: " + branchSet + " old set: " + commitMapBranchSet);

Since commitMapBranchSet doesn't have an initial value, this log will also be printed during the initial run. Is this intentional?

-------------

PR Review: https://git.openjdk.org/skara/pull/1717#pullrequestreview-2815422159
PR Review Comment: https://git.openjdk.org/skara/pull/1717#discussion_r2073795369


More information about the skara-dev mailing list