RFR: 2491: GitLabRepository::recentCommitComments can miss commits

Erik Joelsson erikj at openjdk.org
Fri May 2 22:41:00 UTC 2025


The mechanism for polling GitLab for commit comments has a flaw that may cause it to start missing relevant comments. The underlying issue is that we are trusting timestamps on commits which may originate from a client machine, where the system clock may be wrong.

If a user changes their system clock to a date in the future and commits, that commit ends up having a future date. If this commit is then pushed to their personal fork and included in a merge request, then this commit will end up in a pr/X branch in the main/parent repository.

The rather convoluted mechanism for polling for commit comments (for the purpose of finding commit commands) for GitLab involves maintaining a mapping from commit title message to commit hash. This map is needed because the only way to find comments globally for a repository is to query for "events", and the event object only has the commit title, not the hash. Then for all possibly relevant events, we look up the commits in the map.

This map is initially generated from a local repository using a git command like this: "git rev-list <branch list>". After the first initialization, it is then updated using API calls to the server, querying for commits after a timestamp. The timestamp is the latest found already in the map. This is where the flaw lies. There is no guarantee that the timestamps are ever increasing, so once a commit with a faulty future timestamp ends up in the map, no new commits will ever be added when attempting to update it.

When I wrote this code, I think I assumed the timestamps would be strictly increasing because all official commits are generated by Skara. That still holds true for commits in repos and branches where we have strict Skara controlled pull requests. The problem here is that the API query does not limit commits to the specified branches, it gets all commits after the timestamp, which includes commits in pr/X branches as well as unreferenced commits.

A quick solution here would be to change the query to use branches. That would require doing a call per branch. I think a better solution would be to stick to one source for commits to the map, the local repository. It is already updated before each time we check for new commit comments. The challenge then is to identify the new commits to complement the map without having to rebuild it from scratch each time. I can't think of an exact way to accomplish this, but think we can get away with a good enough approximation. We can count the number of commits in the repo using "git rev-list --count <ref list>". Then when we iterate over the commits from commitMetaData iterator, we only go until the map contains the same number of commits as the repo has. At that point we know we have added all new commits. Most of the time, the rev-list loop (in topo-order) should return all the new commits first. It's only if we have a branched history within a branch that commits could be iterated in a di
 fferent order, but worst case we just reprocess the unique ancestors of in each merge parent before we have passed all new commits.

While working on this I also realized that the current implementation has another flaw in that it assumes the list of branches given will always be the same. This is currently true in the affected bot, but that's not something the HostedRepository API should be assuming. I implemented a simple invalidation of the map in case the set of branches are different from the last call. This isn't an elegant solution but it at least keeps the API clean for now and if we actually need to handle this usecase, we can figure something else out then.

-------------

Commit messages:
 - SKARA-2491

Changes: https://git.openjdk.org/skara/pull/1717/files
  Webrev: https://webrevs.openjdk.org/?repo=skara&pr=1717&range=00
  Issue: https://bugs.openjdk.org/browse/SKARA-2491
  Stats: 125 lines in 7 files changed: 92 ins; 20 del; 13 mod
  Patch: https://git.openjdk.org/skara/pull/1717.diff
  Fetch: git fetch https://git.openjdk.org/skara.git pull/1717/head:pull/1717

PR: https://git.openjdk.org/skara/pull/1717


More information about the skara-dev mailing list