Integrated: 1406: Two GitHub APIs don't return `patch` field when `per_page` argument exceeds 70

Guoxiong Li gli at
Fri Apr 22 17:56:05 UTC 2022

On Fri, 22 Apr 2022 07:38:59 GMT, Guoxiong Li <gli at> wrote:

> Hi all,
> When the bot checks whether a backport is clean, it would use the [get-a-commit]( api to get the patch files of the original commit and use the [list-pull-requests-files]( api to get the patch files of the backport PR, and then compare them.
> But the [get-a-commit]( api has the following drawbacks:
> 1. When the `per_page` argument exceeds 70, the files after 70th don't have the `patch` field. Please see the link [1] with `per_page` as 71, the final file, 71th, doesn't have the `patch` field. And the default `per_page` of this api is **300**, so if we don't provide a `per_page` and the total files number exceeds 70, we get the wrong result without `patch` field as well [2].
> 2. At most 3000 patch files of a commit are returned by the api. If the files number exceeds 3000, such as [3], the api returns only 3000 files. The class `RestRequest` gets the pagination information duplicately, but finally only get 3000 files. (The test case of this patch can prove it.)
> 3. If a patch file of the commit is too large (not exactly know the api how to judge "too large"), the api doesn't return the patch file as well. Please use the link [1] and find `modules/javafx.web/src/main/native/Source/ThirdParty/libxml/src/ChangeLog` or `modules/javafx.web/src/main/native/Source/ThirdParty/libxml/src/HTMLparser.c`, you can identify that they don't have the `patch` field.
> The [list-pull-requests-files]( api has the same drawbacks, but its default `per_page` is **30** instead of **300**, so the first drawback above doesn't happen in this api **now**.
> This patch adds the `per_page` argument to these two api so that we can solve the first drawback. Note: considering the default `per_page` of the api [list-pull-requests-files]( will change in the future (may exceed 70), it is good to also provide a explicit `per_page` for it.
> I can't solve second and third drawbacks now. A possible alternative is using the media-type [5] `application/vnd.github.VERSION.diff` or `application/vnd.github.VERSION.patch` to get the meta `diff` or `patch` data and parse them. But the current paser classes `GitRawDiffParser` and `UnifiedDiffParser` can't parse such meta diff or patch (the format is not suitable). It needs many changes to the code about vcs, so I don't think it is a short term solution.
> As a conclusion, this patch can solve the situations provided by kevin in SKARA-1332. But it can't identify the difference of the "too large" files and the files which exceeds 3000. Note: this shortcoming has already existed in SKARA and is not produced by this patch.
> Best Regards,
> -- Guoxiong
> [1]
> [2]
> [3]
> [4]
> [5]

This pull request has now been integrated.

Changeset: 528895bf
Author:    Guoxiong Li <gli at>
Committer: Erik Joelsson <erikj at>
Stats:     106 lines in 3 files changed: 103 ins; 0 del; 3 mod

1406: Two GitHub APIs don't return `patch` field when `per_page` argument exceeds 70

Reviewed-by: erikj



More information about the skara-dev mailing list