RFR: 1332: Skara sometimes fails to detect that a backport was clean

Erik Joelsson erikj at openjdk.java.net
Fri Apr 22 14:03:12 UTC 2022


On Fri, 22 Apr 2022 07:38:59 GMT, Guoxiong Li <gli at openjdk.org> wrote:

> Hi all,
> 
> When the bot checks whether a backport is clean, it would use the [get-a-commit](https://docs.github.com/en/rest/commits/commits#get-a-commit) api to get the patch files of the original commit and use the [list-pull-requests-files](https://docs.github.com/en/rest/pulls/pulls#list-pull-requests-files) api to get the patch files of the backport PR, and then compare them.
> 
> But the [get-a-commit](https://docs.github.com/en/rest/commits/commits#get-a-commit) api has the following drawbacks:
> 
> 1. When the `per_page` argument exceeds 70, the files after 70th don't have the `patch` field. Please see the link [1] with `per_page` as 71, the final file, 71th, doesn't have the `patch` field. And the default `per_page` of this api is **300**, so if we don't provide a `per_page` and the total files number exceeds 70, we get the wrong result without `patch` field as well [2].
> 2. At most 3000 patch files of a commit are returned by the api. If the files number exceeds 3000, such as [3], the api returns only 3000 files. The class `RestRequest` gets the pagination information duplicately, but finally only get 3000 files. (The test case of this patch can prove it.)
> 3. If a patch file of the commit is too large (not exactly know the api how to judge "too large"), the api doesn't return the patch file as well. Please use the link [1] and find `modules/javafx.web/src/main/native/Source/ThirdParty/libxml/src/ChangeLog` or `modules/javafx.web/src/main/native/Source/ThirdParty/libxml/src/HTMLparser.c`, you can identify that they don't have the `patch` field.
> 
> The [list-pull-requests-files](https://docs.github.com/en/rest/pulls/pulls#list-pull-requests-files) api has the same drawbacks, but its default `per_page` is **30** instead of **300**, so the first drawback above doesn't happen in this api **now**.
> 
> This patch adds the `per_page` argument to these two api so that we can solve the first drawback. Note: considering the default `per_page` of the api [list-pull-requests-files](https://docs.github.com/en/rest/pulls/pulls#list-pull-requests-files) will change in the future (may exceed 70), it is good to also provide a explicit `per_page` for it.
> 
> I can't solve second and third drawbacks now. A possible alternative is using the media-type [5] `application/vnd.github.VERSION.diff` or `application/vnd.github.VERSION.patch` to get the meta `diff` or `patch` data and parse them. But the current paser classes `GitRawDiffParser` and `UnifiedDiffParser` can't parse such meta diff or patch (the format is not suitable). It needs many changes to the code about vcs, so I don't think it is a short term solution.
> 
> As a conclusion, this patch can solve the situations provided by kevin in SKARA-1332. But it can't identify the difference of the "too large" files and the files which exceeds 3000. Note: this shortcoming has already existed in SKARA and is not produced by this patch.
> 
> Best Regards,
> -- Guoxiong
> 
> [1] https://api.github.com/repos/openjdk/jfx/commits/b0f2521219efc1b0d0c45088736d5105712bc2c9?&per_page=71
> [2] https://api.github.com/repos/openjdk/jfx/commits/b0f2521219efc1b0d0c45088736d5105712bc2c9
> [3] https://git.openjdk.java.net/jfx/commit/6f28d912024495278c4c35ab054bc2aab480b3e4
> [4] https://api.github.com/repos/openjdk/jfx/commits/6f28d912024495278c4c35ab054bc2aab480b3e4
> [5] https://docs.github.com/en/rest/overview/media-types#commits-commit-comparison-and-pull-requests

This looks like a good change. Thanks for identifying these causes for failing to get patch information from Github. In addition to the cases you have identified here which aren't solved, most of the examples given in the original bug were hidden from you as they pertain to a private Gitlab instance. So until we can address those, I would like the original bug open. Could you file a new bug describing the particular problem you are solving here and change this PR to reference that bug? Please link it to the existing bug and add a comment in the original explaining that part of the problem is being solved in the new bug.

forge/src/main/java/org/openjdk/skara/forge/github/GitHubPullRequest.java line 740:

> 738:     public Diff diff() {
> 739:         var files = request.get("pulls/" + json.get("number").toString() + "/files")
> 740:                            .param("per_page", "50")

Can we add a comment explaining the need for per_page? Something like:

"Need to specify an explicit per_page < 70 to guarantee that we get patch information in the result set."

-------------

PR: https://git.openjdk.java.net/skara/pull/1302


More information about the skara-dev mailing list