RFR: Add tool for importing webrev patches

Mon Jul 8 07:22:22 UTC 2019

On Fri, 5 Jul 2019 15:38:41 GMT, JornVernee via github.com <duke at openjdk.java.net> wrote:

> On Fri, 5 Jul 2019 13:20:51 GMT, Erik Duveblad via github.com <duke at openjdk.java.net> wrote:
> 
>> On Fri, 5 Jul 2019 10:01:57 GMT, JornVernee via github.com <duke at openjdk.java.net> wrote:
>> 
>>> As discussed on the mailing list: https://mail.openjdk.java.net/pipermail/skara-dev/2019-July/000176.html
>>> 
>>> This PR adds a tool for importing webrev `.patch` files, either from a webrev url or local file path into a repository.
>>> 
>>> I've named the tool `wimport` as short for `webrev-import` (I thought the full name sounded a bit too much like a sub-command of `webrev`). This will first checkout a new branch, then apply the patch, and then create a commit for the patch with the message `Imported patch 'NAME'`, similar to what is done by mq. Afterwards a user can just delete the new branch when they're done with e.g. reviewing the webrev. Although, there is also an option to directly apply the webrev without creating a branch or commit.
>>> 
>>> Currently  the following options are supported:
>>> 
>>>     usage: git wimport [options] <webrev url|file path>
>>>         -n, --name NAME  Use NAME as a name for the patch (default is patch file name)
>>>         -f, --file       Input is a file path
>>>         -k, --keep       Keep downloaded patch file in current directory
>>>         -d, --direct     Directly apply patch, without creating a branch or commit
>>> 
>>> ----------------
>>> 
>>> Commits:
>>>  - 7b20ce75:	Wimport prototype
>>> 
>>> Pull request:
>>> http://git.openjdk.java.net/skara/pull/32
>>> 
>>> Webrev:
>>> https://openjdk.github.io/cr/skara/32/webrev.00
>>> 
>>> Patch:
>>> http://git.openjdk.java.net/skara/pull/32.diff
>>> 
>>> Fetch command:
>>> git fetch https://github.com/openjdk/skara.git pull/32/head:pull/32
>> 
>> Thanks Jorn for you contribution!
>> 
>> I just had time to take quick look at the code, but wanted to share some high-level thoughts before diving in deeper. I would suggest splitting this tool into two:
>> 1. git-webrev-apply
>> 2. git-webrev-fetch
>> 
>> The motivation would be to two piggyback on already known git commands apply and fetch and try
>> to match the semantics as good as possible. The first tool, git-webrev-apply is very similar to what you have done in this PR _except_ that it would not do a commit. The second tool, git-webrev-fetch, would be more similar to what you have done here, but here I have few more considerations to make the tool as safe as possible:
>> - we want git-webrev-fetch to apply directly to index (with `git apply --cached`) so we avoid potentially changing files in the user's working tree.
>> - if the user already has a dirty index (`git diff --cached` is not empty) then I would consider two options:
>>   - abort and notify the user that the index is dirty (this differs from how `git fetch` behaves)
>>   - temporarily store the user's dirty index with `git write-tree`, apply the patch with `git apply --cached`, commit (more on how to commit later), restore the user's index with `git read-tree`
>> - I think we can find out a bit more about the webrev. For author and committer we can definitely use the username, `https://cr.openjdk.java.net/~(.*)/`. You could potentially use the `census` module if you want to get the real name for the user (to be able to use "Real Name <username at openjdk.org>" form).
>> - For the commit message I would either go with just the URL of the webrev or try to apply some heuristic to find out if the webrev has corresponding [JBS](https://bugs.openjdk.java.net/) issue.
>> - For committing the dirty index I'm not fully sure yet of the best way. Ideally we want `origin/master` to be the parent since `master` might have advanced due to a user's local commits. One way to achieve this could be to first commit and then rebase the resulting commit onto `origin/master`. Another, perhaps more correct, but also slower, would be to use `git worktree` to create a new temporary worktree from `origin/master` in /tmp, apply the patch, and then commit.
>> - I think we can automatically deduce the name of the ref, for a webrev from a URL <https://cr.openjdk.java.net/~ehelin/JDK-01234567/00> I think the ref `webrevs/ehelin/JDK-01234567/00` is fairly natural. This is somewhat similar to how `pulls/` is used for refs related to pull requests.
>> 
>> Regarding the verbose naming of the tools, I actually intended for them to look like webrev subcommands :smiley: The reason for this is if these two tools turn out well then I think we can repurpose `git-webrev` into having subcommands (similar to `git-pr`):
>> - git webrev generate
>> - git webrev apply <URL>
>> - git webrev fetch <URL>
>> 
>> `generate` would be the default command so that `git webrev` stays backwards compatible (just typing `git webrev` would mean `git webrev generate`). But before embarking on this refactoring I wanted us to start with implementing `git webrev-apply` and `git webrev-fetch`.
>> 
>> I would suggest slightly re-purposing this PR into becoming the PR for `git-webrev-apply` and then tackle `git-webrev-fetch` in another PR.
>> 
>> What do you think?
>> 
>> PR: http://git.openjdk.java.net/skara/pull/32
> 
> Response inline...
> 
>> I just had time to take quick look at the code, but wanted to share some high-level thoughts before diving in deeper. I would suggest splitting this tool into two:
>> 
>>     1. git-webrev-apply
>> 
>>     2. git-webrev-fetch
>> 
>> 
>> The motivation would be to two piggyback on already known git commands apply and fetch and try
>> to match the semantics as good as possible. The first tool, git-webrev-apply is very similar to what you have done in this PR _except_ that it would not do a commit.
> 
> Okay, so it would just be like `git apply`, but with a webrev url as an argument? I suppose it's reasonable to drop support for local `.patch` files in that case, since that behaviour can be achieved by just using `git apply`.
> 
>> The second tool, git-webrev-fetch, would be more similar to what you have done here, but here I have few more considerations to make the tool as safe as possible:
>> 
>>     * we want git-webrev-fetch to apply directly to index (with `git apply --cached`) so we avoid potentially changing files in the user's working tree.
> 
> Oh yeah, good point!
> 
>>     * if the user already has a dirty index (`git diff --cached` is not empty) then I would consider two options:
>>       
>>       * abort and notify the user that the index is dirty (this differs from how `git fetch` behaves)
>>       * temporarily store the user's dirty index with `git write-tree`, apply the patch with `git apply --cached`, commit (more on how to commit later), restore the user's index with `git read-tree`
> 
> Can't the `git read-tree` step create conflicts? I'm a little more in favour of the former option of aborting, then the user can decided either to reset the tree or do a `write-tree` (or something else).
> 
>>     * I think we can find out a bit more about the webrev. For author and committer we can definitely use the username, `https://cr.openjdk.java.net/~(.*)/`. You could potentially use the `census` module if you want to get the real name for the user (to be able to use "Real Name [username at openjdk.org](mailto:username at openjdk.org)" form).
> 
> If were going with deriving more information from the webrev, I'd rather try to find it in the webrev body, we can possibly get:
> 
>  * the author
>  * the revision the webrev was generated on
>  * the branch the webrev is targeting
>  * the associated JBS bug link
>  * the patch file link (already doing this for this PR)
> 
>>     * For the commit message I would either go with just the URL of the webrev or try to apply some heuristic to find out if the webrev has corresponding [JBS](https://bugs.openjdk.java.net/) issue.
> 
> I think some heuristic is good, but we should have a good fallback as well, since we're not guaranteed to find the wanted information. Using the webrev URL as a commit message seems like a good one!
> 
>>     * For committing the dirty index I'm not fully sure yet of the best way. Ideally we want `origin/master` to be the parent since `master` might have advanced due to a user's local commits. One way to achieve this could be to first commit and then rebase the resulting commit onto `origin/master`. Another, perhaps more correct, but also slower, would be to use `git worktree` to create a new temporary worktree from `origin/master` in /tmp, apply the patch, and then commit.
> 
> Well, not all webrevs target the master branch, but I get your point. An easy way to sidestep this problem is to use the current HEAD as the parent (and whichever branch this might be).
> 
>>     * I think we can automatically deduce the name of the ref, for a webrev from a URL https://cr.openjdk.java.net/~ehelin/JDK-01234567/00 I think the ref `webrevs/ehelin/JDK-01234567/00` is fairly natural. This is somewhat similar to how `pulls/` is used for refs related to pull requests.
> 
> Sounds good! Overall I think we can use several heuristics to create a `WebrevMetaData` object that we can then generate this kind of information from that.
> 
>> Regarding the verbose naming of the tools, I actually intended for them to look like webrev subcommands ðŸ˜ƒ The reason for this is if these two tools turn out well then I think we can repurpose `git-webrev` into having subcommands (similar to `git-pr`):
>> 
>>     * git webrev generate
>> 
>>     * git webrev apply
>> 
>>     * git webrev fetch
>> 
>> 
>> `generate` would be the default command so that `git webrev` stays backwards compatible (just typing `git webrev` would mean `git webrev generate`). But before embarking on this refactoring I wanted us to start with implementing `git webrev-apply` and `git webrev-fetch`.
> 
> Okay, sounds great to me! (just having the usual case of new-contributor-syndrome, not trying to step on anyone's toes ðŸ˜„)
> 
>> I would suggest slightly re-purposing this PR into becoming the PR for `git-webrev-apply` and then tackle `git-webrev-fetch` in another PR.
>> 
>> What do you think?
> 
> Sounds good.
> 
> Also, I wanted to ask what you want to do for tests? I noticed there are no other test in the `cli` module yet, so I've tested this PR manually, but maybe now is a good time to start adding them?
> 
> FWIW, not all the tests pass on Windows currently, particularly some tests that rely on unix tools/path-separators, and quite a bit of the `vcs` tests file (I haven't looked into it yet). In the meantime it might be nice to set up some CI on GitHub, e.g. Travis?
> 
> PR: http://git.openjdk.java.net/skara/pull/32

> > The first tool, git-webrev-apply is very similar to what you have done in this PR _except_ that it would
> > not do a commit.
> 
> Okay, so it would just be like `git apply`, but with a webrev url as an argument? I suppose it's reasonable to drop support for local `.patch` files in that case, since that behaviour can be achieved by just using `git apply`.

Yes, `git-webrev-apply` would behave exactly like `git apply` but with a webrev URL as argument. Yes, I would drop support for local `.patch` files for the exact reason you mention. 

> > ```
> > * if the user already has a dirty index (`git diff --cached` is not empty) then I would consider two options:
> >   
> >   * abort and notify the user that the index is dirty (this differs from how `git fetch` behaves)
> >   * temporarily store the user's dirty index with `git write-tree`, apply the patch with `git apply --cached`, commit (more on how to commit later), restore the user's index with `git read-tree`
> > ```
> 
> Can't the `git read-tree` step create conflicts? I'm a little more in favour of the former option of aborting, then the user can decided either to reset the tree or do a `write-tree` (or something else).

I don't think so? But I haven't tried this idea :smile: The first `git read-tree` will empty the index, we are essentially temporarily storing the index as a tree. The `git apply` followed by `git commit` should also result in an empty index. One could experiment with using `git write-tree` and `git commit-tree` instead of `git commit`, but the result should be very similar. So when we do the final `git read-tree` the index should be clean, `HEAD` should be intact,  the worktree should be intact, so I don't think we should get any conflicts?

However, the above will be a little fiddly, so it is probably best to just start with a warning if the index is dirty.

> > ```
> > * I think we can find out a bit more about the webrev. For author and committer we can definitely use the username, `https://cr.openjdk.java.net/~(.*)/`. You could potentially use the `census` module if you want to get the real name for the user (to be able to use "Real Name [username at openjdk.org](mailto:username at openjdk.org)" form).
> > ```
> 
> If were going with deriving more information from the webrev, I'd rather try to find it in the webrev body, we can possibly get:
> 
> * the author
> * the revision the webrev was generated on
> * the branch the webrev is targeting
> * the associated JBS bug link
> * the patch file link (already doing this for this PR)

Good point! I would probably recommend a mix, since not everyone creates webrevs with `-c`, but many contributors use the pattern `~username/ISSUE/` in the URLs.

> > ```
> > * For the commit message I would either go with just the URL of the webrev or try to apply some heuristic to find out if the webrev has corresponding [JBS](https://bugs.openjdk.java.net/) issue.
> > ```
> 
> I think some heuristic is good, but we should have a good fallback as well, since we're not guaranteed to find the wanted information. Using the webrev URL as a commit message seems like a good one!

:+1: 

> > ```
> > * For committing the dirty index I'm not fully sure yet of the best way. Ideally we want `origin/master` to be the parent since `master` might have advanced due to a user's local commits. One way to achieve this could be to first commit and then rebase the resulting commit onto `origin/master`. Another, perhaps more correct, but also slower, would be to use `git worktree` to create a new temporary worktree from `origin/master` in /tmp, apply the patch, and then commit.
> > ```
> 
> Well, not all webrevs target the master branch, but I get your point. An easy way to sidestep this problem is to use the current HEAD as the parent (and whichever branch this might be).
> 
> > ```
> > * I think we can automatically deduce the name of the ref, for a webrev from a URL https://cr.openjdk.java.net/~ehelin/JDK-01234567/00 I think the ref `webrevs/ehelin/JDK-01234567/00` is fairly natural. This is somewhat similar to how `pulls/` is used for refs related to pull requests.
> > ```
> 
> Sounds good! Overall I think we can use several heuristics to create a `WebrevMetaData` object that we can then generate this kind of information from that.

That sounds like a good start, please consider adding it to the `webrev` module.

> > Regarding the verbose naming of the tools, I actually intended for them to look like webrev subcommands smiley The reason for this is if these two tools turn out well then I think we can repurpose `git-webrev` into having subcommands (similar to `git-pr`):
> > ```
> > * git webrev generate
> > 
> > * git webrev apply
> > 
> > * git webrev fetch
> > ```
> > 
> > 
> > `generate` would be the default command so that `git webrev` stays backwards compatible (just typing `git webrev` would mean `git webrev generate`). But before embarking on this refactoring I wanted us to start with implementing `git webrev-apply` and `git webrev-fetch`.
> 
> Okay, sounds great to me! (just having the usual case of new-contributor-syndrome, not trying to step on anyone's toes )

Haha, the only toes you can step on are mine and Robin's, and we step on each other's toes all the time, so they are fairly hardened by now :laughing: Feel free to suggest as dramatic changes as you like!

> > I would suggest slightly re-purposing this PR into becoming the PR for `git-webrev-apply` and then tackle `git-webrev-fetch` in another PR.
> > What do you think?
> 
> Sounds good.
> 
> Also, I wanted to ask what you want to do for tests? I noticed there are no other test in the `cli` module yet, so I've tested this PR manually, but maybe now is a good time to start adding them?

I general we try to put as much logic as possible into the various libraries and then test the libraries (since they are easier to unit test than the CLI tools). The CLI tools are primarily meant to be small programs "gluing" together a few libraries and therefore I have resorted to manual testing. Some of the CLI tools (the ones not requiring a network connection) should be fairly easy to test though, so feel free to add tests to the `cli` module.

> FWIW, not all the tests pass on Windows currently, particularly some tests that rely on unix tools/path-separators, and quite a bit of the `vcs` tests file (I haven't looked into it yet). In the meantime it might be nice to set up some CI on GitHub, e.g. Travis?

Sorry to hear that the tests aren't passing on Windows, they _should_ pass (we are running them on Windows from time to time). We have something in the works regarding CI, but we need a little more time to work on it, so please stay tuned for that.

PR: http://git.openjdk.java.net/skara/pull/32