Request for assistance: Verify and update mailing list filter rules for jdk/jdk in preparation for Skara transition
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Fri Aug 28 15:01:23 UTC 2020
On 2020-08-28 16:20, Erik Helin wrote:
> Hi all,
>
> Thanks for all the feedback, based on this discussion we have been
> able to make a number of improvements in this area!
>
> But to begin with, it would probably have been good to clarify the
> intended use of this list of rules a bit more clearly: The intent of
> these rules is to create a conservative “minimal” initial set of lists
> that quite likely would want to be included when sending out the
> review request. A good example of this would be the fairly well known
> rule that “all Makefile changes should be reviewed on build-dev” which
> is readily encoded in this format as “build-dev”: “make/“. However,
> files that are shared between several lists and where you select lists
> based on what actually changed should not be on here. You may for
> example have noticed the absence of the various ProblemList.txt files
> - they certainly fit that criteria. But these are the two clear
> opposites, there are of course many places in the jdk source tree
> where this is much more of a gray area.
>
> It is also quite likely that the original list that we posted here was
> dauntingly verbose - it was our hope that it would be reasonably
> straightforward for a subject matter expert to quickly delete a lot of
> the “noise”. Based on the change suggestions received so far, it would
> appear to not have been that unreasonable: the latest version has
> dropped over a hundred lines while still improving accuracy by a fair
> bit. Ideally there should probably only be a handful of rules for any
> given list, which is already true for most, but the rules for lists
> like “core-libs-dev” are certainly not there yet.
>
> We should also mention how the initial list was created - it was *not*
> crated manually by us :) Instead we parsed the RFR e-mails for
> thousands of commits pushed to the jdk/jdk repository. This allowed us
> to create a program that could see the files that were changed by a
> commit and the lists that that the RFR e-mail were sent to. We then
> manually curated the list, but realized that domain experts would make
> a better job of the curation (hence the "request for assitance" e-mail).
>
> We have also validated our approach by comparing the results of the
> rules engine for selecting mailing lists with the last 30 days of RFR
> e-mails for patches targeting the jdk/jdk repository. So far it seems
> like the rules engine perform equally or better than humans at
> selecting mailing lists, but it could also be that we are missing some
> nuances in RFR e-mails that look like they have forgotten to CC at
> least one mailing list.
>
> Still, if you are an experienced contributor, you may not want to have
> these suggestions added only to have to replace them manually with the
> correct lists. To help with that, we have adjusted the “git pr create”
> command-line tool to display the suggestions before the PR is created,
> and provide the option to override them. If you create a PR from the
> command-line you can now use the --cc flag:
>
> $ git pr create --cc=hotspot-dev,build-dev
>
> Furthermore, the "git pr create" command will now ask the user to
> verify the mailing lists before creating the pull request:
>
> $ git pr create
> The following mailing lists will be CC:d for the "RFR" e-mail:
> - build-dev at openjdk.java.net
> - hotspot-dev at openjdk.java.net
>
> Do you want to proceed with this mailing list selection? [Y/n]:
>
> A user can opt-in to always use the inferred mailing lists by speciyfing
> --cc=auto. This means that it is now *opt-in* to use the mailing lists
> inferred from the patch when creating a PR from the command-line.
>
> When creating a PR from the GitHub web UI, it’s also possible to
> override these suggestions by adding a “/cc” command at the end of the
> PR body. There is also a window of a few minutes to issue a command to
> edit the suggestions after they have been displayed, but before any
> email is sent to any list. The semantics of the “/cc” command have
> also been changed, to ensure that it always overrides the automatic
> choices.
Erik,
I think this all sounds great!
While a lot of the developers know about the rules for cc:ing build-dev
for making changes, not everyone does. And from time to time, even the
most experienced developers forget to include build-dev. So I've been
waiting for something like this for a very long time...
With the latest changes you describe, I also think you have managed to
strike a good balance between ease-of-use and power user control.
I'm not sure the rules are really all the way yet, but I don't think
they have to be perfect for this to be helpful. Let's get this thing
started, and if we get suprises with omissions or too many included
lists, we can adjust the rules as we go. In fact, fixing problems with
the correspondence between code are and mailing lists will be sooo much
easier to do if all you need to do is to edit a single text file,
instead of re-educating a horde of stubborn-minded JDK developers! :-o
/Magnus
>
> We would also like to reply to a few of the additional questions raised:
>
> Q: Shouldn't this file with the rules be placed in the jdk/jdk
> repository?
> A: Yes, we think so too, but we started out by having it in the skara
> repository. The Skara tooling does not care where the file resides,
> we can easily update the tooling once/if we move the file into the
> jdk/jdk repository.
>
> Q: How will the file be maintained?
> A: We imagine that this will be a collective effort once the file has
> landed in the jdk/jdk repository. If a contributor makes a large
> change that moves a lot of files around, then it seems fitting to
> also consider if the rules for choosing mailing lists should perhaps
> be updated.
>
> Q: Why not use the CODEOWNERS file format that GitHub already supports?
> A: This is something we considered but rejected for several reasons. The
> first and primary reason is that the rule file is not meant to
> convey ownership, it is merely stating the mailing lists that should
> be notified when certain files change. Secondly we will run into
> problems with GitHub _also_ sending e-mails if we use the CODEOWNERS
> format, something we would not want. Lastly the rules file also
> support specifying mailing lists groups that should be coalesced into
> a single mailing list.
>
> Q: Doesn't other OpenJDK projects that already have transitioned have
> this problem?
> A: No, for all other OpenJDK projects there is only one "primary"
> mailing list. For example skara-dev for Skara, amber-dev for Amber
> and so on.
>
> Q: Do we have too many mailing lists?
> A: Perhaps, but it is not our intent to propose any changes to the
> number of mailing lists we have.
>
> Q: Will the rules engine will be 100% correct every time?
> A: No, but neither are humans when they choose mailing lists. The goal
> of the rules engine, as stated at the start of this e-mail, is to
> choose a reasonable minimal set of mailing lists that a patch should
> be sent to. A contributor can always override the choice via the CLI
> or use the `/cc` pull request command.
>
> Q: Is the automatic selection of mailing lists opt-in or opt-out?
> A: It is currently opt-in if you create a PR from the command-line and
> opt-out if you create a PR via a web browser. We believe that many
> experienced contributors will use the CLI tools and there have an
> easy way to control the selected mailing lists. In constrast many
> newcomers are likely to create pull requests via a web browser. As
> stated above, there is a "cooldown" period after a PR has been
> created when the PR author has time to fine-tune the selected mailing
> lists regardless of how the PR was created.
>
> Q: Shouldn't the information about to which mailing list to send a patch
> be on the wiki instead?
> A: We strongly prefer to have it in a format and place where the
> information can be processed by programs. This way we can use the
> information to automate and improve upon our workflows.
>
> Again, thanks everyone for the great feedback and to everyone who
> helped out tweaking the rules!
>
> Best regards,
> Erik and Robin
>
> On 8/27/20 12:34 PM, Robin Westberg wrote:
>> Hi all,
>>
>> As part of transitioning the jdk/jdk repository to Git with project
>> Skara, we have created a set of rules that determine which mailing
>> list(s) should be the default recipient of a review request,
>> depending on which files are changed. The initial version of these
>> rules was created by looking at historical commits and matching them
>> with existing mailing list review threads. This has produced a
>> reasonable basis, but it can most certainly be made better with some
>> additional manual curating.
>>
>> Therefore, it would be very helpful if people with good knowledge of
>> the various subsystems and source files that make up the JDK would be
>> willing to take a look at these rules, and also suggest improvements
>> where needed. In addition, lists like [1] would also be very useful
>> insofar they exist.
>>
>> The current version of these rules is located in a JSON file in the
>> Skara repository at [2]. In order to check the validity of the rules,
>> there is also a CLI tool that can be used to apply it to either a
>> subset of files or existing commits and produce a suggestion [3] [4].
>> If you are interested in helping out with curating these rules, these
>> are the steps to get started:
>>
>> 1. Install the Skara CLI tools:
>> https://wiki.openjdk.java.net/display/SKARA/CLI+Tools
>> 2. Locate a suitable clone of the jdk/jdk repository (either
>> Mercurial or Git is fine)
>> 3. Change (cd) to the root of your jdk/jdk repository
>> 3. Run the “debug mlrules” command on your favorite subset of files,
>> for example like this (use the actual location of jdk.json on your
>> system):
>>
>> $ git skara debug mlrules -v
>> ~/git/skara/config/mailinglist/rules/jdk.json src/hotspot/share/jfr/
>> Reading rules file...
>> src/hotspot/share/jfr/dcmd: [hotspot-jfr-dev]
>> src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.cpp:
>> [hotspot-jfr-dev]
>> src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.hpp:
>> [hotspot-jfr-dev]
>> src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp:
>> [hotspot-jfr-dev, serviceability-dev]
>> …
>> Final list suggestion is: [hotspot-jfr-dev, serviceability-dev]
>>
>> The command accepts multiple folder and/or file names to make it
>> possible to simulate a potential change to a given set of files:
>>
>> $ git skara debug mlrules -v
>> ../skara/config/mailinglist/rules/jdk.json doc/ide.md
>> src/hotspot/cpu/x86/x86.ad src/hotspot/os/linux/gc/z/zNUMA_linux.cpp
>> Reading rules file...
>> doc: [build-dev]
>> src/hotspot/cpu: [hotspot-compiler-dev]
>> src/hotspot/os: [hotspot-runtime-dev, hotspot-gc-dev]
>>
>> Combined list suggestion: [build-dev, hotspot-compiler-dev,
>> hotspot-gc-dev, hotspot-runtime-dev]
>> Final list suggestion is: [build-dev, hotspot-dev]
>>
>> If the suggestions look fine, all is well. If not, you are welcome to
>> propose a change to the rules, preferably by editing the jdk.json
>> file [6] and creating a pull request towards the Skara project as
>> described in [5]. Coincidentally, this is the same way that future
>> changes to the jdk/jdk repository will be integrated, so this
>> exercise could also serve as a way of getting started with Git / Skara!
>>
>> Best regards,
>> Robin
>>
>> [1] https://openjdk.java.net/groups/2d/2dawtfiles.html
>> [2]
>> https://git.openjdk.java.net/skara/blob/master/config/mailinglist/rules/jdk.json
>>
>> [3]
>> $ git skara debug mlrules -v
>> ~/git/skara/config/mailinglist/rules/jdk.json
>> src/java.desktop/unix/native
>> Reading rules file...
>> src/java.desktop/unix/native/common: [2d-dev]
>> src/java.desktop/unix/native/include: []
>> src/java.desktop/unix/native/libawt: [2d-dev]
>> src/java.desktop/unix/native/libawt_headless: [awt-dev]
>> src/java.desktop/unix/native/libawt_xawt: [awt-dev]
>> src/java.desktop/unix/native/libfontmanager: [2d-dev]
>> src/java.desktop/unix/native/libjawt: [awt-dev]
>> src/java.desktop/unix/native/libsplashscreen: [awt-dev]
>>
>> Combined list suggestion: [2d-dev, awt-dev]
>> Final list suggestion is: [2d-dev, awt-dev]
>>
>> [4]
>> $ git skara debug mlrules -d 30 -v
>> ~/git/skara/config/mailinglist/rules/jdk.json .
>> ...
>> ✅ [2d-dev, awt-dev, serviceability-dev] c32923e0: 8240487: Cleanup
>> whitespace in .cc, .hh, .m, and .mm files
>> ❌ [awt-dev] 7f74c7dd: 8212226: SurfaceManager throws "Invalid Image
>> variant" for MultiResolutionImage (Windows)
>> Suggested lists: [2d-dev, awt-dev]
>> Rules matching unmentioned lists [2d-dev]:
>> src/java.desktop/share/classes/sun/java2d/SunGraphics2D.java -
>> [2d-dev: ^src/java.desktop/share/classes/sun/java2d/]
>> …
>>
>> [5] https://wiki.openjdk.java.net/display/SKARA/Skara#Skara-Workflow
>>
>> [6]
>> The rules are made up of sets of regular expressions for the various
>> mailing lists that are used for reviewing changes going into the JDK.
>> If any rule matches, that mailing list will get a copy of the review
>> request email. For directories containing files that belong to
>> different subsystems, it’s usually a good idea to write the rules in
>> a complementary fashion if possible, so that anything not explicitly
>> mentioned gets a reasonable default. As an example, see these rules
>> for a subset of awt / 2d / i18n files:
>>
>> “awt-dev”:
>> "src/java.desktop/share/classes/sun/awt/(?!font|sunhints|color/|font/|geom/|im/|image/|print/)”
>> “2d-dev”:
>> "src/java.desktop/share/classes/sun/awt/(font|sunhints|color/|font/|geom/|image/|print/)"
>> “I18n-dev”: "src/java.desktop/share/classes/sun/awt/im/“
>>
>> In this example, anything not explicitly indicated as belonging to
>> either 2d-dev or i18-dev will be matched by awt-dev.
>>
More information about the jdk-dev
mailing list