Request for assistance: Verify and update mailing list filter rules for jdk/jdk in preparation for Skara transition

Erik Helin erik.helin at oracle.com
Fri Aug 28 14:20:08 UTC 2020


Hi all,

Thanks for all the feedback, based on this discussion we have been able 
to make a number of improvements in this area!

But to begin with, it would probably have been good to clarify the 
intended use of this list of rules a bit more clearly: The intent of 
these rules is to create a conservative “minimal” initial set of lists 
that quite likely would want to be included when sending out the review 
request. A good example of this would be the fairly well known rule that 
“all Makefile changes should be reviewed on build-dev” which is readily 
encoded in this format as “build-dev”: “make/“. However, files that are 
shared between several lists and where you select lists based on what 
actually changed should not be on here. You may for example have noticed 
the absence of the various ProblemList.txt files - they certainly fit 
that criteria. But these are the two clear opposites, there are of 
course many places in the jdk source tree where this is much more of a 
gray area.

It is also quite likely that the original list that we posted here was 
dauntingly verbose - it was our hope that it would be reasonably 
straightforward for a subject matter expert to quickly delete a lot of 
the “noise”. Based on the change suggestions received so far, it would 
appear to not have been that unreasonable: the latest version has 
dropped over a hundred lines while still improving accuracy by a fair 
bit. Ideally there should probably only be a handful of rules for any 
given list, which is already true for most, but the rules for lists like 
“core-libs-dev” are certainly not there yet.

We should also mention how the initial list was created - it was *not* 
crated manually by us :) Instead we parsed the RFR e-mails for thousands 
of commits pushed to the jdk/jdk repository. This allowed us to create a 
program that could see the files that were changed by a commit and the 
lists that that the RFR e-mail were sent to. We then manually curated 
the list, but realized that domain experts would make a better job of 
the curation (hence the "request for assitance" e-mail).

We have also validated our approach by comparing the results of the 
rules engine for selecting mailing lists with the last 30 days of RFR 
e-mails for patches targeting the jdk/jdk repository. So far it seems 
like the rules engine perform equally or better than humans at selecting 
mailing lists, but it could also be that we are missing some nuances in 
RFR e-mails that look like they have forgotten to CC at least one 
mailing list.

Still, if you are an experienced contributor, you may not want to have 
these suggestions added only to have to replace them manually with the 
correct lists. To help with that, we have adjusted the “git pr create” 
command-line tool to display the suggestions before the PR is created,
and provide the option to override them. If you create a PR from the 
command-line you can now use the --cc flag:

   $ git pr create --cc=hotspot-dev,build-dev

Furthermore, the "git pr create" command will now ask the user to verify 
the mailing lists before creating the pull request:

   $ git pr create
   The following mailing lists will be CC:d for the "RFR" e-mail:
   - build-dev at openjdk.java.net
   - hotspot-dev at openjdk.java.net

   Do you want to proceed with this mailing list selection? [Y/n]:

A user can opt-in to always use the inferred mailing lists by speciyfing
--cc=auto. This means that it is now *opt-in* to use the mailing lists 
inferred from the patch when creating a PR from the command-line.

When creating a PR from the GitHub web UI, it’s also possible to 
override these suggestions by adding a “/cc” command at the end of the 
PR body. There is also a window of a few minutes to issue a command to 
edit the suggestions after they have been displayed, but before any 
email is sent to any list. The semantics of the “/cc” command have also 
been changed, to ensure that it always overrides the automatic choices.

We would also like to reply to a few of the additional questions raised:

Q: Shouldn't this file with the rules be placed in the jdk/jdk
    repository?
A: Yes, we think so too, but we started out by having it in the skara
    repository. The Skara tooling does not care where the file resides,
    we can easily update the tooling once/if we move the file into the
    jdk/jdk repository.

Q: How will the file be maintained?
A: We imagine that this will be a collective effort once the file has
    landed in the jdk/jdk repository. If a contributor makes a large
    change that moves a lot of files around, then it seems fitting to
    also consider if the rules for choosing mailing lists should perhaps
    be updated.

Q: Why not use the CODEOWNERS file format that GitHub already supports?
A: This is something we considered but rejected for several reasons. The
    first and primary reason is that the rule file is not meant to
    convey ownership, it is merely stating the mailing lists that should
    be notified when certain files change. Secondly we will run into
    problems with GitHub _also_ sending e-mails if we use the CODEOWNERS
    format, something we would not want. Lastly the rules file also
    support specifying mailing lists groups that should be coalesced into
    a single mailing list.

Q: Doesn't other OpenJDK projects that already have transitioned have
    this problem?
A: No, for all other OpenJDK projects there is only one "primary"
    mailing list. For example skara-dev for Skara, amber-dev for Amber
    and so on.

Q: Do we have too many mailing lists?
A: Perhaps, but it is not our intent to propose any changes to the
    number of mailing lists we have.

Q: Will the rules engine will be 100% correct every time?
A: No, but neither are humans when they choose mailing lists. The goal
    of the rules engine, as stated at the start of this e-mail, is to
    choose a reasonable minimal set of mailing lists that a patch should
    be sent to. A contributor can always override the choice via the CLI
    or use the `/cc` pull request command.

Q: Is the automatic selection of mailing lists opt-in or opt-out?
A: It is currently opt-in if you create a PR from the command-line and
    opt-out if you create a PR via a web browser. We believe that many
    experienced contributors will use the CLI tools and there have an
    easy way to control the selected mailing lists. In constrast many
    newcomers are likely to create pull requests via a web browser. As
    stated above, there is a "cooldown" period after a PR has been
    created when the PR author has time to fine-tune the selected mailing
    lists regardless of how the PR was created.

Q: Shouldn't the information about to which mailing list to send a patch
    be on the wiki instead?
A: We strongly prefer to have it in a format and place where the
    information can be processed by programs. This way we can use the
    information to automate and improve upon our workflows.

Again, thanks everyone for the great feedback and to everyone who helped 
out tweaking the rules!

Best regards,
Erik and Robin

On 8/27/20 12:34 PM, Robin Westberg wrote:
> Hi all,
> 
> As part of transitioning the jdk/jdk repository to Git with project Skara, we have created a set of rules that determine which mailing list(s) should be the default recipient of a review request, depending on which files are changed. The initial version of these rules was created by looking at historical commits and matching them with existing mailing list review threads. This has produced a reasonable basis, but it can most certainly be made better with some additional manual curating.
> 
> Therefore, it would be very helpful if people with good knowledge of the various subsystems and source files that make up the JDK would be willing to take a look at these rules, and also suggest improvements where needed. In addition, lists like [1] would also be very useful insofar they exist.
> 
> The current version of these rules is located in a JSON file in the Skara repository at [2]. In order to check the validity of the rules, there is also a CLI tool that can be used to apply it to either a subset of files or existing commits and produce a suggestion [3] [4]. If you are interested in helping out with curating these rules, these are the steps to get started:
> 
> 1. Install the Skara CLI tools: https://wiki.openjdk.java.net/display/SKARA/CLI+Tools
> 2. Locate a suitable clone of the jdk/jdk repository (either Mercurial or Git is fine)
> 3. Change (cd) to the root of your jdk/jdk repository
> 3. Run the “debug mlrules” command on your favorite subset of files, for example like this (use the actual location of jdk.json on your system):
> 
> $ git skara debug mlrules -v ~/git/skara/config/mailinglist/rules/jdk.json src/hotspot/share/jfr/
> Reading rules file...
> src/hotspot/share/jfr/dcmd: [hotspot-jfr-dev]
> src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.cpp: [hotspot-jfr-dev]
> src/hotspot/share/jfr/instrumentation/jfrEventClassTransformer.hpp: [hotspot-jfr-dev]
> src/hotspot/share/jfr/instrumentation/jfrJvmtiAgent.cpp: [hotspot-jfr-dev, serviceability-dev]
>> Final list suggestion is: [hotspot-jfr-dev, serviceability-dev]
> 
> The command accepts multiple folder and/or file names to make it possible to simulate a potential change to a given set of files:
> 
> $ git skara debug mlrules -v ../skara/config/mailinglist/rules/jdk.json doc/ide.md src/hotspot/cpu/x86/x86.ad src/hotspot/os/linux/gc/z/zNUMA_linux.cpp
> Reading rules file...
> doc: [build-dev]
> src/hotspot/cpu: [hotspot-compiler-dev]
> src/hotspot/os: [hotspot-runtime-dev, hotspot-gc-dev]
> 
> Combined list suggestion: [build-dev, hotspot-compiler-dev, hotspot-gc-dev, hotspot-runtime-dev]
> Final list suggestion is: [build-dev, hotspot-dev]
> 
> If the suggestions look fine, all is well. If not, you are welcome to propose a change to the rules, preferably by editing the jdk.json file [6] and creating a pull request towards the Skara project as described in [5]. Coincidentally, this is the same way that future changes to the jdk/jdk repository will be integrated, so this exercise could also serve as a way of getting started with Git / Skara!
> 
> Best regards,
> Robin
> 
> [1] https://openjdk.java.net/groups/2d/2dawtfiles.html
> [2] https://git.openjdk.java.net/skara/blob/master/config/mailinglist/rules/jdk.json
> 
> [3]
> $ git skara debug mlrules -v ~/git/skara/config/mailinglist/rules/jdk.json src/java.desktop/unix/native
> Reading rules file...
> src/java.desktop/unix/native/common: [2d-dev]
> src/java.desktop/unix/native/include: []
> src/java.desktop/unix/native/libawt: [2d-dev]
> src/java.desktop/unix/native/libawt_headless: [awt-dev]
> src/java.desktop/unix/native/libawt_xawt: [awt-dev]
> src/java.desktop/unix/native/libfontmanager: [2d-dev]
> src/java.desktop/unix/native/libjawt: [awt-dev]
> src/java.desktop/unix/native/libsplashscreen: [awt-dev]
> 
> Combined list suggestion: [2d-dev, awt-dev]
> Final list suggestion is: [2d-dev, awt-dev]
> 
> [4]
> $ git skara debug mlrules -d 30 -v ~/git/skara/config/mailinglist/rules/jdk.json .
> ...
> ✅ [2d-dev, awt-dev, serviceability-dev] c32923e0: 8240487: Cleanup whitespace in .cc, .hh, .m, and .mm files
> ❌ [awt-dev] 7f74c7dd: 8212226: SurfaceManager throws "Invalid Image variant" for MultiResolutionImage (Windows)
>      Suggested lists: [2d-dev, awt-dev]
>      Rules matching unmentioned lists [2d-dev]:
>        src/java.desktop/share/classes/sun/java2d/SunGraphics2D.java - [2d-dev: ^src/java.desktop/share/classes/sun/java2d/]
>> 
> [5] https://wiki.openjdk.java.net/display/SKARA/Skara#Skara-Workflow
> 
> [6]
> The rules are made up of sets of regular expressions for the various mailing lists that are used for reviewing changes going into the JDK. If any rule matches, that mailing list will get a copy of the review request email. For directories containing files that belong to different subsystems, it’s usually a good idea to write the rules in a complementary fashion if possible, so that anything not explicitly mentioned gets a reasonable default. As an example, see these rules for a subset of awt / 2d / i18n files:
> 
> “awt-dev”: "src/java.desktop/share/classes/sun/awt/(?!font|sunhints|color/|font/|geom/|im/|image/|print/)”
> “2d-dev”: "src/java.desktop/share/classes/sun/awt/(font|sunhints|color/|font/|geom/|image/|print/)"
> “I18n-dev”: "src/java.desktop/share/classes/sun/awt/im/“
> 
> In this example, anything not explicitly indicated as belonging to either 2d-dev or i18-dev will be matched by awt-dev.
> 


More information about the jdk-dev mailing list