Advice + proposals regarding automodule naming
rfscholte at apache.org
Wed Jan 18 21:14:33 UTC 2017
I'm getting a JavaOne 2015 déjà vu :)
It seems like you expect there will be a new pom-definition to support
this kind of extra information.
The current POM modelVersion (4.0.0) is not only used by Maven but by a
lot of tools, probably even more than we know of. We wonder if they do XSD
checking, so we must be very, very careful with every adjustment. So
pom-4.0.0 is a fact with all its restrictions. We are working on pom-5.0.0
but we will always make sure there will also be a pom-4.0.0 available
(either pre-generated or runtime transformed) for the current tools. Also,
its definition should work for any software technology, not just for Java.
In the beginning I had the idea of working with new scopes to decide if a
dependency belongs to the modulepath or classpath, but there's a strict
set of scopes in pom-4.0.0, so again no option. And by now I know this is
not required, the info is already there once I can read all module-info
It would have helped if a modular jar had a different extension, so everyone
can see from the *outside* what kind of jar it is.
There's no such thing as a Maven4 artifact: any artifact is a file (often
jar) with a coordinate and an extra file with dependency declarations.
During dependency resolution all build-information is ignored! The problem
with the module-info file is comparable with the java bytecode version:
you have to go in the jar to get this information.
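The point about having to look inside the jar can be illustrated with a small Python sketch. It assumes a modular jar is recognisable by a `module-info.class` entry at the archive root (descriptors that exist only under `META-INF/versions/` in multi-release jars are ignored here). The helper names `is_modular_jar`, `split_paths` and `make_jar` are hypothetical and are not part of Maven or the JDK.

```python
import io
import zipfile


def is_modular_jar(jar) -> bool:
    """Return True if the jar has a module descriptor at its root.

    `jar` may be a path or a file-like object. Simplification: descriptors
    found only under META-INF/versions/ (multi-release jars) are ignored.
    """
    with zipfile.ZipFile(jar) as archive:
        return "module-info.class" in archive.namelist()


def split_paths(jars: dict) -> tuple:
    """Partition {name: jar} into (modulepath, classpath) lists of names."""
    modulepath, classpath = [], []
    for name, jar in jars.items():
        (modulepath if is_modular_jar(jar) else classpath).append(name)
    return modulepath, classpath


def make_jar(entries: list) -> io.BytesIO:
    """Build a tiny in-memory jar with empty entries (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for entry in entries:
            archive.writestr(entry, b"")
    buffer.seek(0)
    return buffer


# Hypothetical jars, created in memory purely for illustration.
demo = {
    "modular.jar": make_jar(["module-info.class", "com/example/a/A.class"]),
    "plain.jar": make_jar(["com/example/b/B.class"]),
}
modulepath, classpath = split_paths(demo)
print("modulepath:", modulepath)
print("classpath:", classpath)
```

Nothing in the file name or coordinate distinguishes the two demo jars; only opening the archive does, which is the cost being described.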
At the moment I'm pretty far with the maven-compiler-plugin, but now every
dependency acts like an automodule. My next step would probably be to
analyze every module-info file and decide if jars belong to the classpath
or modulepath, only allowing modular jars on the module path because of
On Tue, 17 Jan 2017 23:11:11 +0100, <forax at univ-mlv.fr> wrote:
> i fully agree with you that Maven can not use automatic modules.
> Automatic modules have weird name rules, everything is exported and has
> no dependency itself*, so they are useless if you already have a
> trove of info like the Maven POM.
> In my opinion, the real question is not how to map existing Maven
> artifacts to Java modules but more,
> how Maven 4 artifacts are mapped to Java modules and then how to make
> the transition between Maven 3 artifacts to Maven 4 artifacts as smooth
> as possible.
> Here is my take on what can be a Maven 4 artifact,
> - a Maven 4 artifact can only depend on other Maven 4 artifacts (and there
> is some way to see a Maven 3 artifact as a Maven 4 artifact if the POM
> is simple enough),
> - a Maven 4 artifact does not allow split packages (a lot of Maven 3
> artifacts use split packages because it's a cool way to do an after the
> fact modularisation
> without changing the name of the module)
> - a Maven 4 artifact info is specified with info extracted from the
> module-info and from the POM
> (version is in the POM, exported packages are in the module-info, ...)
> once you have the precise rules, it will be easier to see how to map a
> Maven 3 artifact to a Maven 4 and what are the compatibility rules.
> * unless you want to play with configurations that mix modulepath and
> classpath, but this kind of configuration is really hard to debug.
> ----- Original Message -----
>> From: "Robert Scholte" <rfscholte at apache.org>
>> To: "Remi Forax" <forax at univ-mlv.fr>
>> Cc: jpms-spec-experts at openjdk.java.net, "Brian Fox"
>> <brianf at sonatype.com>
>> Sent: Tuesday, 17 January 2017 13:04:08
>> Subject: Re: Advice + proposals regarding automodule naming
>> Hi Rémi,
>> In the end every non-jdk.* and non-java.* module in the module-info will
>> be a dependency in your buildtool descriptor. Such a module must match
>> exactly one versionless dependency, or conflictId as we call it, which is
>> in general the groupId + artifactId (type and classifier are not
>> for this story).
>> By ignoring the groupId a module can be referred to by multiple dependencies,
>> so we can expect collisions. For that reason Brian did a quick scan over
>> Maven Central to count the number of duplicate artifactIds.
>> Here's the artifactIds with 100+ groupIds:
>> maven_artifact_id  count(DISTINCT maven_group_id)  count(maven_group_id)
>> library            391                             6854
>> core               312                             8188
>> common             142                             5084
>> ui                 138                             1414
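The kind of scan behind this table can be sketched in a few lines of Python. This is only an illustration of the counting idea (distinct groupIds per artifactId); the coordinates below are made up and are not Maven Central data, and the function names are hypothetical.

```python
from collections import defaultdict


def group_ids_per_artifact(coordinates):
    """Map each artifactId to the set of groupIds that publish it."""
    groups = defaultdict(set)
    for coordinate in coordinates:
        group_id, artifact_id = coordinate.split(":")[:2]
        groups[artifact_id].add(group_id)
    return groups


def conflicting_artifact_ids(coordinates, threshold=2):
    """Return {artifactId: distinct groupId count} for shared artifactIds."""
    return {
        artifact_id: len(group_ids)
        for artifact_id, group_ids in group_ids_per_artifact(coordinates).items()
        if len(group_ids) >= threshold
    }


# Made-up coordinates, for illustration only (not taken from Maven Central).
sample = [
    "com.example.alpha:library:1.0",
    "com.example.alpha:library:1.1",
    "org.example.beta:library:2.3",
    "org.example.beta:core:0.9",
    "net.example.gamma:core:4.2",
    "net.example.gamma:widgets:1.0",
]
print(conflicting_artifact_ids(sample))
```

Versions are deliberately ignored, so two releases of the same groupId:artifactId count once, which mirrors the DISTINCT column above.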
>> In theory I could have a Maven project with 391 'library'-jars on the
>> classpath without any problem. And as long as they are direct dependencies
>> I have control over this by simply not adding 'library' as a requirement to my
>> module-info. The issues start when different 'library'-jars are transitive
>> dependencies and when they are marked as required in the module-info
>> of my direct or transitive dependencies.
>> Developers of the 'library'-jars cannot use library as the module name and
>> are forced to pick another name. As developer of my project in the end I
>> decide which versions of dependencies are used. If the 'library'-jar gets
>> a different module name and my dependency is still referring to the old
>> module name, the project can't be built.
>> What I expect is that developers are forced to remove the requirements
>> from their module-info because of the mentioned issues. So instead of
>> increasing the number of requirements it will be reduced. For that reason I
>> say either use a unique module name from the beginning (GA) or wait until
>> a dependency has its own module name before adding it as requirement.
>> As far as I know this is the first time the JDK/JRE decides (proposes) a
>> name for an entity based on another entity. There are no relations between
>> method-, class-, or package-names and there doesn't have to be a relation
>> between the module name and the filename, so please don't try to do so.
>> On Mon, 16 Jan 2017 16:44:03 +0100, Remi Forax <forax at univ-mlv.fr> wrote:
>>> Hi Robert,
>>> the problem with automatic modules is more general than just the name;
>>> automatic modules also create a flat hierarchy which doesn't map well
>>> with the Maven artifact descriptor.
>>> I wonder why you want Maven to use automatic modules, or said
>>> differently Maven has a lot of information about the artifact, why do
>>> you want to forget all this information when fetching a Maven artifact?
>>> I think that one problem is that you do not want to create a
>>> module-info.class from the Maven POM and insert it into the jar because
>>> it will change the artifact*.
>>> This kind of module is supported by jigsaw under the name of synthetic
>>> modules. A synthetic module is a module with a module descriptor not
>>> created by javac but by another tool.
>>> In my opinion, automatic modules are interesting when you have a jar that
>>> does not come from Maven Central but from an ad-hoc build tool and
>>> will be considered as a leaf of the dependency DAG.
>>> Otherwise, for an existing module system, using a synthetic module seems to
>>> be a better idea.
>>> * given you have also the problem of split packages, you also need a
>>> to merge several artifacts into one modular jar because it's the easy
>>> way to solve the split package problem.
>>> ----- Original Message -----
>>>> From: "Robert Scholte" <rfscholte at apache.org>
>>>> To: jpms-spec-experts at openjdk.java.net
>>>> Cc: "Apache Maven Dev" <dev at maven.apache.org>
>>>> Sent: Monday, 16 January 2017 10:37:08
>>>> Subject: Advice + proposals regarding automodule naming
>>>> This is a message from Robert Scholte and Brian Fox. We both have been
>>>> talking about this topic for several weeks with other Maven developers and
>>>> came to the conclusion that we should warn the jigsaw team about their
>>>> current approach regarding auto modules. We will share our
>>>> thoughts, conclusions and will suggest two proposals.
>>>> Traditionally, the Java ecosystem has been very mature in terms of
>>>> and namespacing. The reverse fqdn introduced into the java package
>>>> was a
>>>> great choice to ensure classes don’t conflict. Popular build tools
>>>> Maven and nearly all those that followed built upon that this key
>>>> with the introduction of “GroupId” also using the fqdn as part of the
>>>> to ensure the coordinates were properly namespaced.
>>>> We’ve seen some ecosystems diverge from this leading to new challenges
>>>> that ultimately had to be reversed. A great example can be seen in the
>>>> “tragic mistake from npm creators”, which was to launch without a
>>>> namespace concept. Eventually, NPM started running out of useful names and
>>>> had to backtrack to introduce “scopes”, which is really just a namespace.
>>>> The real problem here is that the major change in namespace was
>>>> baked in after several years of momentum without it. It’s taken a long
>>>> time for tooling and best practice to catch up to scopes and in the
>>>> interim, people have been left with a dual mode, some namespaced, some
>>>> not namespaced, situation that has created chaos.
>>>> The real issue at hand here as we consider behaviors in the jigsaw
>>>> automodule revolves around two well studied concepts.
>>>> The most important is the “Default effect”, which states that whatever
>>>> the default behavior is will become the most prominent best practice. A
>>>> default that uses a filename to generate a very short, un-namespaced
>>>> module id effectively sets the behavior to create generic names that
>>>> eventually conflict...exactly what we’ve seen in npm.
>>>> Additionally, the switching costs introduced in overcoming a default
>>>> un-namespaced module id to one with a unique namespace is also
>>>> once you consider all the potential users. This is why API change is
>>>> and changing the module id after the fact from the default is
>>>> an API change.
>>>> The second principle at hand is the “Principle of least astonishment”. We
>>>> want to find a default that doesn’t violate what most users would expect
>>>> to be the most obvious. One could argue the current auto module
>>>> doesn’t violate this principle, but it’s important to consider
>>>> suggestions in this light.
>>>> First, let’s explore the potential downsides if the default effect
>>>> hold with the currently generated auto module id. In Apache Maven, the
>>>> artifact id is the part of the coordinate that generates the filename.
>>>> This means that com.somecompany:artifact:version will become
>>>> artifact-version.jar, which would result in automodule id “artifact”.
>>>> Armed with this understanding, what does an analysis of the Maven
>>>> ecosystem have to say about potential conflicts in the automodule id?
>>>> If we ignore the groupid and version of all the components in the
>>>> Central repository, we end up with over 13,500 (7% of the total
>>>> group:artifact combinations) conflicts. This does not consider
>>>> across other repositories, or within customer portfolios yet it is
>>>> telling. Conflicts will happen. In some cases, the number of conflicts
>>>> the same common names is well above 100. The list of conflicts as of
>>>> October, 2016 can be seen here. 
>>>> At this point, hopefully we’ve made the case for at least
>>>> establishing a
>>>> default module id that
>>>> 1. Uses namespaces to minimize id conflicts when possible
>>>> 2. Leverages the default effect to create a de facto best practice
>>>> 3. Follows the principle of least astonishment
>>>> We have two potential proposals that solve these goals.
>>>> Proposal 1: Leverage existing coordinates when available.
>>>> Maven is inarguably the most popular build system for Java components,
>>>> with Maven Central being the default and largest repository of Java
>>>> components in the world. By default, every jar built by Maven
>>>> automatically gets a simple properties file inserted into it with its
>>>> unique coordinates. Now, not every jar in Central was built with Maven;
>>>> however, 94% of them were, as we can find the pom.properties file in
>>>> 1,806,023 of the 1,913,561 Central components. Talk about the default
>>>> effect in action!
>>>> It’s further important to recognize that given a jar with a pom.properties
>>>> declaring coordinates, it means that the project itself has chosen these
>>>> coordinates as their own name. In other words, this is how they refer to
>>>> themselves, even if other consumers may not be using Maven directly.
>>>> If automodule were able to peek inside a jar and generate the default id
>>>> using the groupid and artifactid present in the file, this would
>>>> eliminate all instances of id conflict because a significant portion of
>>>> the Java ecosystem is in fact built with Maven. Additionally, the fact
>>>> that 1.8 million (and counting) modules would have namespace as the
>>>> default behavior means we’ve taken a huge step in setting the best
>>>> practice of picking module ids with a namespace. Additionally, since the
>>>> project itself has chosen these coordinates and uses them as their
>>>> distribution mechanism, this follows the principle of least astonishment
>>>> to consumers regardless of their chosen build system. Finally, since all
>>>> of the above are true, it’s unlikely the project would need to migrate to
>>>> a new module id when they adopt jigsaw natively, thus avoiding an API
>>>> switching cost for their users.
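Proposal 1 can be sketched roughly as follows in Python. The sketch relies on the fact that Maven places the file at `META-INF/maven/<groupId>/<artifactId>/pom.properties`; everything else is an assumption made for illustration: the proposal does not say how groupId and artifactId would be combined, so joining them with a dot is a guess, the properties parser is deliberately minimal, and the function names are hypothetical.

```python
import io
import zipfile


def parse_properties(text):
    """Very small .properties reader: key=value lines, '#' comment lines."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            properties[key.strip()] = value.strip()
    return properties


def default_module_name(jar):
    """Derive a namespaced default name from pom.properties, else None."""
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.startswith("META-INF/maven/") and entry.endswith("/pom.properties"):
                props = parse_properties(archive.read(entry).decode("iso-8859-1"))
                if "groupId" in props and "artifactId" in props:
                    # Assumption: join with a dot; the proposal leaves this open.
                    return props["groupId"] + "." + props["artifactId"]
    return None


def make_jar(entries):
    """Build an in-memory jar from {entry name: bytes} (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Hypothetical jar reusing the com.somecompany:artifact example coordinate.
demo_jar = make_jar({
    "META-INF/maven/com.somecompany/artifact/pom.properties":
        b"#Generated by Maven\nversion=1.0\ngroupId=com.somecompany\nartifactId=artifact\n",
})
print(default_module_name(demo_jar))
```

A jar without the file yields `None`, which is the case where some other default (or no default at all, as in Proposal 2) would still be needed.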
>>>> Proposal 2: Drop automodules
>>>> Right now Jigsaw tries to calculate a module name solely based on the name
>>>> of the jar file, which now already causes issues. Besides the fact that
>>>> the module name is not guaranteed unique compared with its Maven
>>>> coordinate, there are extra transformations which make it even less
>>>> guaranteed that it is unique; e.g. dashes are replaced by dots (which are
>>>> both valid artifactId characters), in some cases the number and their
>>>> following characters are stripped off. For artifacts like
>>>> jboss-servlet-api_4.0_spec it makes sense, however we already see
>>>> here where commons-lang, commons-lang2 and commons-lang3 get the same
>>>> module name,
>>>> even though they have different artifactIds and contain different
>>>> packages. Choosing different artifactIds and packages was a very wise
>>>> decision because it made it possible that these jars could live next to
>>>> each other. Removing that separation by the authors is a very unwise
>>>> Another known example is the jsrNNN jars, which now all get jsr as the
>>>> module name.
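The collisions described here can be reproduced with a rough Python approximation of the derivation as this mail describes it: drop the `.jar` suffix, cut the name at the first digit, and turn the remaining separators into dots. This is an assumption-laden sketch of the described behavior, not the JDK's actual algorithm, and the version numbers in the file names are only illustrative.

```python
import re


def sketch_automodule_name(filename):
    """Approximate the filename-based module name derivation described above."""
    stem = filename[:-4] if filename.endswith(".jar") else filename
    # Cut at the first digit, dropping the number and everything after it.
    stem = re.split(r"\d", stem, maxsplit=1)[0]
    # Replace runs of separators (dashes, underscores, ...) with single dots.
    name = re.sub(r"[^A-Za-z]+", ".", stem)
    # Drop any dot left dangling at either end by the cut.
    return name.strip(".")


examples = [
    "artifact-1.0.jar",
    "commons-lang-2.6.jar",
    "commons-lang3-3.5.jar",
    "jsr250.jar",
    "jsr305.jar",
    "jboss-servlet-api_4.0_spec.jar",
]
names = {jar: sketch_automodule_name(jar) for jar in examples}
for jar, name in names.items():
    print(jar, "->", name)
```

Under these assumed rules the two commons-lang jars map to one name and both jsr jars map to `jsr`, which is the loss of separation the mail objects to.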
>>>> It is highly unlikely there is one single rule to capture all the use
>>>> cases and which always results in a module name we can work with.
>>>> For that reason the other proposal is to simply drop automodules. Don’t
>>>> try to come up with a name for unnamed jars. It might look like the
>>>> feature of automodules makes migrating easier because every dependency
>>>> will get a name so you can complete your module-info for all requirements, but
>>>> we expect that once Jigsaw comes up to speed the invalid module names are
>>>> actually blocking further development due to name collisions or forced
>>>> renaming by transitive modular jars.
>>>> The advantage of this proposal is that library builders are not forced to
>>>> keep the proposed module name in order to maintain backwards compatibility
>>>> with the default. Instead library builders can pick a more suitable
>>>> module name. The modular system doesn’t allow the same package to be
>>>> exported by multiple jars (and automodules export every package). Library
>>>> builders can fix this in their new jars, however if end users would
>>>> require both jars because they were specified as requirements in
>>>> transitive jars, you cannot compile this project. There’s just no
>>>> dependency-excludes like Maven has, because “requires” in the module-info
>>>> really means requires. Dropping automodules will prevent this kind of
>>>> issue, because a package can only be exported by a named module.
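The split-package constraint mentioned above can be checked ahead of time with a sketch like the following, written in Python. It assumes that every directory containing a `.class` file counts as an exported package, as it would for an automodule, and it ignores multi-release entries; the jar names, package names and helper functions are all hypothetical.

```python
import io
import zipfile
from collections import defaultdict


def packages_in_jar(jar):
    """Return the set of package names that contain at least one class."""
    packages = set()
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.endswith(".class") and "/" in entry:
                packages.add(entry.rsplit("/", 1)[0].replace("/", "."))
    return packages


def split_packages(jars):
    """Return {package: sorted jar names} for packages found in 2+ jars."""
    owners = defaultdict(list)
    for name, jar in jars.items():
        for package in packages_in_jar(jar):
            owners[package].append(name)
    return {pkg: sorted(names) for pkg, names in owners.items() if len(names) > 1}


def make_jar(entries):
    """Build a tiny in-memory jar with empty entries (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for entry in entries:
            archive.writestr(entry, b"")
    buffer.seek(0)
    return buffer


# Hypothetical jars: both contain classes in com.example.util.
demo = {
    "a.jar": make_jar(["com/example/util/A.class", "com/example/a/Main.class"]),
    "b.jar": make_jar(["com/example/util/B.class"]),
}
print(split_packages(demo))
```

If both demo jars were pulled in as automodules by transitive requirements, the shared package would make the project impossible to compile, and there is no exclude mechanism to drop one of them.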
>>>> Sure, this means that for end users they cannot refer to every jar in
>>>> their module-info. But at least if they add a “requires” to their
>>>> module-info, they can ensure that it’ll always refer to the intended
>>>> modular jar. With build tools like Maven the chance of missing
>>>> on the classpath has already been reduced a lot. In general builds
>>>> become quite stable, so we don’t expect that developers will translate
>>>> dependencies to the module-info file, especially if we warn them about
>>>> possible consequences of depending on automodules. Only referring to
>>>> modules and even a single “requires” is already a gain. There’s no need
>>>> to try to speed this up and give the developer the false impression that
>>>> it’ll keep working when upgrading to real modular jars. Focus should be on
>>>> the target, not on the path how to reach it.
>>>> Dropping the automodules will prevent a lot of discussions about what is
>>>> the correct way to select a module name and will give the responsibility
>>>> for the name back to the place where it belongs: the developer.
>>>>  The fact that so much of the npm ecosystem is effectively
>>>> not-namespaced has actually
>>>> created potential build time malware injection possibilities. If I
>>>> a package in use by a
>>>> company through log analysis, bug report analysis etc, I could
>>>> go register the same
>>>> name in the default repo with a very high semver and know that it’s
>>>> likely this would be
>>>> picked up over the intended internally developed module because
>>>>  https://en.wikipedia.org/wiki/Default_effect_(psychology)
>>>>  https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>>>  http://openjdk.java.net/jeps/261 #Risk and assumptions